What is LoRA and how does it work?

LoRA stands for Low-Rank Adaptation of Large Language Models (LLMs). It is a technique proposed by Microsoft Research to address the challenge of optimizing and fine-tuning large language models. Models such as GPT-3, with billions of parameters, and GPT-4, reportedly with over a trillion, are prohibitively expensive to adapt in full to specific tasks or domains.

LoRA models are small files that apply small changes to a base checkpoint model. They are usually 10 to 100 times smaller than checkpoint models, which makes them very attractive to people who maintain extensive collections of models. LoRA's adaptation technique maintains model quality while significantly reducing the number of trainable parameters for downstream tasks, with no added inference latency.
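The parameter savings come from the low-rank factorization itself: instead of training a full d_out × d_in weight update, LoRA trains two thin matrices A (r × d_in) and B (d_out × r) with rank r much smaller than the layer dimensions. A minimal sketch in plain Python, using a hypothetical 4096 × 4096 attention projection and rank 8 as example sizes:

```python
def full_params(d_out: int, d_in: int) -> int:
    # Full fine-tuning trains every entry of the weight matrix.
    return d_out * d_in

def lora_params(d_out: int, d_in: int, r: int) -> int:
    # LoRA trains only the low-rank factors:
    # A is r x d_in, B is d_out x r.
    return r * d_in + d_out * r

# Hypothetical example: one 4096 x 4096 projection, rank r = 8.
d = 4096
print(full_params(d, d))       # 16777216 trainable parameters
print(lora_params(d, d, r=8))  # 65536 -> 256x fewer
```

The ratio d / (2r) explains why LoRA files are so much smaller than the checkpoints they modify: the higher the layer dimension and the lower the rank, the larger the savings.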

LoRA can be used for general-purpose fine-tuning and customized for different domains or datasets. Even though LoRA was initially proposed for large language models and demonstrated on Transformer blocks, the technique can also be applied elsewhere.

In text-to-image diffusion models, LoRA applies its small changes to the most critical part of the model: the cross-attention layers, where the image and the prompt meet (the layers that relate image representations to the prompts describing them). In the original work on large language models, the researchers likewise found it sufficient to fine-tune only the Transformer attention blocks: with LoRA, quality was on par with full fine-tuning while training was much faster and required far less compute.
