LLaMA (Large Language Model Meta AI)

LLaMA (Large Language Model Meta AI) is a family of large language models (LLMs) released by Meta AI, the AI research division of Meta (formerly Facebook), in February 2023. The models were trained in four sizes, ranging from 7 billion to 65 billion parameters.

LLaMA's developers reported that the 13-billion-parameter model outperformed the much larger GPT-3 (175 billion parameters) on most NLP benchmarks, and that the largest model was competitive with state-of-the-art models such as PaLM and Chinchilla.

LLaMA uses the transformer architecture, the standard architecture for language modelling since 2018. It also adopts several modifications from earlier work: the SwiGLU activation function instead of ReLU in the feed-forward layers, rotary positional embeddings (RoPE) instead of absolute positional embeddings, and root-mean-square layer normalization (RMSNorm) instead of standard layer normalization.
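The three components named above can be sketched in plain Python on lists of floats. This is a minimal illustration, not Meta's implementation: the function names, the `eps` value, and the toy inputs are assumptions made for the example, and real models apply these operations to batched tensors with learned weight matrices. The RoPE `base` of 10000 is the value commonly used with rotary embeddings.

```python
import math


def rms_norm(x, weight, eps=1e-6):
    """RMSNorm: divide x by its root mean square, then apply a learned gain.
    Unlike standard LayerNorm, no mean is subtracted and no bias is added."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [w * v / rms for v, w in zip(x, weight)]


def silu(z):
    """SiLU (Swish with beta = 1): z * sigmoid(z), written to avoid overflow."""
    if z >= 0:
        return z / (1.0 + math.exp(-z))
    e = math.exp(z)
    return z * e / (1.0 + e)


def matvec(vec, mat):
    """Row vector (length d_in) times a matrix given as d_in rows of length d_out."""
    return [sum(vec[i] * mat[i][j] for i in range(len(vec))) for j in range(len(mat[0]))]


def swiglu_ffn(x, w_gate, w_up, w_down):
    """SwiGLU feed-forward block: (silu(x W_gate) * (x W_up)) W_down,
    where * is elementwise. It takes the place of a ReLU feed-forward layer."""
    gate = matvec(x, w_gate)
    up = matvec(x, w_up)
    hidden = [silu(g) * u for g, u in zip(gate, up)]
    return matvec(hidden, w_down)


def apply_rope(x, position, base=10000.0):
    """Rotary positional embedding: rotate each pair (x[k], x[k+1]), k even,
    by the angle position * base**(-k / d). Requires an even dimension d."""
    d = len(x)
    if d % 2:
        raise ValueError("RoPE needs an even dimension")
    out = list(x)
    for k in range(0, d, 2):
        theta = position * base ** (-k / d)
        c, s = math.cos(theta), math.sin(theta)
        out[k] = x[k] * c - x[k + 1] * s
        out[k + 1] = x[k] * s + x[k + 1] * c
    return out


# Example usage on a toy 4-dimensional vector.
x = [1.0, 2.0, 3.0, 4.0]
normed = rms_norm(x, weight=[1.0] * 4)
q_at_5 = apply_rope(normed, position=5)
```

Because each RoPE step is a rotation, it preserves vector length, and the dot product between a rotated query and a rotated key depends only on the distance between their positions. This is how rotary embeddings encode relative rather than absolute position.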

LLaMA is designed to be a versatile foundation model that can be adapted to a wide range of tasks, such as text generation, translation, summarization, and question answering. It was also intended to be accessible to researchers: the model weights were made available on request under a noncommercial research license, rather than as a fully open-source package.

Besides being an acronym, the name "LLaMA" is a reference to the llama, the South American camelid.

Here are some of the potential uses of LLaMA:

  • Text generation: LLaMA can be used to generate text, such as news articles, blog posts, and product descriptions.

  • Translation: LLaMA can be used to translate text from one language to another.

  • Summarization: LLaMA can be used to summarize long pieces of text, such as articles or books.

  • Question answering: LLaMA can be used to answer questions about factual topics.

  • Code generation: LLaMA can be used to generate code, such as Python or Java code.

  • Creative writing: LLaMA can be used to generate creative text, such as poems, scripts, emails, and letters.

LLaMA has continued to evolve; its successor, Llama 2, was released in July 2023.

The 7 billion parameter version of Llama 2 weighs about 13.5 GB with its weights stored in 16-bit precision. After 4-bit quantization with GPTQ, its size drops to about 3.6 GB, roughly 27% of its original size.
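To make the idea of 4-bit quantization concrete, here is a sketch of plain round-to-nearest (RTN) group quantization. Note that this is a simpler technique than GPTQ, which additionally uses approximate second-order (Hessian) information to compensate for rounding error as it quantizes each layer. The group size, the per-group scale and minimum, and all function names below are illustrative assumptions, not the format of any real checkpoint.

```python
def quantize_4bit(weights, group_size=128):
    """Asymmetric round-to-nearest 4-bit quantization.
    Each group of `group_size` consecutive weights gets its own (scale, minimum);
    every weight becomes an integer code in 0..15."""
    codes, params = [], []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        lo, hi = min(group), max(group)
        scale = (hi - lo) / 15 if hi > lo else 1.0  # 16 levels -> 15 steps
        params.append((scale, lo))
        for w in group:
            q = round((w - lo) / scale)
            codes.append(max(0, min(15, q)))  # clamp against float rounding
    return codes, params


def dequantize_4bit(codes, params, group_size=128):
    """Map codes back to approximate weights: w ~= minimum + code * scale."""
    return [params[i // group_size][1] + q * params[i // group_size][0]
            for i, q in enumerate(codes)]


def pack_nibbles(codes):
    """Store two 4-bit codes per byte (low nibble first), padding with a zero code."""
    if len(codes) % 2:
        codes = codes + [0]
    return bytes(codes[i] | (codes[i + 1] << 4) for i in range(0, len(codes), 2))


def bits_per_weight(group_size=128, meta_bits_per_group=32):
    """Effective cost per weight: 4 bits for the code plus per-group metadata
    (assumed here to be a 16-bit scale and a 16-bit minimum) spread over the group."""
    return 4 + meta_bits_per_group / group_size


# Example: quantize a small vector in groups of 4 and reconstruct it.
w = [0.12, -0.40, 0.33, 0.05, 0.90, -0.75, 0.10, 0.00]
codes, params = quantize_4bit(w, group_size=4)
approx = dequantize_4bit(codes, params, group_size=4)
packed = pack_nibbles(codes)  # 8 codes -> 4 bytes
```

`bits_per_weight` illustrates why a 4-bit model ends up somewhat larger than a quarter of its 16-bit size: under the assumed group size of 128 with two 16-bit values per group, each weight costs 4.25 bits rather than 4. Real GPTQ checkpoints differ in detail, for example in how zero points are stored and which layers are left unquantized.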

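Loading a 7B-parameter LLM in 16-bit precision takes roughly 14 GB of memory for the weights alone, more than most consumer GPUs provide, so quantization is usually needed to run it on consumer hardware. When running only on the CPU, where weights are often kept in 32-bit precision (about 28 GB), you still need at least 32 GB of RAM.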
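The arithmetic behind these figures is parameter count times bytes per parameter. The sketch below is a rough estimate under stated assumptions: the function names are hypothetical, and the flat 10% allowance for runtime overhead is an illustrative figure, not a measured one. It ignores the KV cache and activations, which grow with context length and batch size.

```python
# Bytes needed to store one parameter at common precisions.
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(n_params, dtype):
    """Memory for the weights alone, in GB (10**9 bytes)."""
    return n_params * BYTES_PER_PARAM[dtype] / 1e9


def fits_in_memory(n_params, dtype, available_gb, overhead=0.10):
    """Rough check: weights plus an assumed flat overhead fraction must fit.
    Ignores the KV cache and activations."""
    return weight_memory_gb(n_params, dtype) * (1 + overhead) <= available_gb


# A 7B model at different precisions.
for dtype in ("fp32", "fp16", "int4"):
    print(dtype, weight_memory_gb(7e9, dtype), "GB")
# fp32 28.0 GB
# fp16 14.0 GB
# int4 3.5 GB
```

Under these assumptions a 7B model in 32-bit precision fits in 32 GB of RAM, while the 16-bit version does not fit on an 8 GB GPU and the 4-bit version does.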

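The amount of space required for Llama 2 inference depends on the size of the model and the precision in which its weights are stored. Llama 2 was released in three sizes, 7, 13, and 70 billion parameters. In 16-bit precision (2 bytes per parameter), the smallest model, Llama-2-7b, requires about 13.5 GB of space, while the largest, Llama-2-70b, requires about 138 GB.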

Here is a table of the approximate space requirements for the Llama 2 models, with weights stored in 16-bit precision:

Model          Parameters    Space (GB)
Llama-2-7b     7 billion     ~13.5
Llama-2-13b    13 billion    ~26
Llama-2-70b    70 billion    ~138

If you run Llama 2 inference on a local machine, make sure you have enough free disk space to store the model. Alternatively, you can use a cloud-based inference service, such as Hugging Face Inference Endpoints, which stores and manages the model for you.
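One way to check free space before downloading is Python's standard `shutil.disk_usage`, which reports total, used, and free bytes for the filesystem containing a given path. The helper name and the 5 GB safety margin below are illustrative assumptions; the model size should come from the files you actually intend to download at the precision you plan to use.

```python
import os
import shutil


def has_room_for_model(path, model_size_gb, margin_gb=5.0):
    """True if the filesystem holding `path` has room for the model files
    plus a safety margin (the 5 GB default is an arbitrary assumption)."""
    free_gb = shutil.disk_usage(path).free / 1e9
    return free_gb >= model_size_gb + margin_gb


# Example: check the home directory before downloading Llama-2-7b (~13.5 GB in 16-bit).
if not has_room_for_model(os.path.expanduser("~"), 13.5):
    print("Not enough free disk space; consider a quantized model or a hosted endpoint.")
```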
