# LLaMA (Large Language Model Meta AI)

**LLaMA (Large Language Model Meta AI)** is a family of large language models (LLMs) released by **Facebook (Meta AI)** in February 2023. Models were trained in several sizes, ranging from 7 billion to 65 billion parameters.

LLaMA's developers reported that the 13-billion-parameter model outperformed the much larger GPT-3 (175 billion parameters) on most NLP benchmarks, and that the largest model was competitive with state-of-the-art models such as PaLM and Chinchilla.

LLaMA uses the transformer architecture, the standard architecture for language modelling since 2018, with several modifications adopted from earlier models: the SwiGLU activation function instead of ReLU, rotary positional embeddings (RoPE) instead of absolute positional embeddings, and root-mean-square layer normalization (RMSNorm) instead of standard layer normalization.
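The three components named above can be sketched in plain Python on single vectors. This is a minimal illustration, not Meta's implementation: the helper names (`rms_norm`, `swiglu_ffn`, `rope`) are hypothetical, the real model applies these operations to batched tensors with learned weights, and the `eps` default is an illustrative choice. The RoPE sketch rotates consecutive pairs of dimensions using a base of 10000.

```python
import math


def rms_norm(x, weight, eps=1e-6):
    """RMSNorm: divide by the root-mean-square of x, then scale per dimension.

    Unlike standard LayerNorm, no mean is subtracted and no bias is added.
    eps is an illustrative value that guards against division by zero.
    """
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [w * v / rms for v, w in zip(x, weight)]


def silu(z):
    """SiLU (Swish) activation, z * sigmoid(z), written to avoid exp overflow."""
    if z >= 0:
        return z / (1.0 + math.exp(-z))
    e = math.exp(z)
    return z * e / (1.0 + e)


def matvec(m, v):
    """Multiply a matrix given as a list of rows by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def swiglu_ffn(x, w1, w3, w2):
    """SwiGLU feed-forward block: w2 @ (silu(w1 @ x) * (w3 @ x)).

    w1 and w3 project up to the hidden size; their gated element-wise
    product is projected back down by w2.
    """
    gate = [silu(v) for v in matvec(w1, x)]
    up = matvec(w3, x)
    return matvec(w2, [g * u for g, u in zip(gate, up)])


def rope(x, pos, base=10000.0):
    """Rotary positional embedding of one query or key vector at position pos.

    Each consecutive pair (x[i], x[i + 1]) is rotated by the angle
    pos * base ** (-i / d), so position is encoded as a rotation instead of
    being added to the embedding. len(x) must be even.
    """
    d = len(x)
    out = []
    for i in range(0, d, 2):
        angle = pos * base ** (-i / d)
        c, s = math.cos(angle), math.sin(angle)
        x0, x1 = x[i], x[i + 1]
        out += [x0 * c - x1 * s, x0 * s + x1 * c]
    return out


if __name__ == "__main__":
    h = rms_norm([3.0, 4.0], weight=[1.0, 1.0])
    identity = [[1.0, 0.0], [0.0, 1.0]]
    print(swiglu_ffn(h, identity, identity, identity))

    # Attention scores with RoPE depend only on the relative offset.
    q, k = [1.0, 0.5, -0.3, 2.0], [0.2, -1.0, 0.7, 0.4]

    def dot(a, b):
        return sum(u * v for u, v in zip(a, b))

    print(dot(rope(q, 5), rope(k, 3)), dot(rope(q, 12), rope(k, 10)))
```

The last line prints two scores that agree up to floating-point rounding. This is the property that makes RoPE attractive: because every pair of dimensions is rotated by an angle proportional to the position, the dot product between a rotated query and key depends only on how far apart they are.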

LLaMA is designed to be a versatile foundation model that can be applied to a wide range of tasks, such as text generation, translation, summarization, and question answering. It was also intended to be accessible to researchers: the original model weights were made available to approved researchers under a non-commercial license, and its successor, Llama 2, was released under a license that also permits commercial use.

The name "LLaMA" is an acronym for Large Language Model Meta AI and also alludes to the South American camelid.

Here are some of the potential uses of LLaMA:

* **Text generation:** LLaMA can be used to generate text, such as news articles, blog posts, and product descriptions.
* **Translation:** LLaMA can be used to translate text from one language to another.
* **Summarization:** LLaMA can be used to summarize long pieces of text, such as articles or books.
* **Question answering:** LLaMA can be used to answer questions about factual topics.
* **Code generation:** LLaMA can be used to generate code, such as Python or Java code.
* **Creative writing:** LLaMA can be used to generate creative text, such as poems, scripts, musical pieces, emails, and letters.

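The original LLaMA models are pretrained base models rather than instruction-tuned chat models, so many of these uses require careful prompting or further fine-tuning. Meta has continued to develop the family, releasing Llama 2 in July 2023.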

The 7-billion-parameter version of Llama 2 takes up 13.5 GB in 16-bit precision. After 4-bit quantization with GPTQ, its size drops to 3.6 GB, about 27% of its original size.
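GPTQ itself decides how to round each weight using approximate second-order information about the layer's outputs, which is beyond a short example. The sketch below swaps in plain round-to-nearest quantization, which shares the same storage idea: weights are split into groups, each weight becomes a 4-bit integer code, and each group keeps a 16-bit scale and offset. The group size of 128 and all function names are illustrative assumptions, not the GPTQ file format.

```python
def quantize_group_4bit(weights):
    """Round-to-nearest asymmetric 4-bit quantization of one weight group.

    Maps each weight to an integer code in 0..15 and returns the codes with
    the (scale, offset) pair needed to reconstruct approximate values.
    """
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / 15 or 1.0  # constant group: any nonzero scale works
    codes = [min(15, max(0, round((w - lo) / scale))) for w in weights]
    return codes, scale, lo


def dequantize_group_4bit(codes, scale, lo):
    """Reconstruct approximate weights; error per weight is at most scale / 2."""
    return [lo + c * scale for c in codes]


def pack_nibbles(codes):
    """Store two 4-bit codes per byte."""
    if len(codes) % 2:
        codes = codes + [0]
    return bytes((codes[i] << 4) | codes[i + 1] for i in range(0, len(codes), 2))


def unpack_nibbles(packed, n):
    """Inverse of pack_nibbles: recover the first n codes."""
    codes = []
    for b in packed:
        codes += [b >> 4, b & 0x0F]
    return codes[:n]


def bytes_per_group(group_size=128, code_bits=4, meta_bytes=4):
    """Packed codes plus a 16-bit scale and a 16-bit offset (2 + 2 bytes)."""
    return group_size * code_bits / 8 + meta_bytes


if __name__ == "__main__":
    group = [((i * 37) % 101) / 50.0 - 1.0 for i in range(128)]
    codes, scale, lo = quantize_group_4bit(group)
    approx = dequantize_group_4bit(codes, scale, lo)
    print("max error:", max(abs(a - b) for a, b in zip(group, approx)))
    print("4-bit vs fp16 size:", bytes_per_group() / (128 * 2))  # 68 / 256 = 0.265625
```

For a 128-weight group, this toy format needs 68 bytes instead of 256 bytes in 16-bit precision, which happens to land near the ratio quoted above; real GPTQ checkpoints differ in details such as how offsets are stored and which layers stay in 16-bit precision.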

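Without quantization, a 7B-parameter model is hard to run on most consumer hardware: the 16-bit weights alone need about 13.5 GB of GPU memory, more than many consumer GPUs provide. Running on the CPU in 32-bit precision needs roughly 27 GB for the weights, so in practice at least 32 GB of RAM.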
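The sizes quoted above follow from a rough rule: weight memory is the parameter count times the bytes per parameter. The sketch below applies it. The dictionary of precisions, the function name, and the parameter count of about 6.74 billion for the 7B model are assumptions for illustration, and the estimate ignores activations, the KV cache, and quantization metadata.

```python
# Bytes needed to store one parameter at common precisions.
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(n_params, dtype):
    """Approximate memory for the model weights alone, in GB (10**9 bytes).

    Ignores activations, the KV cache, and quantization metadata such as
    per-group scales, so real usage is somewhat higher.
    """
    return n_params * BYTES_PER_PARAM[dtype] / 1e9


if __name__ == "__main__":
    n_params = 6.74e9  # assumed parameter count of the Llama 2 7B model
    for dtype in ("fp32", "fp16", "int4"):
        print(f"{dtype}: {weight_memory_gb(n_params, dtype):.1f} GB")
    # fp32: 27.0 GB
    # fp16: 13.5 GB
    # int4: 3.4 GB
```

The 16-bit result matches the 13.5 GB figure, and the 32-bit result explains why a CPU-only setup needs around 32 GB of RAM. The 4-bit estimate sits a little below the 3.6 GB GPTQ file, which plausibly reflects per-group scales and offsets plus any layers kept in 16-bit precision.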

The amount of disk space and memory required for Llama 2 inference depends on the size of the model. Llama 2 was released in three sizes, with 7, 13, and 70 billion parameters. Stored in 16-bit precision (2 bytes per parameter), the weights alone take approximately:

| Model       | Parameters | Approx. size, 16-bit (GB) |
| ----------- | ---------- | ------------------------- |
| Llama-2-7b  | 7 billion  | 13.5                      |
| Llama-2-13b | 13 billion | 26                        |
| Llama-2-70b | 70 billion | 138                       |

4-bit quantization reduces each of these to roughly a quarter.

If you are running Llama 2 inference on a local machine, make sure you have enough free disk space to store the model weights and enough RAM or GPU memory to load them. Alternatively, a cloud-based inference service such as Hugging Face Inference Endpoints stores and manages the model for you.
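Before downloading weights locally, it can help to check free disk space programmatically. This sketch uses Python's standard `shutil.disk_usage`; the helper name `has_room_for_model`, the 2-bytes-per-parameter default (16-bit weights), and the 10% safety margin are illustrative assumptions.

```python
import shutil


def has_room_for_model(n_params, bytes_per_param=2.0, path=".", margin=1.1):
    """Return True if the filesystem holding `path` has room for the weights.

    bytes_per_param=2.0 assumes 16-bit weights; margin (an assumed 10%)
    adds headroom for tokenizer and config files and temporary downloads.
    """
    needed = n_params * bytes_per_param * margin
    return shutil.disk_usage(path).free >= needed


if __name__ == "__main__":
    free_gb = shutil.disk_usage(".").free / 1e9
    print(f"free space: {free_gb:.1f} GB")
    # Nominal Llama 2 sizes; actual parameter counts differ slightly.
    for name, n_params in [("7B", 7e9), ("13B", 13e9), ("70B", 70e9)]:
        print(name, "fits" if has_room_for_model(n_params) else "does not fit")
```

Passing `bytes_per_param=0.5` gives a rough check for 4-bit quantized weights instead.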


