# Falcon 180B


Falcon 180B: The Largest Open Language Model Surpasses Llama 2 and GPT 3.5

*6 September 2023*

The Technology Innovation Institute (TII) of the UAE has unveiled Falcon 180B, the largest open language model to date, displacing [Llama 2](https://neurohive.io/en/state-of-the-art/llama-2-and-llama-2-chat/) from the top spot in Hugging Face's leaderboard of pre-trained open-access language models.

The Falcon 180B model:

* Trained on 3.5 trillion tokens, primarily from the [RefinedWeb](https://arxiv.org/pdf/2306.01116.pdf) dataset.
* Has 180 billion parameters, roughly 2.6 times more than the previous leader, Llama 2 70B.
* Requires 8 Nvidia A100 GPUs and about 400 GB of memory for inference.

You can [test the model on HuggingFace](https://huggingface.co/spaces/tiiuae/falcon-180b-demo), and the [model’s code](https://huggingface.co/tiiuae/falcon-180B) is also available there.
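A back-of-the-envelope calculation shows where the ~400 GB figure comes from. This sketch assumes half-precision (bfloat16) weights at 2 bytes per parameter; the real footprint is higher because the KV cache and activations also need memory:

```python
# Rough memory estimate for Falcon 180B inference.
# Assumption: bfloat16 weights (2 bytes per parameter); KV cache
# and activations add further overhead on top of this.
params = 180e9          # 180 billion parameters
bytes_per_param = 2     # bfloat16
weight_gb = params * bytes_per_param / 1e9
print(f"weights alone: {weight_gb:.0f} GB")            # 360 GB

per_gpu = weight_gb / 8  # sharded across 8 A100s
print(f"per A100 (8-way sharding): {per_gpu:.0f} GB")  # 45 GB each
```

At 45 GB of weights per GPU, the model fits on 8 × 80 GB A100s with room left for the KV cache, which is consistent with the stated requirement.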

### Model Architecture

Falcon 180B, a scaled-up version of Falcon 40B, builds on its multi-query attention mechanism for enhanced scalability.

In the conventional multi-head attention scheme, each head has its own query, key, and value projections, whereas the multi-query approach shares a single key and value projection across all heads, which shrinks the KV cache and speeds up inference.

![](https://1581258177-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F8ErV6i18O2fXwbB9tnf5%2Fuploads%2FsBv6A38SeeSqvC0bQZTZ%2Fimage.png?alt=media\&token=9bd4caf0-9496-4f3a-bf91-f8c34fef1e81)
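The difference between the two schemes can be sketched in a few lines of NumPy. This is an illustrative toy (small made-up dimensions, no projections or masking), not Falcon's actual implementation: the only change between the two variants is whether K and V carry a per-head axis or a single shared slice that broadcasts across heads.

```python
import numpy as np

def attention(q, k, v):
    # q: (heads, seq, d); k and v broadcast against q's head axis.
    scores = q @ k.swapaxes(-1, -2) / np.sqrt(q.shape[-1])
    weights = np.exp(scores - scores.max(-1, keepdims=True))
    weights /= weights.sum(-1, keepdims=True)
    return weights @ v

heads, seq, d = 4, 5, 8          # toy sizes, chosen for illustration
rng = np.random.default_rng(0)
q = rng.normal(size=(heads, seq, d))

# Multi-head attention: every head gets its own K and V.
k_mha = rng.normal(size=(heads, seq, d))
v_mha = rng.normal(size=(heads, seq, d))
out_mha = attention(q, k_mha, v_mha)

# Multi-query attention: one K and one V shared by all heads
# (a leading axis of size 1 broadcasts across the head axis).
k_mqa = rng.normal(size=(1, seq, d))
v_mqa = rng.normal(size=(1, seq, d))
out_mqa = attention(q, k_mqa, v_mqa)

print(out_mha.shape, out_mqa.shape)  # both (4, 5, 8)
print(k_mha.size // k_mqa.size)      # KV cache shrinks by a factor of 4
```

The output shape is identical in both cases; what changes is the size of the cached keys and values, which drops by a factor equal to the number of heads.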

The Falcon 180B model was trained on **4,096 GPUs, which took approximately 7,000,000 GPU-hours on Amazon SageMaker**.

Compared to Llama 2, training Falcon 180B required four times more computational power.
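The "four times" claim can be sanity-checked against published numbers. Assuming the 1,720,320 A100 GPU-hours reported for Llama 2 70B in Meta's Llama 2 paper:

```python
# Rough check of the "four times more compute" claim.
# 1,720,320 GPU-hours is the figure reported for Llama 2 70B
# in Meta's Llama 2 paper; Falcon 180B used about 7,000,000.
falcon_hours = 7_000_000
llama2_70b_hours = 1_720_320
print(f"ratio: {falcon_hours / llama2_70b_hours:.1f}x")  # ratio: 4.1x
```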

![](https://1581258177-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F8ErV6i18O2fXwbB9tnf5%2Fuploads%2FGSob5pzfo15DTAZf9V0g%2Fimage.png?alt=media\&token=0e57f876-3fbf-4b25-892d-d53fc808cdf0)

The dataset for Falcon 180B consists primarily of web data from the **RefinedWeb dataset (approximately 85%)**. The remainder is curated data such as dialogues, technical articles, and a small fraction of code (around 3%), which makes it a versatile model for NLP tasks.

Falcon 180B outperforms Llama 2 70B and GPT-3.5 from OpenAI on the MMLU benchmark, although it falls behind GPT-4. It also competes successfully with Google’s proprietary PaLM 2-Large on benchmarks such as HellaSwag, LAMBADA, WebQuestions, Winogrande, PIQA, ARC, BoolQ, CB, COPA, RTE, WiC, WSC, and ReCoRD:

**Comparison of the PaLM family of models and Falcon 180B**

![](https://1581258177-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F8ErV6i18O2fXwbB9tnf5%2Fuploads%2FCg2y9xa737OfyPp6OoFi%2Fimage.png?alt=media\&token=78ab4f28-79de-4782-b642-bedac5d27572)

Although Falcon 180B is available on the Hugging Face Hub, its license places significant restrictions on commercial usage. Review the license and seek legal counsel before using the model for commercial purposes.
