# Nemotron 340B, Training Data

On Friday, [NVIDIA](https://www.linkedin.com/company/nvidia/) released Nemotron 340B, an open LLM reported to match GPT-4 (0314). They also released a technical report on how they trained it and what's special about it! \
\
**Implementation**\
**Pretraining:** Two-phase pretraining: first trained on 8T tokens, then continued on 1T higher-quality tokens and instruction data with a steeper learning-rate decay.\
\
**Fine-tuning:** First fine-tuned on 800K coding samples, followed by 200K diverse task samples.\
\
**RLHF:** Applied Direct Preference Optimization (DPO), followed by Reward-aware Preference Optimization (RPO) over multiple iterations.\
\
**Insights**\
🧪 **98% of data used in post-training was synthetically generated**\
🌍 **Pretraining data:** English data (70%), Multilingual data (15%), Source code (15%).\
🖥️ **Trained on 6144 H100 GPUs with 8-way TP, 12-way PP with interleaving, and DP to achieve \~42% MFU**\
📈 Adjusting the data distribution and learning-rate decay in the second pretraining phase improves model quality.\
🧑‍🏫 Used only 20K human-annotated samples, mostly for reward modeling\
🎯 Focused on task diversity, topic diversity and instruction diversity\
🔄 Used an iterative approach for “response” generation starting with Mixtral then switching to Nemotron models\
🛠️ Detailed synthetic data pipeline instructions, including all prompts used to generate the data
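
The parallelism figures above imply a data-parallel degree that the summary does not state. Here is a minimal sketch of that arithmetic, assuming every GPU belongs to exactly one TP × PP model replica. The function names are illustrative and not taken from the report, and `mfu` simply restates the usual definition (achieved model FLOP/s divided by hardware peak FLOP/s).

```python
def data_parallel_degree(total_gpus: int, tensor_parallel: int, pipeline_parallel: int) -> int:
    """Number of model replicas implied by a TP x PP x DP layout."""
    # GPUs needed to hold a single copy of the model.
    gpus_per_replica = tensor_parallel * pipeline_parallel
    if total_gpus % gpus_per_replica != 0:
        raise ValueError("total_gpus must be divisible by TP * PP")
    return total_gpus // gpus_per_replica


def mfu(achieved_flops_per_s: float, peak_flops_per_s: float) -> float:
    """Model FLOPs utilization: achieved throughput over hardware peak."""
    if peak_flops_per_s <= 0:
        raise ValueError("peak_flops_per_s must be positive")
    return achieved_flops_per_s / peak_flops_per_s


# Figures from the summary: 6144 GPUs, 8-way TP, 12-way PP.
dp = data_parallel_degree(total_gpus=6144, tensor_parallel=8, pipeline_parallel=12)
print(dp)  # 64
```

Under that assumption, each model replica spans 96 GPUs and the 6144-GPU configuration runs 64 replicas in parallel.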

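For readers unfamiliar with the preference-optimization step, the following is a minimal sketch of the standard (vanilla) DPO loss for a single preference pair. It is not the report's exact training objective, and RPO is not reproduced here. The argument names and the default `beta` are assumptions chosen for illustration. Inputs are summed log-probabilities of the chosen and rejected responses under the policy and a frozen reference model.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # illustrative value, not from the report
) -> float:
    """Vanilla DPO loss for one (chosen, rejected) response pair."""
    # Implicit rewards are log-ratios of policy to reference, scaled by beta.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(margin)) is the same as softplus(-margin).
    return softplus(-margin)


# When the policy equals the reference, the margin is zero and the loss is log(2).
baseline = dpo_loss(-5.0, -7.0, -5.0, -7.0)
# Raising the chosen response's log-probability under the policy lowers the loss.
improved = dpo_loss(-3.0, -7.0, -5.0, -7.0)
```

The loss falls as the policy assigns relatively more probability to the chosen response than the reference model does, which is the behavior the DPO stage relies on.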
