Nemotron 340B, Training Data

On Friday, NVIDIA released Nemotron 340B, an open LLM matching GPT-4 (0314). They also released a Technical Report on how they trained it and what's special about it!

Implementation
Pretraining: 2-phase pretraining, first on 8T tokens, then continued on 1T higher-quality tokens plus instruction data with a steeper slope of learning rate decay (see the schedule sketch after the Insights list).
Fine-tuning: First fine-tuned on 800K coding samples, followed by 200K diverse task samples.
RLHF: Applied Direct Preference Optimization (DPO) followed by Reward-aware Preference Optimization (RPO) over multiple iterations (a hedged loss sketch follows below).

Insights
🧪 98% of the data used in post-training was synthetically generated
🌍 Pretraining data: English data (70%), multilingual data (15%), source code (15%)
🖥️ Trained on 6,144 H100 GPUs with 8-way TP, 12-way PP with interleaving, and DP to achieve ~42% MFU (see the MFU estimate below)
📈 Adjusting the data distribution and learning rate decay in the second pretraining phase improves model quality
🧑‍🏫 Only used ~20K human-annotated samples, mostly for reward modeling
🎯 Focused on task diversity, topic diversity, and instruction diversity
🔄 Used an iterative approach for response generation, starting with Mixtral and then switching to Nemotron models
🛠️ Detailed synthetic data pipeline description, including all the prompts used to generate data
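
To make the "steeper slope of learning rate decay" in the continued-pretraining phase concrete, here is a minimal sketch of what such a two-phase schedule could look like. The step counts, peak values, and cosine shape below are illustrative assumptions, not numbers from the report.

```python
import math

def lr_at(step: int,
          phase1_steps: int = 100_000,   # assumed step count for the 8T-token phase
          phase2_steps: int = 12_500,    # assumed step count for the 1T-token phase
          peak_lr: float = 3e-4,
          phase2_peak_lr: float = 1e-4,
          min_lr: float = 1e-5) -> float:
    """Illustrative two-phase schedule: gentle cosine decay in phase 1,
    then a restart at a lower peak with a much steeper decay in phase 2."""
    if step < phase1_steps:
        # Phase 1: decay slowly from peak_lr toward min_lr over the full phase.
        progress = step / phase1_steps
        return min_lr + 0.5 * (peak_lr - min_lr) * (1 + math.cos(math.pi * progress))
    # Phase 2 (continued pretraining): far fewer steps, so the per-step
    # decay slope is several times steeper than in phase 1.
    progress = min((step - phase1_steps) / phase2_steps, 1.0)
    return min_lr + 0.5 * (phase2_peak_lr - min_lr) * (1 + math.cos(math.pi * progress))

if __name__ == "__main__":
    for s in (0, 50_000, 99_999, 100_000, 106_000, 112_500):
        print(s, f"{lr_at(s):.2e}")
```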
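For the alignment stage, the report's exact RPO formulation is not reproduced here. Below is a minimal sketch of the standard DPO loss on (chosen, rejected) pairs, plus an illustrative reward-aware variant that aligns the policy's implicit reward gap with a reward model's score gap; the latter is an assumption for intuition only, not the report's loss. All tensor names and the `beta`/`eta` values are placeholders.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """Standard DPO: increase the policy's implicit reward margin
    (log-prob ratios vs. a frozen reference model) for chosen over rejected."""
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()

def reward_aware_loss(policy_chosen_logps, policy_rejected_logps,
                      ref_chosen_logps, ref_rejected_logps,
                      rm_chosen_scores, rm_rejected_scores,
                      beta: float = 0.1, eta: float = 1.0) -> torch.Tensor:
    """Illustrative reward-aware variant (assumption, not the report's exact RPO):
    instead of treating every preference pair equally, pull the implicit reward
    gap toward the reward model's score gap for that pair."""
    implicit_gap = beta * ((policy_chosen_logps - ref_chosen_logps)
                           - (policy_rejected_logps - ref_rejected_logps))
    rm_gap = eta * (rm_chosen_scores - rm_rejected_scores)
    # Simple squared-distance alignment between the two gaps.
    return ((implicit_gap - rm_gap) ** 2).mean()
```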
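MFU (model FLOPs utilization) is the ratio of achieved model FLOPs throughput to the hardware's peak. A back-of-the-envelope estimate using the common ~6 * params * tokens approximation for forward+backward FLOPs of a dense transformer; the peak-TFLOPS figure and the implied token throughput below are assumptions, not numbers from the report.

```python
def model_flops_utilization(params_b: float,
                            tokens_per_sec: float,
                            num_gpus: int,
                            peak_tflops_per_gpu: float) -> float:
    """Estimate MFU with the ~6 * N * T FLOPs-per-token approximation."""
    achieved_flops = 6 * params_b * 1e9 * tokens_per_sec
    peak_flops = num_gpus * peak_tflops_per_gpu * 1e12
    return achieved_flops / peak_flops

if __name__ == "__main__":
    # Assumed H100 BF16 dense peak (~989 TFLOPS); solve for the cluster-wide
    # token throughput that would correspond to ~42% MFU for a 340B model.
    peak = 989.0
    tokens_per_sec = 0.42 * 6144 * peak * 1e12 / (6 * 340e9)
    print(f"~{tokens_per_sec:,.0f} tokens/s cluster-wide -> "
          f"MFU {model_flops_utilization(340, tokens_per_sec, 6144, peak):.2f}")
```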
