Nemotron 340B, Training Data
On Friday, NVIDIA released Nemotron 340B, an open LLM matching GPT-4 (0314). They also released a technical report on how they trained it and what's special about it!

Implementation

- Pretraining: two-phase pretraining, first on 8T tokens, then continued on 1T higher-quality tokens plus instruction data, with a steeper slope of learning rate decay.
- Fine-tuning: first fine-tuned on 800K coding samples, followed by 200K diverse task samples.
- RLHF: applied Direct Preference Optimization (DPO) followed by Reward-aware Preference Optimization (RPO) over multiple iterations.

Insights

- 98% of the data used in post-training was synthetically generated
- Pretraining data: English data (70%), multilingual data (15%), source code (15%)
- Trained on 6,144 H100 GPUs with 8-way tensor parallelism, 12-way pipeline parallelism with interleaving, and data parallelism, to achieve ~42% MFU
- Adjusting the data distribution and learning rate decay in the second pretraining phase improves model quality
- Only used 20K human-annotated samples, mostly for reward modeling
- Focused on task diversity, topic diversity, and instruction diversity
- Used an iterative approach for "response" generation, starting with Mixtral and then switching to Nemotron models
- Detailed synthetic data pipeline instructions, including all prompts used to generate data
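The two-phase pretraining idea above (a long first phase, then a short continued phase on higher-quality data with a steeper learning rate decay) can be sketched as a schedule function. This is a minimal illustration, not Nemotron's actual schedule: all hyperparameter values (`peak_lr`, step counts, floors) are placeholders I chose for the example.

```python
import math

def two_phase_lr(step, peak_lr=3e-4, warmup_steps=1_000,
                 phase1_steps=80_000, phase2_steps=10_000, min_lr=3e-6):
    """Illustrative two-phase LR schedule (assumed values, not Nemotron's):
    linear warmup, cosine decay during phase 1, then a steeper linear
    decay to min_lr during the continued-pretraining phase."""
    if step < warmup_steps:
        # linear warmup to peak_lr
        return peak_lr * step / warmup_steps
    if step < phase1_steps:
        # phase 1: cosine decay from peak_lr to an intermediate floor
        frac = (step - warmup_steps) / (phase1_steps - warmup_steps)
        phase1_floor = peak_lr * 0.1
        return phase1_floor + 0.5 * (peak_lr - phase1_floor) * (1 + math.cos(math.pi * frac))
    # phase 2: shorter, steeper linear decay down to min_lr
    frac = min(1.0, (step - phase1_steps) / phase2_steps)
    phase2_start = peak_lr * 0.1
    return phase2_start + (min_lr - phase2_start) * frac
```

Because phase 2 is much shorter than phase 1 while still covering a full decay to the minimum, its per-step slope is steeper, which is the behavior the report describes for the continued-pretraining phase.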
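For the alignment step, the DPO part of the recipe optimizes a standard preference loss over chosen/rejected response pairs. A minimal sketch of that loss for a single pair, using summed token log-probabilities (plain Python here for clarity; the function and argument names are mine, and a real trainer would operate on batched tensors):

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """DPO loss for one preference pair: -log sigmoid of the scaled
    difference between the policy/reference log-ratios of the chosen
    and rejected responses. beta=0.1 is an assumed value."""
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    return -math.log(sigmoid(beta * (chosen_ratio - rejected_ratio)))
```

When the policy matches the reference model, the loss is log 2; as the policy raises the chosen response's likelihood relative to the rejected one, the loss falls. RPO, which the report applies after DPO, additionally uses reward-model signal; its exact objective is given in the technical report and is not reproduced here.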