Nemotron 340B, Training Data

On Friday, NVIDIA released Nemotron 340B, an open LLM matching GPT-4 (0314). They also released a technical report on how they trained it and what's special about it!

Implementation

Pretraining: 2-phase pretraining, first on 8T tokens, then continued on 1T higher-quality tokens plus instruction data with a steeper learning rate decay.
Fine-tuning: first fine-tuned on 800K coding samples, followed by 200K diverse task samples.
RLHF: applied Direct Preference Optimization (DPO) followed by Reward-aware Preference Optimization (RPO) over multiple iterations.

Insights

🧪 98% of the data used in post-training was synthetically generated
🌍 Pretraining data: English (70%), multilingual (15%), source code (15%)
🖥️ Trained on 6,144 H100 GPUs with 8-way TP, 12-way PP with interleaving, and DP, achieving ~42% MFU
📈 Adjusting the data distribution and learning rate decay in the second pretraining phase improves model quality
🧑‍🏫 Only ~20K human-annotated samples were used, mostly for reward modeling
🎯 Focused on task diversity, topic diversity, and instruction diversity
🔄 Used an iterative approach for "response" generation, starting with Mixtral and then switching to Nemotron models
🛠️ Detailed synthetic data pipeline, including all prompts used to generate data
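The "steeper slope of learning rate decay" in the second pretraining phase can be illustrated with a small sketch. This is not NVIDIA's actual schedule; the peak/minimum rates, step counts, and the `power` knob controlling steepness are all hypothetical:

```python
import math

def lr_at(step, total_steps, peak_lr, min_lr, power):
    """Cosine-style decay from peak_lr to min_lr.
    A higher `power` makes the drop toward min_lr steeper.
    All numeric values below are illustrative, not the paper's."""
    frac = step / total_steps
    cos = 0.5 * (1.0 + math.cos(math.pi * frac))
    return min_lr + (peak_lr - min_lr) * cos ** power

# Phase 1 (8T tokens): gentle decay. Phase 2 (1T tokens): steeper decay
# from a lower starting rate, mirroring the two-phase setup described above.
phase1 = [lr_at(s, 100, 3e-4, 3e-5, power=1.0) for s in range(101)]
phase2 = [lr_at(s, 100, 3e-5, 1e-6, power=3.0) for s in range(101)]
```

With `power=3.0`, phase 2 spends more of its budget near the minimum rate, which is one simple way to realize a "steeper slope" without changing the schedule family.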
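The DPO step of the alignment recipe optimizes a simple pairwise objective. A minimal sketch for a single preference pair follows; the inputs are sequence log-probabilities under the policy and a frozen reference model, and the `beta` value is a hypothetical choice, not one from the report:

```python
import math

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Direct Preference Optimization loss for one (chosen, rejected) pair.
    margin > 0 means the policy prefers the chosen response more strongly
    than the reference model does; the loss is -log sigmoid(margin)."""
    margin = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

When the policy matches the reference, the margin is zero and the loss equals log 2; as the policy widens the gap in favor of the chosen response, the loss falls toward zero. RPO, as described in the report, additionally makes the objective aware of reward magnitudes, which this sketch does not model.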
