Snowflake Arctic 128 Experts MoE

Snowflake wants to give back to the community, enrich the collective knowledge, and empower others to succeed. With this release they are not just unveiling the model; they are also sharing their research insights through a comprehensive cookbook, designed to expedite the learning process for anyone looking to build world-class MoE models.

Arctic is Snowflake's new open-source large language model. It uses a novel architecture Snowflake calls a "Dense-MoE hybrid transformer" built around 128 experts (smaller sub-models). This approach, called a Mixture of Experts (MoE), is claimed to provide several benefits (a minimal routing sketch in Python follows the list):

  1. Training efficiency: Because each token only activates a few small "expert" models rather than one large monolithic model, training can be made more computationally efficient and less expensive. The article states that Arctic's training cost was under $2 million, far lower than estimates for models like GPT-4 ($60 million).

  2. Model performance: Despite using smaller expert models, the combination of 128 experts allows Arctic to achieve high performance on enterprise tasks like coding, SQL generation, and instruction following, which Snowflake calls "enterprise intelligence".

  3. Scalability: Having many smaller expert models makes it easier to scale up the overall model size and capabilities by adding more experts, compared to scaling up a single large model.

  4. Specialization: Each expert can potentially specialize in specific tasks or domains, allowing the overall model to handle a diverse set of tasks effectively.
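
To make the routing idea behind these benefits concrete, here is a minimal, hedged sketch of an MoE layer with top-2 gating in PyTorch. The class name `MoELayer`, the expert sizes, and the gating scheme are illustrative assumptions, not Arctic's actual implementation:

```python
# Minimal MoE sketch: a router picks the top-2 experts per token and mixes
# their outputs. Illustrative only; names and sizes are assumptions, not
# Snowflake Arctic's code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoELayer(nn.Module):
    def __init__(self, d_model: int, d_ff: int, num_experts: int = 128, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        # Each "expert" is a small feed-forward network.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )
        # The router scores every expert for every token.
        self.router = nn.Linear(d_model, num_experts)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_tokens, d_model)
        scores = self.router(x)                                # (tokens, experts)
        top_scores, top_idx = scores.topk(self.top_k, dim=-1)  # k experts per token
        weights = F.softmax(top_scores, dim=-1)                # mixing weights

        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = top_idx[:, k] == e            # tokens routed to expert e in slot k
                if mask.any():
                    out[mask] += weights[mask, k:k + 1] * expert(x[mask])
        return out
```

Only the selected experts run for a given token, which is why total parameter count can grow (more experts) without the per-token compute growing with it.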

The key innovation claimed is that Snowflake's dense hybrid architecture reduces the communication overhead among experts during training, which has been a major inefficiency in traditional Mixture of Experts approaches. This makes it cost-effective to train MoE models with as many experts as Arctic's 128.
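
As a rough illustration of the dense-plus-residual-MoE idea (not Snowflake's code; the block structure and names are assumptions), the sketch below runs a dense feed-forward path for every token and adds the MoE branch's output as a residual. Because the dense path needs no expert routing, its computation can, in principle, be overlapped with the all-to-all communication the MoE branch requires on multi-GPU setups:

```python
# Hedged sketch of a dense + residual MoE hybrid block. The dense FFN always
# runs; the MoE branch (e.g. the MoELayer sketched above) adds a residual
# correction on top of it. Names and dimensions are illustrative assumptions.
import torch
import torch.nn as nn

class DenseMoEHybridBlock(nn.Module):
    def __init__(self, d_model: int, n_heads: int, moe_layer: nn.Module):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.dense_ffn = nn.Sequential(
            nn.Linear(d_model, 4 * d_model), nn.GELU(), nn.Linear(4 * d_model, d_model)
        )
        self.moe = moe_layer
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        normed = self.norm1(x)
        h = x + self.attn(normed, normed, normed)[0]
        normed2 = self.norm2(h)
        dense_out = self.dense_ffn(normed2)                     # always-on dense path
        moe_out = self.moe(normed2.reshape(-1, normed2.size(-1))).view_as(normed2)
        return h + dense_out + moe_out                          # residual MoE on top of dense
```

In this hedged reading, the dense path keeps every GPU busy while expert traffic is in flight, which is one way to interpret the reduced-communication-overhead claim.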
