1. Evaluate the Financial Costs Involved

Estimating the Cost (Renting vs. Purchasing Compute Hardware)

First, let's go over the financial costs (computational and otherwise) involved. We'll use Meta's Llama 2 large language model as a baseline:

Llama 2 7B, the 7 billion parameter version, required about 180,000 GPU hours to train.

Llama 2 70B is 10 times as large, and it took approximately 10 times as much compute, which amounts to about 1.7 million GPU hours.

Extrapolating from those numbers, for our example we'll assume that a 10 billion parameter Llama 2 class model takes on the order of 100,000 GPU hours to train.

At that rate, a 100 billion parameter model will take approximately 1 million GPU hours to train.
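As a sanity check, here is a minimal sketch of that rule of thumb in Python. The ratio of roughly 10,000 GPU hours per billion parameters is an order-of-magnitude extrapolation from the two Llama 2 data points above, not an official figure:

```python
# Rough rule of thumb extrapolated from the Llama 2 numbers above:
# ~180,000 GPU hours for 7B and ~1.7M GPU hours for 70B both work out
# to roughly 10,000 GPU hours per billion parameters (order of magnitude).
GPU_HOURS_PER_BILLION_PARAMS = 10_000  # assumed ratio, not an official figure

def estimate_gpu_hours(params_in_billions: float) -> float:
    """Order-of-magnitude estimate of training compute in GPU hours."""
    return params_in_billions * GPU_HOURS_PER_BILLION_PARAMS

print(estimate_gpu_hours(10))   # ~100,000 GPU hours for a 10B model
print(estimate_gpu_hours(100))  # ~1,000,000 GPU hours for a 100B model
```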

So how can we translate these compute hours into a dollar amount?

There are two options we can use:

Option #1

We can rent the compute, i.e. the GPUs, from one of the big cloud providers, such as Google, Amazon, or Microsoft. Renting an Nvidia A100, the kind of GPU used to train Llama 2 class models, costs approximately $1 to $2 per GPU per hour.

So for a 10 billion parameter model, at 100,000 GPU hours and roughly $1.50 per GPU hour, the cost is going to be on the order of $150,000 just to train it. For the 100 billion parameter model, the cost will be around $1.5 million.
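To make the arithmetic explicit, here is a small sketch that turns GPU hours into a rental bill. The $1.50 hourly rate is an assumed midpoint of the $1 to $2 range quoted above:

```python
# Renting: cost = GPU hours x hourly rate.
# $1.50/hour is an assumed midpoint of the $1-$2 A100 range above.
RENTAL_RATE_PER_GPU_HOUR = 1.50

def rental_cost(gpu_hours: float, rate: float = RENTAL_RATE_PER_GPU_HOUR) -> float:
    """Estimated cloud rental cost in USD for a given amount of compute."""
    return gpu_hours * rate

print(f"${rental_cost(100_000):,.0f}")    # ~$150,000 for a 10B model
print(f"${rental_cost(1_000_000):,.0f}")  # ~$1,500,000 for a 100B model
```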

Option #2

The second option is to buy the hardware. In that case, we have to take into account the price of the GPUs and the host hardware. Let's say an A100 costs about $10,000 and we build a 1,000-GPU cluster: the hardware alone is going to cost around $10 million.

But that's not the only cost. Running a cluster like this for weeks consumes a tremendous amount of energy, so we also have to take the energy cost into account. For example, training a 100 billion parameter model consumes about 1,000 megawatt hours of energy. Assuming a price of about $100 per megawatt hour, the marginal energy cost of training a 100 billion parameter model is going to be around $100,000. Summing this up, the total cost of training a 100 billion parameter LLM this way is approximately $10.1 million.
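Putting those pieces together, here is a minimal sketch of the buy-side estimate. The GPU unit price, cluster size, and energy figures are the assumptions stated above:

```python
# Buying: hardware cost + marginal energy cost of one training run.
A100_UNIT_PRICE = 10_000       # assumed price per A100, in USD
CLUSTER_SIZE = 1_000           # assumed number of GPUs in the cluster
ENERGY_MWH_100B_RUN = 1_000    # assumed energy for one 100B-parameter run, in MWh
PRICE_PER_MWH = 100            # assumed electricity price, in USD per MWh

hardware_cost = A100_UNIT_PRICE * CLUSTER_SIZE     # $10,000,000
energy_cost = ENERGY_MWH_100B_RUN * PRICE_PER_MWH  # $100,000
total_cost = hardware_cost + energy_cost           # $10,100,000

print(f"Hardware: ${hardware_cost:,}")
print(f"Energy:   ${energy_cost:,}")
print(f"Total:    ${total_cost:,}")
```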

This enormous cost explains why only a handful of tech giants take on the risk of cranking out new LLMs so frequently. OpenAI's GPT-4, reportedly a 1.76 trillion parameter model, cost over $100 million to train. Not many companies can afford this.
