Parameters vs Tokens in LLMs?

What are parameters vs tokens in an LLM?

Parameters are the learned weights that give the model its ability to understand and generate human language. Tokens, on the other hand, are the discrete units into which text is divided; in English, a token can be as short as a single character or as long as an entire word.

What is the difference between training tokens and parameters?

Training tokens are the chunks of text the model learns from, each ranging from a single character to a whole word; parameters are the weights the model learns from all that text. So, while parameters are like the players in a rugby team, tokens are like the different plays and strategies they use.

Parameters vs Tokens

"What are parameters?" and "Why are they important?" Well, parameters are like the spices in your grandma's secret recipe; they add flavour and complexity to the model's responses. They're the building blocks of the model, fine-tuning its capabilities. The more you have, the smarter and quicker the model is. But it's not just about quantity; it's about how these parameters interact and learn from the data they're fed. It's like a rugby team; you need both skilled players and good teamwork to win.

But hold on, what about tokens? Tokens are chunks of text that AI learns from, ranging from a single character to a whole word. They help the AI understand context and semantics. So, while parameters are like the players in a rugby team, tokens are like the different plays and strategies they use. Both are crucial for winning the game.
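If you want to see both sides for yourself, here's a minimal sketch using the Hugging Face transformers library (GPT-2 is just a small stand-in here; any causal LM works the same way, and the exact token split varies by tokenizer):

```python
# pip install transformers torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Tokens: watch a sentence get split into sub-word chunks.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
print(tokenizer.tokenize("Parameters vs tokens in LLMs"))

# Parameters: count the learned weights inside the model itself.
model = AutoModelForCausalLM.from_pretrained("gpt2")
n_params = sum(p.numel() for p in model.parameters())
print(f"{n_params:,} parameters")  # GPT-2 small: roughly 124 million
```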

Running Locally

Running local LLMs like Hermes GPTQ and Llama 2 Chat GPTQ has become a bit of a rabbit hole as of late. You get to keep all your data in your own backyard, plus you've got the freedom to tweak the models to your heart's content. But, before you dive in, there are a few things you should know.

First off, these models are memory-hungry. At 16-bit precision each parameter takes two bytes, so a model with 1 billion parameters needs around 2GB of memory just for the weights.
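To put rough numbers on that, here's a quick back-of-the-envelope calculator (weights only; real usage is higher once you add activations and the KV cache):

```python
def estimate_memory_gb(n_params_billion: float, bits: int = 16) -> float:
    """Rough weight footprint: parameters x bytes per parameter."""
    bytes_per_param = bits / 8
    return n_params_billion * 1e9 * bytes_per_param / 1024**3

print(f"{estimate_memory_gb(1):.1f} GB")      # 1B at 16-bit  -> ~1.9 GB
print(f"{estimate_memory_gb(7):.1f} GB")      # 7B at 16-bit  -> ~13.0 GB
print(f"{estimate_memory_gb(13, 8):.1f} GB")  # 13B at INT8   -> ~12.1 GB
```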

If you're running it on Colab with a V100 GPU (16GB of VRAM), you can handle a 16-bit model with up to 7 billion parameters.

Got an A100 machine with 40GB of VRAM? Then you can go up to 16-bit with 13 billion parameters. Anything bigger, and you'll need a GPU with more VRAM or even a cluster of GPUs.
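Plugging those sizes into the same rule of thumb makes the fit check concrete (VRAM figures below assume the 16GB V100 and 40GB A100 variants, with ~10% held back for activations and the KV cache):

```python
# Will a 16-bit model's weights fit in VRAM, with ~10% headroom?
GPUS = {"V100": 16, "A100": 40}  # VRAM in GB

for gpu, vram in GPUS.items():
    for billions in (7, 13):
        needed = billions * 1e9 * 2 / 1024**3  # 2 bytes/param at 16-bit
        verdict = "fits" if needed < vram * 0.9 else "too big"
        print(f"{billions}B @ 16-bit on {gpu} ({vram}GB): ~{needed:.1f}GB -> {verdict}")
```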

Now, if you're thinking of offloading some of the work to your CPU to run bigger models, just know it's going to slow things down. Even the flashiest CPU won't give you more than about 5 tokens/sec for a 7B model in INT8 quantised form. So, you have to weigh up the pros and cons there.
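For what it's worth, a hedged sketch of that setup with transformers and accelerate looks something like this (the model ID is illustrative and assumes you've been granted access to Llama 2 on the Hugging Face Hub; 8-bit loading needs bitsandbytes):

```python
# pip install transformers accelerate bitsandbytes
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-7b-chat-hf"  # assumes you have Hub access
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    load_in_8bit=True,   # INT8 quantisation via bitsandbytes
    device_map="auto",   # spill layers that don't fit in VRAM onto the CPU
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

inputs = tokenizer("Why is CPU offloading slow?", return_tensors="pt")
outputs = model.generate(**inputs.to(model.device), max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```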

Last but not least, keep an eye on your model loader, especially if you're using quantised models. AutoGPTQ and GGML are the popular choices, and if you pick the wrong one for your checkpoint format, you'll either fail to load the model or end up with some proper nonsense output. And if you're keen on fast inference, AutoGPTQ can use a Triton backend, but make sure you load the model with that backend enabled, or you'll get gibberish.
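As a rough sketch of what loading a GPTQ checkpoint looks like with AutoGPTQ (the repo ID is illustrative, and the Triton backend only works on Linux with an NVIDIA GPU):

```python
# pip install auto-gptq
from auto_gptq import AutoGPTQForCausalLM
from transformers import AutoTokenizer

quantized_id = "TheBloke/Llama-2-7B-Chat-GPTQ"  # illustrative repo ID
model = AutoGPTQForCausalLM.from_quantized(
    quantized_id,
    device="cuda:0",
    use_triton=True,  # Triton backend for faster inference
)
tokenizer = AutoTokenizer.from_pretrained(quantized_id)
```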

Local LLMs like Hermes and Llama 2 offer heaps of control and flexibility, but you've gotta be savvy about how you set them up. That's where private clouds could come into play.

Llama 2 vs others?

When comparing Llama 2 to other giants like GPT-4 or Gopher, it's crucial to look beyond just the number of parameters. Llama 2 might have billions of them, but its real strength lies in how it uses them. Llama 2 is resilient and doesn't easily get tricked into saying anything inappropriate, making it a strong contender in the LLM arena.

Evaluating an AI model isn't as simple as counting parameters or even tokens. It's about understanding the whole recipe. Consider its performance stats, its scaling abilities, and even its ethical and environmental impacts.

Window.ai

Now, let's talk about Window.ai, 'cause this tool is bomb as. With Window.ai, you can run models like Llama 2 on your own intranet, giving you more control and privacy. Plus, it does away with model API costs and rate limits.
