Mistral 7B Installation on Windows Locally

Mistral 7B Installation on Windows Locally

Fahd Mirza

▶ Mistral7B https://mistral.ai/news/announcing-mi...


Hello guys. In this video I'm going to show you the most easiest quickest and feasible way to install Mistral 7B on Windows on your local laptop.

On your screen you can see that I'm am on Mistral 7B model card.

if you are not aware what Mistral 7B is I have various videos on it where I go in detail in this model but just to give you a quick overview, Mistral 7B is by far the best 7 billion model to date.

This model has already shown that it outperforms Lama 2 on 13 billion on all benchmarks plus it has also outperformed Llama-1, 34 billion on many benchmarks.

it is use it is using Grouped Query Attention or GQA for faster inference plus it also uses sliding window attention to handle longer sequences at small cost and I have seen it in various benchmarks while using this model that it in fact is quite cost efficient okay now we know what Mistral 7B is now let's see what is the easiest way to get it installed locally on your windows on your laptop or anywhere in cloud, wherever you like.

the tool which I'm going to use for it is called as LM Studio I already have done few videos around it and I'll drop the Link in video's description so simply go to LM studio. From there click on download LM Studio for Windows and then it will start downloading as you can see on the top right.

The size of this LM studio is around 400 MB which is very small.

The beauty of this LM studio is that with this you can very easily in download and then run model on your laptop.

It uses the quantized versions of the models and you can it gives you a drop down which I will show you shortly which you can easily use.

Let’s wait for it to finish shouldn't take too long it's almost done.

The download is finished and in order to run it, simply click on open file and this is going to open it on your local system.

Now you need to just search for Mistral 7B in this text box and then you can see that there are a lot of Mistral 7B variants like Instruct 7B, Open ORCA 7B etc.

If I select Mistral 7B GGUF or select open Orca 7B and the list goes on. I'm just searching for something with GPTQ or GGUF. (GGUF stand for 'GPT-Generated Unified Format', a file format designed specifically for storing and running large language models (LLMs) for inference tasks)

So let's go with TheBloke's Quantized version of Mistral 7B.

Once you select it on the right hand side you can see the available file for it and there are lot of quantized version variants here. I normally try to go with Q5 but as you can see that it is grayed out so it might not be supported because you can see that it is saying possibly supported. so it's not sure so we need to check it out so for this let's go with Q5KS to see if it works or not.

(QKSM: In machine learning, QKSM can stand for "Query, Key, Softmax Mixture." This is a technique used in attention mechanisms, which are a key component of many LLMs. QKLM: Similarly, QKLM could stand for "Query, Key, Linear Mixture." This is another attention mechanism variant. 5KSM: This could be an abbreviation for a specific training parameter or dataset.)

In order to download it, all you need to do is to click on this download button - you can see that o it has started downloading the file size is around 5GB.

Once download, select the model to load and then click here to load the model.

You can see that now the model is being loaded on our local system. The model is loaded on your local system.

Now just drag this bottom section to downward by clicking on it or just dragging and dropping to the bottom and there you go now you can chat with your Mistral model.

It is downloaded and installed. So let me ask it what the capital of Australia is. now this response time will depend on the capacity of your system how much memory you have and all that stuff as the cool thing about this is that you can see on the left top left it is showing you the CPU consumption and the memory usage and you can see my CPU consumption is 320%. So I definitely need a BP system. But still it hasn't crash and it was able to give me a result which is spot on and correct.

On the right hand side you can also play around with model configuration. there are some Inference parameter like what sort of repeat penalty you want to give which is 1.1 which is fine and then Randomness which is temperature and lot of other things like Prompt Format model initialization and stuff and I'm using mlog to keep entire model in Ram which is a default.

all good you can even select on T size which is again a default. so you can play around with all of these parameters easily I have another video where I go in way more detail what this parameter what these parameters mean.

Let’s give it another run let me copy it from my system and then paste it to save time.

I'm going to ask it to play a role of bathroom renovator and I'm giving it a scenario that I have a 24 year old house with an old bathroom and I want to renovate it.

This model needs to think it step by step and give me the steps to renovate it and then also give me the cost in Australian dollars.

So let's see how it goes.

I'm not expecting it to give me the latest cost because it's not fine-tuned on the latest data. so I'm not sure how old the Corpus of data is for this model but still I just want to see if it is able to do the inference here because when I tried it out on my notebook and Linux instance it was able to do it quite nicely.

So let's wait for it to come back and here you can see that it is processing and now generation is about to start.

You can see that there's some lag because of the way my system is configured that is still stuck with the previous one.

I'm just going to say stop generating and I'll say continue and then now let's see what it does.

I'm not sure if it this is the bug or what because sometime it just does this where it just answers the previous question so we need to sort of stop it and regenerate or click on continue so that it would go to the next one.

You can see that it has started splitting out the response quite slow because of the limitation on my system this is not the fault of model or this software. I just need a beefy laptop but you can see that it has started doing it so let's wait for it to print something more to see if it is able to give me the cost of steps or not.

You can see that it is successfully able to decipher my question and it is giving me not only the steps but also the cost. Let’s click on stop generating.

This is it guys this is how easily you can download and install and then run this Mistral 7B.

Before I close this video let me tell you another great feature of it. If you click on this double-headed arrow on the left hand side you can even start local inference server and then share the endpoint with your user.

So that instead of logging into this server to use it, they can simply make an API call like this. This is a restful API where they are just making a call to this endpoint and then passing it the prompt and then it will return them the answer. How cool is that. So you can build any application on top of this. So very impressed by this and I have various videos on it where I discussed lot of other things related to this.

Last updated