LM Studio - Open Source LLM Installer

LM Studio Local Installation - Run Any Model on Laptop in Windows

Fahd Mirza

▶ Prompt Engineering 101 for Beginners

▶ Introduction to AWS Bedrock • Amazon Bedrock Introduction

▶ LM Studio https://lmstudio.ai

Transcript:

Hello, I believe I have found the easiest way to run LLMs on your local system in Windows, and that easy way is LM Studio.

In this video I'm going to show you how you can download and install LM Studio, and then how to run these open-source LLMs from Hugging Face, or even from somewhere else, on LM Studio.

It's very easy to configure, it's very easy to install, and I believe these sorts of products are going to shape how we see generative AI going forward.

Let's get right into it. I am on their website already, which is lmstudio.ai, as you can see in my browser. Once you are there you can either download it for Mac or for Windows. I am on Windows and this is my local laptop, so let's start with the Windows one: click here and it will start downloading.

The size is also just 402 MB, which is not huge considering the amount of work this will be doing for us, so let's wait for it to finish.

While this happens, you can see that on their website they're telling us that you can run LLMs on your laptop entirely offline, and that you can use those models through the in-app chat UI or through the OpenAI-compatible local server.

I'll show you shortly how to do that.

Our download is almost done let's wait for it to finish.

Let me open the file.

Let's click on it. It's going to open the installer, which is simply the .exe file.

By the way, my laptop has only one GPU, so you can go with that, or you can even install it in the cloud, like on an AWS EC2 instance, or if you're using Google Cloud then maybe on a Google Cloud virtual machine.

It is still loading, so let's wait. Okay, these are the release notes, which you can read and then close. I think I opened two windows, so let me close this one. That's it.

Let’s close this ‘x’ here.

I'll just bring it down, and you can see that this is the whole interface.

If you want to search for any model, all you need to do is type here and it will search for it.

There are some noteworthy models it is showing you, like OpenOrca and all that stuff.

Let me play with Mistral. Type Mistral and then press Enter. It is searching, and you can see that it has already given us a lot of variants of Mistral 7B here.

As you already know, I'm a huge fan of TheBloke, so maybe I will use one of TheBloke's ones. And there are a lot of others too; you can see Undi95, and then we have Nexus, and a lot of these which you can search.

Now you can sort by most downloads, you can filter by compatibility guess, and you can even connect to the Hugging Face Hub.

But let's not do that; let's go with the default. Click on the Mistral 7B Instruct, it's already there, and now on the right-hand side it has the different available files for this. Let's use the Q5 one because that is one of the fastest, and the size is also 5 GB.

So all you need to do is click on Download on the right-hand side. Let's click here, and you can see that it has started downloading in the bottom half of the window.

Scroll down a bit. There you go. It is telling you how many downloads are in progress and how many are completed, so let's wait for this one to finish.

It is almost done. And as you can see, the model is downloaded; now we can play around with it.

On the left-hand side you can see this menu. If you click on this chat icon, the third one, then at the top you can load your model, which we have just downloaded: Mistral Instruct. It is loading the model, and once it is loaded you can start chatting with it. On the right-hand side there are various defaults and configurations.

If I click on model configuration, it shows us the inference parameters, such as randomness (temperature) and a lot of other things which we are already aware of; I have done a few videos around them.

Then there are also some hardware settings, for example if you want to enable GPU acceleration.

There is also your pre-prompt, that is, the system prompt.

You can configure your prompt format or template, and you can also select your own here.
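To make these settings concrete, here is a minimal sketch of the kind of inference parameters that configuration panel controls. The names and values below are illustrative defaults, not LM Studio's exact preset schema:

```python
# Illustrative inference parameters of the kind LM Studio's model
# configuration panel exposes; the names and values are examples,
# not LM Studio's exact preset schema.
inference_params = {
    "temperature": 0.8,     # randomness: higher values give more varied output
    "top_p": 0.95,          # nucleus sampling: probability mass to sample from
    "top_k": 40,            # only consider the k most likely next tokens
    "repeat_penalty": 1.1,  # discourage the model from repeating itself
    "max_tokens": 512,      # cap on the length of the generated reply
}

# The pre-prompt (system prompt) sets the model's role before the chat starts.
system_prompt = "You are a helpful assistant."
```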

So now let's chat with our Mistral model, which has been downloaded. And if you want to unload it, you can simply click on Eject Model here.

The first time it might not show you the chat box, so just click here and it will drop down the bottom window, and then you can type your message there.

I will ask it what the capital of Australia is, and let's wait for it. It is talking to your local model right now, so we don't have to worry about internet access or about a hosted model.

So there you go. You can see in real time it is telling you your CPU consumption and your RAM usage at the top, which is amazing, and the answer is perfectly right.

I will give it another, more complex task here. Let me give it a scenario: you're an expert on bathrooms. I have a 25-year-old house with an old bathroom and I want to renovate it completely; think about it step by step and give me the steps to renovate it. I'm also asking it to give me the cost of every step in Australian dollars. Let's see what it does. It is processing, and again you can see your resource consumption at the top. Amazing stuff, I'm really impressed with this, and within 20-25 seconds it has already started producing the output.

And you can see it is a bit slow; I think my free memory is still a bit low. But still, it's a laptop, and you would expect that.

Now let's imagine you have your own company server, which you might have put in your data center or in the cloud, and you have installed this LM Studio, you are hosting this LLM there, and then the whole company is using it from that.

I'll show you how you can do that too shortly. Amazing stuff. You can see that it is giving me all the steps, and it is giving me the cost with every step too. And if you want to stop the inference and output generation, you can simply click here on Stop Generating.

So let me stop it now.

Let's ask it another question: write a Python program to reverse a list. Press Enter and let's see what it does now. I'm keeping the video running just to show you the response time, and you can see that within 5 seconds it was able to return the response. Let's see how accurate that response is. There you go.

It is writing us a program which seems pretty correct to me. Yes, the minus-one slice, that's cool. Awesome, so that's a correct one.
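For reference, here is a minimal sketch of a list-reversal program along the lines of what the model produced; the `[::-1]` slice is the minus-one step mentioned above and is the idiomatic way to reverse a list in Python:

```python
def reverse_list(items):
    # Slicing with a step of -1 walks the list backwards,
    # returning a new reversed copy.
    return items[::-1]

print(reverse_list([1, 2, 3, 4, 5]))  # [5, 4, 3, 2, 1]
```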

If you want to start a server so that other people can access this model through your LM Studio server, all you need to do is click on this double-headed arrow on the left-hand side. You already have your model loaded at the top, and this is our local inference server. Our server port is 1234, so whatever your IP, FQDN, or server name is, followed by a colon and 1234, will be your endpoint.

Just click here on Start Server, and it has started the server. You can see that it is saying the HTTP server is listening on port 1234, and you can access it through localhost.
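As a quick sanity check that the server is up, you can list the loaded models with a GET request. This is a sketch assuming the server follows the standard OpenAI-style /v1/models route and that you have the requests package installed:

```python
import requests  # assumes the requests package is installed (pip install requests)

# GET the model list from the local server started above.
# /v1/models is the standard OpenAI-style route; replace localhost
# with your IP or FQDN if calling from another machine.
response = requests.get("http://localhost:1234/v1/models")
print(response.json())
```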

You can also replace this localhost with your IP, or with your FQDN (fully qualified domain name), or server name. (A fully qualified domain name, sometimes also referred to as an absolute domain name, is a domain name that specifies its exact location in the tree hierarchy of the Domain Name System.)

So if you want to do inference on this, all you need to do is use the curl command, where you specify your host and then whether you want to do a chat completion or just pass in a prompt; you can do this in your API calls. There are other endpoints too, with a GET method or a POST method, for chat completions or simply for completions.
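Here is a minimal sketch of that chat-completion call from Python instead of curl, assuming the standard OpenAI-style /v1/chat/completions route on port 1234:

```python
import requests  # assumes the requests package is installed (pip install requests)

# POST a chat completion to the local LM Studio server.
payload = {
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of Australia?"},
    ],
    "temperature": 0.7,
}

response = requests.post(
    "http://localhost:1234/v1/chat/completions",  # standard OpenAI-style path
    json=payload,
)

# The reply text lives in the standard OpenAI response structure.
print(response.json()["choices"][0]["message"]["content"])
```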

And if you want to check the logs they are here in the temp directory.

Let me also show you the log. This is my server's temp directory, where I have the log. If you open it, these are all the logs, so you can troubleshoot.

This comes in very handy if there is some issue, especially if you have multiple models running and the server running too. If you want to stop the server, just click on Stop and it will stop. If you want to look at all of your models, just click on this last option, the folder icon, and there you have all of them in one place. You can filter them, play around with them, and even delete them from here, just to save space.

To know more about it, check out their GitHub and their Discord. I would highly suggest that you go there for a lot of cool information.

I want to reiterate that it supports all the Llama, Falcon, MPT, StarCoder, Replit, and GPT-Neo-X models on Hugging Face.

This list will keep growing, I'm more than sure. So, pretty cool, I would say; I'm very impressed by it. And make sure that if you have installed it, you check for updates periodically so that you stay up to date.

I hope that you loved it. If you are stuck somewhere or if you need any more clarification, please put it in the comments, and if you like the content, please consider subscribing to the channel. Thank you very much.
