Step 2: Training your LoRA Model Dataset

Use the link above to go to the kohya-LoRA trainer on Google's Colaboratory platform.

The link is for a Colab Notebook for LoRA Training (Dreambooth Method).

(A "notebook" on Google Colab is an interactive, web-based document that lets users create, run, and share code, analysis, and visualizations through a user-friendly interface. It comes with pre-installed libraries and packages, making it easier to get started with data science and machine learning tasks without having to set up a local environment.

Google offers free access to limited GPU and TPU resources, which can significantly speed up computations, especially for machine learning and deep learning tasks. Moreover, it has seamless integration with Google Drive, allowing users to store, share, and collaborate on notebooks with others.)

Once you are on the site, make sure that you are logged into your Google Account and then click on ‘connect’.

Now we can start training on the images.

1.0 Install Kohya Trainer

Section 1.1. Install Dependencies

All we need to do here is check the ‘Mount Drive’ box and run that 'cell'.

(Note: In a Colab notebook, a "cell" is a single block of code or text. Each code cell can be run on its own by clicking the play button next to it, and its output appears directly underneath. The notebook is simply a sequence of these cells, which we run one after another from top to bottom.)

This is going to ‘mount’ your Google Drive. Click ‘connect’ to Google Drive. You will see some warnings about running a notebook that was not authored by Google.

Once the cell completes its task, you'll get a green checkmark next to it. This downloads all the dependencies into your file structure. Remember that this process runs remotely and not on your PC, so once you exit the notebook, you'll lose everything.

Once connected, we can pull files across from our Google Drive. And once we've finished training the model, we can take the trained model and export it back to our Google Drive.
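For reference, mounting Drive from code inside a Colab notebook usually looks like the sketch below. The ‘Mount Drive’ checkbox presumably does something similar behind the scenes. The helper name mount_drive and the fallback for machines that are not running Colab are assumptions added for illustration.

```python
DRIVE_ROOT = "/content/drive"  # conventional Colab mount point


def mount_drive(mount_point: str = DRIVE_ROOT) -> bool:
    """Mount Google Drive when running inside Colab; return False elsewhere."""
    try:
        from google.colab import drive  # only importable inside Colab
    except ImportError:
        print("Not running in Colab; skipping Drive mount.")
        return False
    drive.mount(mount_point)  # opens the Google authorization prompt
    return True
```

After a successful mount, your files appear under /content/drive/MyDrive.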

Section 1.2 Start File Explorer

Ignore Section 1.2. This is a special file explorer and we don't need it.

2.0 Pretrained Model Selection

Section 2.1. Download Available Model

The default setting is for ‘AnyLoRA’ or ‘Anything-v3-1’. ‘Anything-v3-1’ is more of an ‘anime style’ model. (Anime is a style of animation originating in Japan that is characterized by stark colorful graphics depicting vibrant characters in action-filled plots often with fantastic or futuristic themes).

When you click on the option, there are a few more preloaded links in the dropdown.

For the SD1.x model option, select the ‘Stable Diffusion 1.5’ model. This is a good base model, as it is an all-rounder to train on for realistic characters. Obviously you can use whatever you want, but if you're starting off, this is a good one to learn with.

Section 2.2. Download Custom Model

Here you can actually load in your own custom model. This means you can go to HuggingFace, get the link for your model, place it in there and run that cell. But we will ignore this for this tutorial. So we will leave the space in SD2.x model blank. We're not going to be using that.

Ignore Section 2.2. We’re not going to be running our own custom model.

(Note: HuggingFace is an AI community with over 5,000 organizations as members. The community builds, trains and deploys state-of-the-art (SOTA) models powered by open-source machine learning tools.)

Section 2.3. Download Available VAE (Optional)

Sometimes you might notice when you switch models in Automatic1111, or whatever program you're using, that the images are really desaturated and have lost their colors. That’s because the VAE is not being detected or has been corrupted.

We are going to load in the Stable Diffusion 1.5 VAE, which is already there in the list. This is going to download it into our file structure. (VAE stands for Variational AutoEncoder, and it is one of the four well-known deep learning-based image generative models.)

And then ‘run’ the cell.

3.0 Data Acquisition

Section 3.1. Locating Train Data Directory

This is going to create a file path in train_data_dir, which is where all our input dataset images will go. We don't have to do anything here; it is done automatically for us.

Run this cell and it will show your train_data_dir, which you can browse. There will be a folder called 'LoRA'. If we expand LoRA, it has three subfolders: config, reg_data and train_data. Do not drag and drop your images into here. It’s going to be done automatically. (reg_data is the regularization data.)
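To make the folder layout concrete, here is a minimal sketch that builds the same three subfolders with Python's standard library. It is not the notebook's own code; the helper name create_lora_dirs is hypothetical, and a temporary folder stands in for the Colab file system.

```python
import tempfile
from pathlib import Path


def create_lora_dirs(root: Path) -> dict:
    """Create LoRA/config, LoRA/reg_data and LoRA/train_data under root."""
    lora_dir = root / "LoRA"
    dirs = {name: lora_dir / name for name in ("config", "reg_data", "train_data")}
    for path in dirs.values():
        path.mkdir(parents=True, exist_ok=True)
    return dirs


# A temporary folder is used here so the sketch runs anywhere.
demo_root = Path(tempfile.mkdtemp())
created = create_lora_dirs(demo_root)
print(sorted(p.name for p in (demo_root / "LoRA").iterdir()))
# prints ['config', 'reg_data', 'train_data']
```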

Section 3.2. Unzip Dataset

zipfile_url: This is why we ‘zipped’ our files and put them onto our Google Drive, because we're now going to grab them. In the file browser, open ‘drive’; this is essentially your Google Drive. Open MyDrive, and in here you can find the .zip file, which is under:

/content/drive/MyDrive/

Click on the three dots next to the file, choose ‘Copy path’ and paste the path into zipfile_url.

Leave unzip_to: blank and run the cell. Close this hierarchy so we can see what we're doing. You can see it has extracted all the files from that zip file into the new folder.
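What this cell does can be approximated with Python's zipfile module. The sketch below is an illustration under assumptions rather than the notebook's actual code: the function name unzip_dataset and the demo archive are made up, and in Colab the zip path would be the one you copied from Drive.

```python
import tempfile
import zipfile
from pathlib import Path


def unzip_dataset(zip_path: Path, target_dir: Path) -> list:
    """Extract every file in zip_path into target_dir and return the names."""
    target_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target_dir)
        return archive.namelist()


# Build a throwaway archive so the sketch is runnable on its own.
work = Path(tempfile.mkdtemp())
demo_zip = work / "dataset.zip"
with zipfile.ZipFile(demo_zip, "w") as archive:
    archive.writestr("1.png", b"placeholder")
    archive.writestr("2.png", b"placeholder")

extracted = unzip_dataset(demo_zip, work / "train_data")
print(extracted)  # prints ['1.png', '2.png']
```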

Section 3.3. Image Scraper (Optional)

'Image Scraper': We won’t be using this. It is built more or less around 'anime databases'. What Image Scraper does is scrape regularization images, which we won't do because we are not using anime. But if you are using anime, you can do it here. (Note: ‘Image Scraping’ is a subset of web scraping technology. While web scraping deals with all forms of web data extraction, image scraping focuses only on the media side: images, videos, audio, and so on. Grepsr Concierge and Grepsr Browser Extensions are two commonly used tools for image scraping.)

4.0 Data Pre-Processing

Section 4.1. Data Cleaning

The data cleaning section has to do with the cell above it. If you are scraping all these images, you might not be aware of what they actually are. This cell will delete unnecessary files and unsupported media such as .mp4, .webm and .gif. An image with a transparent background, or with random colors behind the subject, is very hard for machine learning, so you want to convert it. Set the convert parameter to convert a transparent dataset with an alpha channel (RGBA) to RGB and give it a white background. You only need to check that if you're doing ‘anime’ and scraping the images, which we are not.
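The file-removal part of this step can be sketched as follows. This is a simplified illustration, not the notebook's implementation: the helper name remove_unsupported is hypothetical, the extension list contains only the three types named above, and the RGBA-to-RGB conversion is not shown.

```python
import tempfile
from pathlib import Path

# Only the media types named in the text; the notebook's own list may be longer.
UNSUPPORTED = {".mp4", ".webm", ".gif"}


def remove_unsupported(folder: Path) -> list:
    """Delete files with an unsupported extension and return their names."""
    removed = []
    for path in sorted(folder.iterdir()):
        if path.is_file() and path.suffix.lower() in UNSUPPORTED:
            path.unlink()
            removed.append(path.name)
    return removed


# Demo on empty placeholder files in a temporary folder.
demo_dir = Path(tempfile.mkdtemp())
for name in ("a.png", "b.gif", "c.mp4"):
    (demo_dir / name).touch()

removed = remove_unsupported(demo_dir)
remaining = sorted(p.name for p in demo_dir.iterdir())
print(removed)    # prints ['b.gif', 'c.mp4']
print(remaining)  # prints ['a.png']
```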

Section 4.2. Data Annotation

Section 4.2.1. BLIP Captioning

We will be using 'BLIP Captioning', which tags images with a description. This is used for realistic imagery. The one below it, ‘Waifu Diffusion’, is used more for anime. We are just going to be using ‘BLIP Captioning’. We will leave these settings as default and run that cell.

What that will do is read the input images that we put into Google Colab. It’s then going to describe what it sees in the images. It is going to describe, for example, that ‘the lady has a microphone’, that ‘she is wearing a necklace’, or ‘potentially a brown top’. This means it's not going to train upon the things described, which makes it so much easier to create a much more flexible model when we are generating the images later on.

To illustrate what this means, let’s see the files it created. If you go into your files, open LoRA and then expand train_data, you can see it has generated caption files here.

Let’s just pick any image here, e.g. 12.png, and you can see the lady has got a microphone and a necklace. So it should pick up on those attributes. Click on the caption that goes along with that image. And yes, it says: 'a woman sitting in a chair holding a microphone'. We can actually add on to this and add 'necklace' if we didn't want it to train on the necklace that the lady was wearing. You can edit these further if you want to, but for this tutorial we are going to leave it as is.
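If you do want to edit captions in bulk rather than by hand, something like the sketch below would work. It is only an illustration: the helper name add_caption_tag is hypothetical, and the ".caption" file extension is an assumption, so check what your caption files are actually called.

```python
import tempfile
from pathlib import Path


def add_caption_tag(caption_file: Path, tag: str) -> str:
    """Append a tag to a caption file unless the caption already mentions it."""
    caption = caption_file.read_text(encoding="utf-8").strip()
    if tag.lower() not in caption.lower():
        caption = f"{caption}, {tag}"
        caption_file.write_text(caption + "\n", encoding="utf-8")
    return caption


# Demo on a throwaway caption file.
demo_dir = Path(tempfile.mkdtemp())
caption_path = demo_dir / "12.caption"  # extension is an assumption
caption_path.write_text(
    "a woman sitting in a chair holding a microphone\n", encoding="utf-8"
)

new_caption = add_caption_tag(caption_path, "necklace")
print(new_caption)
# prints: a woman sitting in a chair holding a microphone, necklace
```

Calling the helper a second time with the same tag leaves the file unchanged.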

Section 4.2.2. Waifu Diffusion 1.4 Tagger V2

We will ignore the ‘Waifu Diffusion Tagger’ as that's for anime.

Section 4.2.3. Custom Caption/Tag:

We are going to ignore the ‘Custom Caption/Tag’ as well. This creates a text file caption which we will ignore.

5.0. Training Model:

Now we go on to training the model.

Section 5.1 Model Config:

If you are using Stable Diffusion version 2.0 to train your model, you need to check the two boxes ‘v2:’ and ‘v_parameterization:’. Since we are using Stable Diffusion 1.5, we are going to leave those unchecked.

Under ‘project_name’, give your project a name that you will remember. We are going to call ours 'sample_tutorial'. Underneath it you've got ‘pretrained_model_name_or_path’. We need to change this to the path of our Stable Diffusion model (copy the path and paste it in).

pretrained_model_name_or_path: /content/pretrained_model/Stable-Diffusion-v1-5.safetensors

We downloaded all these dependencies in the first cells. This would have made a pretrained_model folder for you. If you expand that folder, you will see your ‘safetensors model’. Click on the three dots next to it, choose ‘Copy path’ and simply paste that in there.

Next, we have the vae. The vae file controls the color in your images. We also need to grab that and that would have installed during the first cell as well. So that will be in the vae folder. Repeat that process; copy that path and paste it in there.

vae: /content/vae/

Next comes the output location, which is where your finalized models will be saved. We say models, because it's going to create multiple versions.

Once you've closed the Google notebook, all these files will disappear.

Make sure you check ‘output to drive’, as that will save the files to your Google Drive, and then just run that cell.

You can see the output path is /content/drive/MyDrive/LoRA/output. There will be a folder on your Google Drive called LoRA, and your files will be in its output folder.

Section 5.2. Dataset Config:

These are the most important settings.

We want to keep the train_repeats to 10 and reg_repeats to 1.

Next we have the instance_token. We are just going to keep ours at mksks. By associating mksks with our model, it knows it's calling up our model and our images. Just keep it as mksks.

Next one below is class_token. We are not training a style; we are training a ‘woman’, or you can enter ‘person’. It's up to your own discretion.

Under General Config, we have ‘resolution’. We are going to do 512 x 512 because we have input images that are 512 by 512. If you're doing 768 by 768, enter 768 by using the slider.

Leave all the other settings as default and then run the cell.

Section 5.3. LoRA and Optimizer Config:

You really need to experiment with these settings yourself to see if you can get a better result because obviously you will be training different images than this tutorial. But for this tutorial, we will use these settings because they are pretty standard.

Next one below is convolution_dim, which we will set to quite low at 8 and the convolution_alpha we will set at 1.

Next one down is network_dim, which we set at 16 and we will change the network_alpha to 8.

These settings actually have a huge influence on your model. Which settings work for you will depend on your training set, your resolution and so on.
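One way to see why the dim and alpha values matter: in the usual LoRA formulation, the learned update is scaled by alpha divided by dim, and dim also sets how many extra parameters each adapted layer gets. The sketch below just does that arithmetic. The function names and the 768 x 768 layer size are assumptions chosen for illustration, and the trainer's internals may differ in detail.

```python
def lora_scale(alpha: float, dim: int) -> float:
    """Scaling factor applied to the low-rank update: alpha / dim."""
    return alpha / dim


def lora_param_count(in_features: int, out_features: int, dim: int) -> int:
    """Parameters in the two low-rank matrices: (dim x in) plus (out x dim)."""
    return dim * (in_features + out_features)


print(lora_scale(alpha=8, dim=16))         # network_alpha / network_dim -> 0.5
print(lora_scale(alpha=1, dim=8))          # convolution_alpha / convolution_dim -> 0.125
print(lora_param_count(768, 768, dim=16))  # hypothetical 768 x 768 layer -> 24576
```

For comparison, the full 768 x 768 weight matrix in this hypothetical layer has 589,824 parameters, so the adapter is a small fraction of that.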

Next, leave the optimizer_type under Optimizer Config as 'AdamW8bit'.

Then we need to specify learning_rate for unet_lr: and for text_encoder_lr:

The learning rate for unet_lr should be set to 5e-4.

The learning rate for text_encoder_lr should be set to 1e-4.

The lr_scheduler should be set to ‘cosine_with_restarts’.

For the warm-up steps, we will use 0.05. The run will probably come to about 950 steps, but we'll see once we run this cell.
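To give a feel for what a cosine schedule with restarts and warm-up does to the learning rate, here is a small standalone sketch. It follows the common formulation of this schedule and is not taken from the trainer itself. It assumes the warm-up value is a ratio of 0.05 of the total steps and uses the rough figure of 950 steps; the function name is hypothetical.

```python
import math


def cosine_with_restarts_lr(step, total_steps, base_lr, warmup_steps=0, num_cycles=1):
    """Learning rate at `step`: linear warm-up, then cosine decay that restarts."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    if progress >= 1.0:
        return 0.0
    cycle_position = (num_cycles * progress) % 1.0  # position inside the current cycle
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * cycle_position))


TOTAL_STEPS = 950                       # rough figure from the text
WARMUP_STEPS = int(0.05 * TOTAL_STEPS)  # assumes the warm-up setting is a ratio
for step in (0, WARMUP_STEPS, 500, TOTAL_STEPS):
    print(step, cosine_with_restarts_lr(step, TOTAL_STEPS, 5e-4, WARMUP_STEPS))
```

The rate climbs from zero to the unet_lr value of 5e-4 during warm-up, then falls along a cosine curve. With num_cycles greater than 1 it jumps back to the peak at the start of each new cycle.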

Now run that cell.

Section 5.4. Training Config:

Leave low RAM on (lowram).

We will check enable_sample_prompt.

We will leave the sampler as ddim.

Noise_offset: Leave this at 0.0. Sometimes we set it to 0.01.

num_epochs: We are going to run 10 epochs. This will save a file at every learning stage. This means we can test the files out in our web UI at the end to see if each one is undercooked, overcooked or just about right. We like to do about 10 because it gives us a nice diverse range to pull from.

train_batch_size: You can set this to as low as 1. We are going to go with 2 and see how it goes from there. The batch size is how many images are trained together. If we are training 6 at a time, it's going to be a lot quicker than it will be for 2.

If we went with 6, though, we would probably run out of memory, because larger batches need more of it. So if you do have a memory issue, stick to a low value such as 1 or 2. If you don't have any memory issues whatsoever, you can go higher.

We are leaving both mixed_precision and save_precision at fp16. We are also leaving save_n_epochs_type as it is, with save_n_epochs_type_value at 1, so we should have 10 epochs saved at the end. We will save the model as a 'safetensors' model and leave this setting as default.
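As a rough guide to how these numbers combine, the total step count can be estimated from the dataset size, the repeats, the epochs and the batch size. The sketch below is a back-of-the-envelope estimate only: it ignores regularization images and any other trainer-specific details, the function name is hypothetical, and the figure of 19 images is an assumed dataset size chosen for illustration.

```python
import math


def estimate_total_steps(num_images, repeats, epochs, batch_size):
    """Steps per epoch (rounded up) multiplied by the number of epochs."""
    steps_per_epoch = math.ceil(num_images * repeats / batch_size)
    return steps_per_epoch * epochs


# 19 images is a hypothetical dataset size.
print(estimate_total_steps(num_images=19, repeats=10, epochs=10, batch_size=2))  # 950
print(estimate_total_steps(num_images=19, repeats=10, epochs=10, batch_size=6))  # 320
```

A larger batch size cuts the number of steps, which is why training finishes sooner, at the cost of more memory per step.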

Now run that ‘cell’.

Section 5.5. Start Training:

And now we come to our final ‘cell’. All you need to do here is just run that cell and leave everything as default and let the training commence.

This might take about 30 to 40 minutes. If we wanted it to be done quicker, we would actually increase the 'batch size'.

Once the training is complete, you do not have to do any of the remaining cells in the notebook. Your files will now be saved automatically into your Google Drive.

Go to your Google Drive. You will have a LoRA folder there with your LoRA files in it. And remember, it saves a file at every training stage (epoch).

Since we set 10 epochs for this tutorial, it will give us 10 files here.
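If you want to confirm from code that all the files arrived, a short listing like the one below is enough. It is a sketch under assumptions: the helper name list_checkpoints and the demo file names are hypothetical, and in practice you would point it at the output folder on your mounted Drive.

```python
import tempfile
from pathlib import Path


def list_checkpoints(output_dir: Path) -> list:
    """Return the names of all .safetensors files in output_dir, sorted."""
    return sorted(p.name for p in output_dir.glob("*.safetensors"))


# Demo with empty placeholder files; real file names may follow another pattern.
demo_dir = Path(tempfile.mkdtemp())
for epoch in range(1, 11):
    (demo_dir / f"sample_tutorial-{epoch:06d}.safetensors").touch()

checkpoints = list_checkpoints(demo_dir)
print(len(checkpoints))  # prints 10
```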
