PixVerse - Next-Gen AI Video Synthesis

Video Synthesis tools powered by Artificial Intelligence and Machine Learning techniques are still evolving, but they are already having a significant impact on the movie production and animation industry.

AI Video Synthesis offers benefits such as increased efficiency, enhanced creativity, accessibility, and time and cost savings. The fast-emerging new tools are likely to democratize filmmaking by making advanced visual effects more accessible to independent creators with limited resources.

Video synthesis has the potential to shake up the entire movie production and animation industry, and in this Video, we will share one such tool, PixVerse. Like most other tools in this category, PixVerse is still in its infancy, but it is promising and is absolutely free to use. This means you can use it to learn the Art of Video Synthesis Prompting and gain valuable insight into the world of AI Video Synthesis.

We think it would be fair to say that other tools such as PikaLabs, Genmo, RunwayML, LensGo and a few others are equally capable of creating stunning videos from scratch, but their usage terms may not be as liberal as those of PixVerse.

So let us see how you can create some stunning, dynamic, beautiful and mesmerizing videos with PixVerse. First we’ll share some examples before we delve into the features of the tool.

In this example, you can see that the hair is well animated. Notice how well PixVerse handles individual strands of hair billowing in the wind.

This example shows us boats in Venice moving very smoothly through the water.

Here you can see some almost photorealistic flowers gently swaying in the breeze.

PixVerse can add subtle details to an already beautiful image when that image lends itself to animation, and that is an absolute game-changer.

Here is an example of people working in an eighteenth-century industrial-age factory, and then an eighteenth-century classroom.

Here is an example of an animated Pixar style image. Notice how the eyebrows relate to the moving eyes. It's not that the eyes are blinking and the eyebrows are static, but the eyebrows are moving in unison with the eyes.

Here you can see an image where there are static parts and moving parts.

Here again is an excellent example: a beautiful image of an individual slowly moving away from the camera, where the AI has managed to work out how the different objects in the scene are likely to be animated.

On their website there are more examples, such as a parallax animation that pans around an object while keeping it centered as the focal point.

The objects closer to the camera and the objects that are far behind move at different speeds creating a parallax effect.

You can see the depth and perspective of objects with this effect. The objects that are closer to the camera move faster than the objects that are far away from the camera. The relationship between these objects feels very natural. It does not distort the perspective or the relationship between the objects. It feels as though these objects are on a consistent plane, which means that the spacing between them is not being changed.

Now that we have an idea about its capabilities, let’s create a video using PixVerse.

Go to their website, Pixverse.ai, and log in using a Google account.

To create your Video, you have two options. The first is ‘text to video’, which is where you enter a prompt and get a video as output.

The second is the ‘image to video’ option, where you can upload any image, even a photograph, and create a video from that image, which acts as a seed (or guide).

Let’s try ‘image to video’ first.

Select an image of a person and upload it. Keep in mind that the image must be a .png or .jpg file. Your image does not have to be super high resolution, as PixVerse downscales your image to around 1,000 pixels. PixVerse can accept an image up to 10 megabytes in size.

Once you’ve uploaded an image, you can add some simple elements in the text box to include in the synthesized video, such as ‘wind blowing in the hair’ and ‘a panning shot’.
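
If you prepare many images, a quick check before uploading can save time. The sketch below is only an illustration of the two limits mentioned above (file format and file size). The function name check_image is hypothetical, it is not part of PixVerse, and treating 10 megabytes as 10 x 1024 x 1024 bytes is an assumption.

```python
from pathlib import Path

ALLOWED_SUFFIXES = {".png", ".jpg"}   # formats named in the text
MAX_BYTES = 10 * 1024 * 1024          # assumption: "10 megabytes" read as 10 MiB


def check_image(filename: str, size_bytes: int) -> list[str]:
    """Return a list of problems; an empty list means the file looks acceptable."""
    problems = []
    suffix = Path(filename).suffix.lower()
    if suffix not in ALLOWED_SUFFIXES:
        problems.append(f"unsupported format: {suffix or 'none'}")
    if size_bytes > MAX_BYTES:
        problems.append(f"file too large: {size_bytes} bytes")
    return problems


# Hypothetical file names and sizes, used only to show the two checks.
print(check_image("portrait.PNG", 2_500_000))
print(check_image("clip.gif", 12 * 1024 * 1024))
```

For a real file on disk, the size can be read with Path(filename).stat().st_size and passed in as size_bytes.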

Next, you’ll need to tweak a couple of parameters. The first is the Seed parameter, which essentially defines the starting point of randomness for your video.

Another parameter you need to set is the ‘strength of motion’.

Essentially, here you define how much motion there will be in the output Video. The value for this parameter can range from 0.01 all the way up to 1.0.

Note that if you set this value all the way down to 0.01, you will have no movement in the video or just negligible movement. So, set it somewhere in the middle like 0.50 and see the results.

You can experiment with different values for the ‘Seed’ and ‘Strength of Motion’ to get an idea of how the videos turn out with these parameters.
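
One way to keep such experiments organized is to write down the combinations you plan to try before you start. The following is a minimal sketch, not a PixVerse feature: the function name settings_grid and the dictionary keys are hypothetical, and the use of whole-number seeds is an assumption. Only the 0.01 to 1.0 range comes from the description above.

```python
from itertools import product

MIN_MOTION, MAX_MOTION = 0.01, 1.0  # range stated in the text


def settings_grid(seeds, strengths):
    """Return every (seed, strength) pair to try, rejecting out-of-range strengths."""
    for s in strengths:
        if not MIN_MOTION <= s <= MAX_MOTION:
            raise ValueError(f"strength of motion {s} is outside {MIN_MOTION}-{MAX_MOTION}")
    return [
        {"seed": seed, "strength_of_motion": s}
        for seed, s in product(seeds, strengths)
    ]


# Hypothetical seeds; any values would do for a side-by-side comparison.
for row in settings_grid(seeds=[1, 2], strengths=[0.25, 0.50, 0.75]):
    print(row)
```

Entering each row by hand in the interface and noting the result next to it makes it easier to see which parameter is responsible for a change in the output.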

PixVerse has a very simple pipeline for creating Videos, which is highly desirable in an AI Art generator. It is very likely that they are going to advance this interface with time and start to add more features, such as letting you select which parts of the image are animated.

Runway, a competitor of PixVerse, already offers this feature with their ‘motion brush’, so we can expect PixVerse to catch up and offer this feature as well and, more specifically, to refine other Parameters related to Motion, Subject and Style.

While on the subject of Pipelines in the context of Video Synthesis and production tools, let us explain what a Pipeline is.

As mentioned earlier, almost all Video Synthesis tools have a Text to Video feature. This is where you will use prompt engineering and create a meaningful prompt – something that an AI Generative Video model will understand. Prompt engineering skill is something you must practice and eventually master as this is the ‘Language that AI Generative Models Understand’. This is their native language in a way.

Text to Video prompts are very similar to ‘Text to Image’ prompts, with one major difference and that is defining ‘Motion’, which is not required when generating Images.

There are five parts to an effective Video Synthesis prompt.

1. Subject

2. Landscape and Features

3. Composition & Relationship

4. Style

5. Motion

Let’s examine each of these so you can understand some of the options you have when creating Text to Video prompts that will include all five parts, that is Subject, Landscape and Features, Composition & Relationship, Style and Motion.

1. SUBJECT

This is the main part of the prompt: what is it that we want in the Video Frame? For example, you can have People, Animals and Objects.

People means humans of all kinds, from young to old, male or female, in a whole range of different situations.

Animals is used broadly to include birds, fish, reptiles or any other living creature.

And Objects are things like cars, motorbikes, a Waterwheel, a Windmill, etc.

So these are just examples of ‘Subjects’ in a prompt.

2. LANDSCAPE & FEATURES

Subjects are placed in a landscape that may have features that you need to include. For example, is it a road in the mountains with a bridge, next to a waterfall, on a cloudy and windy day etc.?

3. COMPOSITION & RELATIONSHIP

Composition and Relationship is where you combine the Subject and the Landscape elements. Consider how these are set up. How are they positioned in the Landscape? What is their relationship, either to each other or to the viewer? For example, is it a very close-up shot? Is the subject far away? Is it walking, standing, sitting or stationary? Is it relating to an object in any way?

So this is the simplest conceptual framework for understanding composition and relationship.

4. STYLE

This is essentially how something looks. It’s not what it is. It’s how it is portrayed. One of the most popular interpretations of this is of course ‘Realistic’. Generally most of us like to see a realistic reflection of an object or human but not always. ‘Realistic’ happens to be the most popular Style that people use, especially when starting out.

Other styles you can experiment with include Cinematic, Vintage, Anime, Pixar, Cartoon and Media (painted, sculpted, etc.).

For example, if you use ‘Cinematic’, you are adding a more cinematic, dramatic narrative on top of the image. Usually this involves lighting and the scene is made more alluring with the shot either at Dawn or at Dusk for example.

Cinematic essentially maintains a realistic look but adds a more stylistic tone. In your prompts, try experimenting with terms from the English dictionary like mesmerizing, captivating, enthralling, fascinating, spellbinding, hypnotic, entrancing, bewitching, enchanting, riveting, alluring, etc. but be careful not to over-use them. Stick to no more than two at a time.

You can also try variations like a Vintage style. Beyond this, we can also imagine Animated Styles such as Anime or Pixar Style, and also try using ‘Media’. Media here means how the image is created. If you are getting very specific with a Realistic approach, what camera is being used? Or you can imagine that it is created in a different artistic medium, for example painted, sculpted, drawn or whatever you like.

5. MOTION

Motion is the most crucial part in an AI Video Synthesis prompt. This is one of the keys to working with AI Video Synthesis. It is understanding what is being communicated within the image in relation to potential movement and animation.

There are two things to consider when injecting Motion into the prompt.

First, consider how the Subject is moving. Basically, these are the verbs that you can use to describe the subject's movement. Are they running, walking, sitting, talking, swaying, dancing, etc.? There are a lot more verbs that you can get from the English dictionary.

The other thing to consider is not how the subject is moving, but how the viewer is moving (that is, how the camera is moving). This is an important concept to consider in your AI video, because it is how the viewer will perceive the experience.

You must learn to imagine that you're leading the viewer on a journey and you're placing them at the same position as the cameraman. This is the point of view that you can envisage.

So, you can consider having a Static shot, which is where the camera stays in one place and everything happens inside of the shot. Or, you can consider a moving camera: panning around, zooming in, zooming out, sitting on a dolly and smoothly and gracefully tracking the subject, or perhaps flying over the top of the scene in a helicopter.

When defining MOTION, you need to combine the Camera terms (specifically Static, Panning, Zooming, Dolly etc.) and Subject Verbs (running, walking, sitting, dancing etc.) to define the motion within the scene. You need to clearly articulate the direction and the nature of movement of different subjects and objects. Be forewarned that this is an incredibly difficult art, but it is one that you will be able to master if you keep practicing.

In the world before AI Video Synthesis, learning this art took years of experience in the field of videography and movie making. Now it is as simple as trying out different iterations on your computer. This is mainly why we believe that AI Video Synthesis is going to totally disrupt Hollywood in the next decade, if not sooner.
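
The five parts described above can be put together in a small helper so that no part is forgotten. This is only a sketch of one possible way to assemble a prompt. The class name VideoPrompt, its field names and the comma-separated ordering are all hypothetical choices, and the example wording is made up for illustration. It does not reflect any format that PixVerse requires.

```python
from dataclasses import dataclass


@dataclass
class VideoPrompt:
    """Hypothetical container for the five prompt parts."""
    subject: str
    landscape: str
    composition: str
    style: str
    subject_motion: str   # a subject verb phrase, e.g. "walking slowly"
    camera_motion: str    # a camera term, e.g. "static shot" or "slow pan"

    def build(self) -> str:
        parts = [
            f"{self.subject} {self.subject_motion}",
            self.landscape,
            self.composition,
            f"{self.style} style" if self.style else "",
            self.camera_motion,
        ]
        # Drop empty parts and join with commas to keep the prompt brief.
        return ", ".join(p.strip() for p in parts if p.strip())


prompt = VideoPrompt(
    subject="a woman in a red coat",
    landscape="on a mountain road next to a waterfall",
    composition="medium shot, subject centered",
    style="cinematic",
    subject_motion="walking slowly",
    camera_motion="camera on a dolly tracking the subject",
)
print(prompt.build())
```

Keeping subject motion and camera motion as separate fields mirrors the two questions above: how is the subject moving, and how is the viewer moving?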

Another important thing to remember when prompting is that too many small details in the prompt may overwhelm the system. So try to keep things brief, specific and to the point.

Also keep in mind that Video Synthesis is extremely expensive because it is very GPU intensive. This means that you won’t get more than a 4-second scene from any AI Video Synthesis tool.

You do have the option to extend the duration by changing the speed at which the video is played back, which also changes how the video feels, for example giving a sense of slow motion once you slow the Video down. So you can stretch a 4-second Video out to 8 seconds by playing it at half speed. This can work very well for some styles of video, but for others it may not.
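
The arithmetic behind this is simple: the new duration is the original duration divided by the playback speed. The sketch below works through the 4-second example. The function names are hypothetical, and the 24 frames per second figure is an assumption used only to show why slowed footage can look less smooth.

```python
def stretched_duration(clip_seconds: float, playback_speed: float) -> float:
    """Duration of a clip when played at the given speed (1.0 = normal)."""
    if playback_speed <= 0:
        raise ValueError("playback speed must be positive")
    return clip_seconds / playback_speed


def unique_frames_per_second(source_fps: float, playback_speed: float) -> float:
    """How many original frames are shown per second after the speed change."""
    return source_fps * playback_speed


print(stretched_duration(4, 0.5))          # 4-second clip at half speed
print(unique_frames_per_second(24, 0.5))   # 24 fps is an assumed frame rate
```

The second function shows the trade-off: at half speed, each second of playback contains only half as many original frames, which is why the result suits slow, gentle motion better than fast action.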

One advantage that PixVerse has over other AI tools is that you can upscale your output videos to a higher resolution of up to 4K, while other tools may require third party utilities such as ‘Topaz’ or ‘Pixop’ which are paid services. The upscaler at PixVerse actually compares extremely well to these other paid options in terms of quality.

So, to upscale your Video resolution, just click on ‘Upscale’ next to the image of the Video, wait for Upscaling to finish and then download the Video.

PixVerse also has a ‘retry’ option, which reruns the generation with all of the same prompts and settings. This is useful because each time you run it, you can get a different output for the same prompt.

When you use the Text to Video version, you simply enter a prompt. You can also add negative prompts, which describe the elements that you do not wish to include in your work.

After entering the prompt, you can tweak the Seed, the Aspect Ratio and the Strength of Motion. When you specify the Aspect Ratio, you are specifying the shape of the output video, that is, its width relative to its height. Most AI Video Synthesis tools allow a horizontal landscape Video, a vertical portrait Video, a square Video or a slightly rectangular Video with a 4 by 3 aspect ratio.

The Video generation time varies but is generally around 30-60 seconds. It all depends on how busy the GPU servers are. As mentioned earlier, AI Video Synthesis is an extremely GPU intensive process and most companies don’t have clusters of thousands of GPUs, at least not just yet. We think Apple, Meta, Google, NVidia and a few others do have massive GPU Clusters (some containing thousands of GPUs), but these companies have not rolled out their Video Synthesis tools so far, even though they have announced Video Synthesis models that they are working on behind the scenes.

And when they finally do, the landscape for Video Synthesis might change in terms of Video output Quality, Length of Synthesized Video etc.

One question that pops up frequently is whether you can use the synthesized videos for commercial purposes. This is something that you must check for yourself to be on the safe side, so please confirm each tool's terms of service before you use it commercially.

Finally, what can we expect in the future from PixVerse? First, there is no doubt that it is not going to be free forever. It’s important that you take a look at it now because you can create as many videos as you like, all the way up to 4K, for free, and, importantly, also become proficient in AI Video Synthesis concepts.

As mentioned earlier, most AI Video Synthesis tools have similar features and fairly standard pipelines.

What can we expect from the competition? The competition in the AI Video Synthesis domain is just beginning to heat up. Here is a heads-up on some competitors, all vying for a share of the trillion-dollar global film, TV and animation industry:

PikaLabs, RunwayML, Genmo, Kaiber.ai, LensGo Ai, Prome-Ai, Moon Valley, and, of course, some Mega players: Google with their MAGVIT, VideoPoet, Lumiere and Imagen; Meta, the parent company of Facebook and Instagram, with their Make-A-Video and Emu Video tools; and Apple with their Matryoshka Diffusion Models (or MDM), which are able to create longer videos.

And finally, NVidia and Amazon are collaborating on their own AI Video Synthesis tools, and we expect that those tools are not going to be ordinary, because NVidia has something of a monopoly on GPUs and Amazon has the Cloud Platform necessary to scale their Generative Video Services.

For the general public, the best is yet to come in AI Video Synthesis!

Please Subscribe to our Channel and Stay tuned so we can keep you informed on all these fascinating developments on the horizon.

Thanks for watching, and don’t forget to press the like button.


Last updated 1 year ago