PixVerse - Next-Gen AI Video Synthesis
Video Synthesis tools powered by Artificial Intelligence and Machine Learning techniques are still evolving, but they are already having a significant impact on the movie production and animation industry.
AI Video Synthesis offers amazing benefits such as increased efficiency, enhanced creativity, accessibility, and time and cost savings. These fast-emerging tools are set to democratize filmmaking by making advanced visual effects accessible to independent creators with limited resources.
Video synthesis has the potential to shake up the entire movie production and animation industry, and in this video we will share one such tool, PixVerse. Like most other tools in this category, PixVerse is still in its infancy, but it is promising and is absolutely free to use. This means you can use it to experience and learn the art of Video Synthesis Prompting and gain valuable insight into the world of AI Video Synthesis.
We think it would be fair to say that other tools such as PikaLabs, Genmo, RunwayML, LensGo and a few others are equally capable of creating stunning videos from scratch, but their terms of use may not be as liberal as PixVerse's.
So let's see how you can create some stunning, dynamic, beautiful and mesmerizing videos with PixVerse. First we'll share some examples before we delve into the features of the tool.
In this example, you can see how well the hair is animated, with individual strands billowing in the wind.
This example shows boats in Venice moving smoothly through the canal waters.
Here you can see some almost photorealistic flowers gently swaying in the breeze.
PixVerse can add subtle details to already beautiful images if an image lends itself effectively to animation and that is an absolute game-changer.
Here is an example of people working in an eighteenth-century industrial-age factory, and then an eighteenth-century classroom.
Here is an example of an animated Pixar style image. Notice how the eyebrows relate to the moving eyes. It's not that the eyes are blinking and the eyebrows are static, but the eyebrows are moving in unison with the eyes.
Here you can see an image where there are static parts and moving parts.
Here again is an excellent example: a beautiful image of an individual slowly moving away from the camera, where the AI has managed to work out how the different objects in the scene are likely to be animated.
On their website there are more examples, such as a parallax animation that pans around an object, keeping it centered as the focal point.
The objects closer to the camera and the objects that are far behind move at different speeds creating a parallax effect.
This effect conveys the depth and perspective of objects: those closer to the camera move faster than those far away, and the relationship between them feels very natural. The perspective is not distorted, and the objects appear to sit on consistent planes, with the spacing between them preserved.
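To see why this looks natural, consider a simplified pinhole-camera model: when the camera moves sideways, an object's apparent shift on screen is inversely proportional to its depth. Here is a minimal sketch of that relationship; the function, numbers and scene names are our own illustration, not anything from PixVerse.

```python
# A simplified pinhole-camera model of parallax: for a lateral camera
# move, the apparent on-screen shift is inversely proportional to depth.
# All numbers here are illustrative only.

def apparent_shift(camera_move_m, depth_m, focal_length=1.0):
    """On-screen shift (arbitrary units) caused by a sideways camera move."""
    return focal_length * camera_move_m / depth_m

pan = 0.5  # metres of sideways camera movement
for name, depth in [("near tree", 2.0), ("distant mountain", 200.0)]:
    print(f"{name}: shift = {apparent_shift(pan, depth):.4f}")
# The near tree shifts 100x more than the mountain, which is exactly
# the depth cue the parallax examples above exploit.
```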
Now that we have an idea of its capabilities, let's create a video using PixVerse.
Go to their website, Pixverse.ai, and log in using a Google account.
To create your video, you have two options. The first is 'text to video', where you enter a prompt and get a video as output.
The second is the 'image to video' option, where you can upload any image, even a photograph, and create a video from it; the image acts as a seed (or guide).
Let's try 'image to video' first.
Select an image of a person and upload it. Keep in mind that the image file must be in .png or .jpg format. Your image does not have to be super high resolution, as PixVerse downscales it to around 1,000 pixels, but PixVerse can accept an image up to 10 megabytes in size.
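If you are preparing several images, a quick pre-flight check like the one below can save failed uploads. It is a minimal sketch using the Pillow library, based only on the limits quoted above; the file name is a placeholder.

```python
# Pre-flight check for an upload, based on the limits quoted above:
# .png or .jpg format, at most 10 MB, and no need to exceed ~1,000 px.
import os
from PIL import Image

MAX_BYTES = 10 * 1024 * 1024  # 10 megabytes

def check_image(path):
    assert path.lower().endswith((".png", ".jpg", ".jpeg")), "use .png or .jpg"
    assert os.path.getsize(path) <= MAX_BYTES, "file exceeds 10 MB"
    with Image.open(path) as im:
        if max(im.size) > 1000:
            print(f"{path}: {im.size} will be downscaled to ~1,000 px anyway")

check_image("portrait.jpg")  # placeholder file name
```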
Once you've uploaded an image, you can add some simple elements in the text box to include in the synthesized video, such as 'wind blowing in the hair' and 'a panning shot'.
Next, you'll need to tweak a couple of parameters. The first is the Seed parameter, which essentially defines the starting point of randomness for your video.
Another parameter you need to set is the 'strength of motion'.
Essentially, here you define how much motion there will be in the image (or the output video). The value for this parameter ranges from 0.01 all the way up to 1.0.
Note that if you set this value all the way down to 0.01, you will have negligible or no movement in the video. So set it somewhere in the middle, like 0.50, and see the results.
You can experiment with different values for the 'Seed' and 'Strength of Motion' to get an idea of how the videos turn out with these parameters.
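If it helps to think of these settings programmatically, here is a minimal sketch of what an image-to-video request with these two parameters could look like. The endpoint URL, field names and return value are purely illustrative assumptions for teaching the concepts; PixVerse is used through its web interface, and this is not its real API.

```python
# Hypothetical image-to-video request -- the endpoint and field names are
# illustrative assumptions, NOT PixVerse's actual API.
import requests

API_URL = "https://api.example-video-synth.com/v1/image-to-video"  # placeholder

def synthesize_video(image_path, prompt, seed=42, motion_strength=0.5):
    """Sketch of a request built around the two key parameters.

    seed            -- starting point of randomness for the generation
    motion_strength -- amount of movement, 0.01 (negligible) to 1.0 (max)
    """
    with open(image_path, "rb") as f:
        response = requests.post(
            API_URL,
            files={"image": f},  # .png or .jpg, up to ~10 MB
            data={"prompt": prompt, "seed": seed, "motion_strength": motion_strength},
            timeout=120,
        )
    response.raise_for_status()
    return response.json()  # assume this contains a URL to the finished clip

# e.g. synthesize_video("portrait.jpg",
#                       "wind blowing in the hair, a panning shot",
#                       seed=1234, motion_strength=0.5)
```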
PixVerse has a very simple pipeline for creating videos, which is highly desirable in an AI art generator. It is very likely that they will advance this interface over time and add features that let you select which parts of the image are animated.
Runway, a competitor of PixVerse, already offers this feature with their 'Motion Brush', so we can expect PixVerse to catch up and offer it as well, and, more generally, to refine other parameters related to Motion, Subject and Style.
While on the subject of Pipelines in the context of Video Synthesis and production tools, let us explain what a Pipeline is.
As mentioned earlier, almost all Video Synthesis tools have a Text to Video feature. This is where you will use prompt engineering to create a meaningful prompt, something that an AI generative video model will understand. Prompt engineering is a skill you must practice and eventually master, as this is the 'language that AI generative models understand'. It is their native language, in a way.
Text to Video prompts are very similar to 'Text to Image' prompts, with one major difference: defining 'Motion', which is not required when generating images.
There are five parts to an effective Video Synthesis prompt.
1. Subject
2. Landscape and Features
3. Composition & Relationship
4. Style
5. Motion
Let's examine each of these so you can understand some of the options you have when creating Text to Video prompts that include all five parts: Subject, Landscape and Features, Composition & Relationship, Style and Motion.
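Before breaking each part down, here is a preview of how the five parts combine into a single prompt string. This is a minimal sketch of our own; the helper name and example values are illustrative and not tied to any particular tool.

```python
# Assembling a Text to Video prompt from the five parts.
# Helper name and example values are illustrative only.

def build_prompt(subject, landscape, composition, style, motion):
    """Join the five parts into one comma-separated prompt string."""
    return ", ".join([subject, landscape, composition, style, motion])

prompt = build_prompt(
    subject="an old fisherman",                                # 1. Subject
    landscape="a mountain road with a bridge by a waterfall",  # 2. Landscape & Features
    composition="close-up shot, subject facing the viewer",    # 3. Composition & Relationship
    style="cinematic, realistic, shot at dusk",                # 4. Style
    motion="slow dolly shot, wind swaying the trees",          # 5. Motion
)
print(prompt)
```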
1. SUBJECT
This is the main part of the prompt: what is it that we want in the video frame? For example, you can have People, Animals and Objects.
People means you can have the whole range of humans, from young to old, male and female, in all kinds of different situations.
Animals is used broadly to include birds, fish, reptiles or any other living creature.
And Objects are things like cars, motorbikes, a waterwheel, a windmill, etc.
So these are just examples of âSubjectsâ in a prompt.
2. LANDSCAPE & FEATURES
Subjects are placed in a landscape that may have features that you need to include. For example, is it a road in the mountains with a bridge, next to a waterfall, on a cloudy and windy day etc.?
3. COMPOSITION & RELATIONSHIP
Composition and Relationship is where you combine the Subject and the Landscape elements. Consider how these are set up: How are they positioned in the landscape? What is their relationship to each other, or to the viewer? For example, is it a very close-up shot? Is the subject far away? Are they walking, standing, sitting or stationary, and are they relating to an object in any way?
So this is the simplest conceptual framework for understanding composition and relationship.
4. STYLE
This is essentially how something looks: not what it is, but how it is portrayed. One of the most popular interpretations is of course 'Realistic'. Generally, most of us like to see a realistic rendering of an object or human, though not always, and 'Realistic' happens to be the style most people use, especially when starting out.
Other styles you can experiment with include Cinematic, Vintage, Anime, Pixar, Cartoon, Media (painted, sculpted, etc.)
For example, if you use 'Cinematic', you are adding a more cinematic, dramatic narrative on top of the image. Usually this involves lighting, with the scene made more alluring by setting the shot at dawn or dusk, for example.
Cinematic essentially maintains a realistic look but adds a more stylistic tone. In your prompts, try experimenting with terms like mesmerizing, captivating, enthralling, fascinating, spellbinding, hypnotic, entrancing, bewitching, enchanting, riveting, alluring, etc., but be careful not to overuse them. Stick to no more than two at a time.
You can also try variations like a Vintage style, or animated styles such as Anime or Pixar. You can also try using 'Media'. Media here asks: how is this created? What camera is being used, if you are getting very specific with a Realistic approach? Or you can imagine the scene rendered in different artistic mediums, for example painted, sculpted, drawn or whatever you like.
5. MOTION
Motion is the most crucial part of an AI Video Synthesis prompt and one of the keys to working with these tools: understanding what is being communicated within the image in relation to potential movement and animation.
There are two things to consider when injecting Motion into the prompt.
First, consider how the Subject is moving: basically, the verbs you use to describe the subject's movement. Are they running, walking, sitting, talking, swaying, dancing? There are many more verbs you can pull from the dictionary.
The other thing to consider is not how the subject is moving, but how the viewer is moving (that is, how the camera is moving). This is an important concept in your AI video, because it is how the viewer will perceive the experience.
You must learn to imagine that you are leading the viewer on a journey, placing them in the same position as the camera operator. This is the point of view to envisage.
So you can consider a Static shot, where the camera sits still and you watch everything happening inside the frame. Or the camera can be panning around, zooming in or out, sitting on a dolly and smoothly, gracefully tracking the subject, or perhaps flying over the top of the scene in a helicopter.
When defining MOTION, you need to combine the camera terms (Static, Panning, Zooming, Dolly, etc.) and the subject verbs (running, walking, sitting, dancing, etc.) and define the movement within the image. You need to clearly articulate the direction and nature of movement of the different subjects and objects; a small sketch of this pairing follows below. Be forewarned that this is an incredibly difficult art, but it is one you can master if you keep practicing.
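As a toy illustration of that pairing, the sketch below combines camera terms with subject verbs to draft candidate motion clauses. The word lists and output format are our own assumptions, purely for practice.

```python
# Pairing camera terms with subject verbs to draft candidate motion
# clauses for the Motion part of a prompt. Lists are illustrative only.
import itertools

camera_moves = ["static shot", "slow pan", "zoom in", "dolly tracking shot"]
subject_verbs = ["walking along the shore", "dancing", "swaying in the wind"]

# Print every combination so you can pick one when drafting the prompt.
for camera, verb in itertools.product(camera_moves, subject_verbs):
    print(f"{camera}, subject {verb}")
```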
In the world before AI Video Synthesis, learning this art took years of experience in videography and movie making. Now it is as simple as trying out different iterations on your computer. THIS is mainly why we believe that AI Video Synthesis is going to totally disrupt Hollywood in the next decade, if not sooner.
Another important thing to remember when prompting is that too many small details in the prompt may overwhelm the system. So try to keep things brief, specific and to the point.
Also keep in mind that Video Synthesis is extremely expensive because it is very GPU intensive. This means you typically won't get more than a 4-second scene from any AI Video Synthesis tool.
You do have the option to extend the running time by changing the playback speed, but this produces side effects as well, such as a feeling of slow motion once you slow the video down. For example, you can stretch the time out to 8 seconds by playing the video at half speed. This can work very well for a few styles of video, but for others it may not.
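If you want to apply this slow-down yourself, here is one way to do it, assuming you have the ffmpeg command-line tool installed; the file names are placeholders.

```python
# Stretch a 4-second clip to 8 seconds by halving playback speed.
# Assumes ffmpeg is installed; file names are placeholders.
import subprocess

subprocess.run(
    [
        "ffmpeg",
        "-i", "clip_4s.mp4",            # input: the synthesized clip
        "-filter:v", "setpts=2.0*PTS",  # double each frame's timestamp -> half speed
        "-an",                          # drop audio (synthesized clips are usually silent)
        "clip_8s.mp4",                  # output: same content over twice the time
    ],
    check=True,
)
```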
One advantage that PixVerse has over other AI tools is that you can upscale your output videos to a higher resolution, up to 4K, while other tools may require third-party utilities such as 'Topaz' or 'Pixop', which are paid services. The upscaler at PixVerse actually compares extremely well to these paid options in terms of quality.
So, to upscale your video's resolution, just click on 'Upscale' next to the video's thumbnail, wait for the upscaling to finish, and then download the video.
PixVerse also has a 'Retry' option, which reruns the generation with all of the same prompt and settings. This is useful because each time you run it, you will get a different output for the same prompt.
When you use the Text to Video option, you simply enter a prompt. You can also add negative prompts, which are the elements you wish to exclude from your output.
After entering the prompt, you can tweak the Seed, the Aspect Ratio and the Strength of Motion. When you specify the Aspect Ratio, you are specifying the shape of the output frame. Most AI Video Synthesis tools allow a horizontal landscape video, a vertical portrait video, a square video, or a slightly rectangular video with a 4:3 aspect ratio.
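As a quick illustration of what an aspect ratio means for the frame, here is a tiny helper that computes a frame height from a chosen width. The ratios listed are the common presets; the exact resolutions PixVerse renders at may differ.

```python
# Compute a frame height from a target width and a named aspect ratio.
# Ratios are the common presets; actual tool resolutions may differ.

ASPECT_RATIOS = {
    "landscape": (16, 9),  # horizontal widescreen
    "portrait": (9, 16),   # vertical, e.g. phone screens
    "square": (1, 1),
    "classic": (4, 3),     # the slightly rectangular option
}

def frame_size(width, ratio_name):
    w, h = ASPECT_RATIOS[ratio_name]
    return width, round(width * h / w)

print(frame_size(1024, "landscape"))  # (1024, 576)
print(frame_size(1024, "classic"))    # (1024, 768)
```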
Video generation time varies, but is generally around 30-60 seconds; it all depends on how busy the GPU servers are. As mentioned earlier, AI Video Synthesis is an extremely GPU-intensive process, and most companies don't have thousands of GPUs in clusters, at least not just yet. We think Apple, Meta, Google, NVIDIA and a few others do have massive GPU clusters (some containing thousands of GPUs), but these companies have not rolled out their Video Synthesis tools so far, even though they have announced Video Synthesis models behind the scenes.
And when they finally do, the landscape for Video Synthesis might change in terms of Video output Quality, Length of Synthesized Video etc.
One question that pops up frequently is whether you can use the synthesized videos for commercial purposes. This is something you must check for yourself to be on the safe side, so please confirm each tool's terms of service before using it commercially.
Finally, what can we expect in the future from PixVerse? First, there is no doubt it is not going to be free forever. It's important that you take a look at it now, because you can create as many videos as you like, all the way up to 4K, for free, and, importantly, become proficient in AI Video Synthesis concepts.
As mentioned earlier, most AI Video Synthesis tools have similar features and fairly standard pipelines.
What can we expect from the competition? Competition in the AI Video Synthesis domain is just beginning to heat up. Here is a heads-up on some competitors, all vying for a share of the trillion-dollar global film, TV and animation industry:
PikaLabs, RunwayML, Genmo, Kaiber.ai, LensGo AI, PromeAI, Moon Valley, and of course some mega players: Google with their MAGVIT, VideoPoet, Lumiere and Imagen; Meta, the parent company of Facebook and Instagram, with their Make-A-Video and Emu Video tools; and Apple with their Matryoshka Diffusion Models (MDM), which can create longer videos.
And finally, NVIDIA and Amazon are collaborating on their own AI Video Synthesis tools, and we know those tools are not going to be ordinary: NVIDIA has a near-monopoly on GPUs, and Amazon has the cloud platform necessary to scale generative video services.
For the general public, the best is yet to come in AI Video Synthesis!
Please Subscribe to our Channel and Stay tuned so we can keep you informed on all these fascinating developments on the horizon.
So thanks for watching, and don't forget to press the like button.