Meta has introduced its latest innovation, Meta Movie Gen, a generative artificial intelligence (AI) research model designed to enhance media creation.
The model supports various creative formats, including images, videos, and audio, allowing users to generate custom media using simple text inputs. Meta claimed that Movie Gen not only produces high-quality videos but also excels in sound creation, personalisation, and editing, outperforming existing models in the industry when assessed by human evaluators.
Meta releases model to rival OpenAI’s Sora
The social technology company’s journey into generative AI began with the development of the Make-A-Scene models, which allowed for the generation of images, audio, video, and 3D animations. Meta subsequently launched Llama Image foundation models, which enabled greater precision in generating images and videos. Meta Movie Gen is said to represent the latest phase in this progression, combining multiple modalities and offering finer control to users.
Meta emphasised that while the potential applications for generative AI models are vast, they are not intended to replace artists or animators. Instead, the company seeks to empower more people to express their creativity in new and innovative ways. The Facebook parent company envisions a future where anyone can use this technology to bring their artistic ideas to life, crafting professional-quality videos and audio with minimal effort.
Meta Movie Gen has four primary capabilities: video generation, personalised video creation, precise video editing, and audio generation. According to the firm, the underlying models were trained on a mix of licensed and publicly available datasets.
The video generation capability allows users to input a simple text prompt, and the system will generate high-definition videos and images. This feature relies on a 30 billion parameter transformer model optimised for both text-to-image and text-to-video generation. Meta claims the model can generate videos up to 16 seconds long at 16 frames per second, and can accurately simulate object movements, interactions between subjects, and camera motions.
Meta also expanded this technology to support personalised video creation. By using an image of a person along with a text prompt, the system can generate videos that retain the likeness of the individual while incorporating detailed visual elements described by the text. According to Meta, the model excels at preserving human identity and motion, achieving superior results in personalised video production.
Meta claims that Movie Gen’s video editing functionality takes both video and text as inputs, allowing users to perform detailed edits such as adding or removing elements, or changing backgrounds and styles. Unlike traditional video editing software, which may require technical expertise, Movie Gen can target specific parts of a video while leaving the rest of the content intact, the company explained.
Additionally, Meta introduced an advanced audio generation model as part of the suite, featuring 13 billion parameters. This model can generate audio tracks, including ambient sounds, sound effects, and instrumental background music, that are synchronised with the visual content. The system can produce audio segments up to 45 seconds long and is capable of generating coherent audio for videos of arbitrary length.
“As we continue to improve our models and move toward a potential future release, we’ll work closely with filmmakers and creators to integrate their feedback,” said Meta. “By taking a collaborative approach, we want to ensure we’re creating tools that help people enhance their inherent creativity in new ways they may have never dreamed would be possible.”
However, there are limitations to the Meta Movie Gen AI model. The firm’s chief product officer, Chris Cox, noted that while the model produces impressive results, it is currently too costly to operate at scale, and the time required to generate videos remains prohibitive. There are also issues with precision: during a demonstration for the New York Times, for instance, the model mistakenly rendered a human hand instead of a dog’s paw when generating a video.