OpenAI has announced Sora, an AI platform capable of generating realistic videos from simple text prompts. In a series of video clips posted on its blog, the firm showed that Sora’s outputs can include movie-like scenes replete with realistic backgrounds, human-like characters and accurate detail. For now, Sora is available only to a select group of filmmakers, visual artists and designers, as well as so-called ‘red teams’ of researchers tasked with finding flaws in the software or identifying scenarios in which it could be misused.
The new model, said OpenAI, “has a deep understanding of language, enabling it to accurately interpret prompts and generate compelling characters that express vibrant emotions. Sora can also create multiple shots within a single generated video that accurately persist characters and visual style.”
Sora produces photorealistic video footage
The examples shared by OpenAI include artificial footage of an SUV driving down a dirt road, a shot of a makeshift town in California during the 19th-century Gold Rush, and a cat waking its owner in bed. These are among Sora’s most photorealistic outputs. Impressive as they are, similar showcases for Sora’s text-based predecessors were often undercut by flawed outputs that users documented once the models were released more widely. GPT-3, for example, produced incoherent responses after OpenAI had confidently predicted it could write news articles indistinguishable from those written by humans.
OpenAI seems to have anticipated this criticism, acknowledging that Sora struggles to reproduce physics accurately in certain scenes and to grasp cause and effect. A character might take a bite out of a cookie, for instance, only for the cookie to show no bite mark afterward. “The model may also confuse spatial details of a prompt,” said the firm, “and may struggle with precise descriptions of events that take place over time, like following a specific camera trajectory.”
Safety a major concern for new OpenAI model
OpenAI also addressed the obvious safety implications of releasing a photorealistic video generator to the public. Sora is currently on limited release to a handful of visual artists and developers, the company said, as well as ‘red teamers,’ defined as “domain experts in areas like misinformation, hateful content, and bias” who will adversarially test the model. OpenAI added that it is building a detection classifier capable of identifying when a video has been generated by Sora, and confirmed plans to include C2PA provenance metadata in the product should it be released to the wider public.
Sora’s release adds to a busy month of generative AI announcements from big tech firms. These include Google’s rebranding of its Bard chatbot as simply ‘Gemini’ and the launch of its new, more powerful Gemini 1.5 large language model. Apple, too, announced that it is developing a new code-generation product to compete with Microsoft’s popular Copilot software.