OpenAI has unveiled its next-generation text-to-image model, DALL-E 3. It will initially be available only to customers with a ChatGPT Plus or Enterprise subscription, accessed through prompts in the chatbot interface. The launch comes as Amazon unveiled a new generative AI-powered version of its voice assistant, Alexa.

OpenAI says its new DALL-E text-to-image model has improved text capabilities within an image. (Image courtesy of OpenAI)

The first version of DALL-E was launched by OpenAI in January 2021 and was built on top of a modified version of GPT-3, the transformer model family that later underpinned ChatGPT. Version two was given a wide public release in September last year with improved resolution and image clarity. It also more accurately reflected the text prompt.

DALL-E 3 is the first version to be integrated into ChatGPT, rather than being available as a stand-alone service or through an API. OpenAI says it can understand more nuance and detail than previous versions, and create “exceptionally accurate images”.

“When prompted with an idea, ChatGPT will automatically generate tailored, detailed prompts for DALL·E 3 that bring your idea to life,” OpenAI said. “If you like a particular image, but it’s not quite right, you can ask ChatGPT to make tweaks with just a few words.”

The latest version of DALL-E comes as OpenAI faces increasing competition from other image generation tools such as Stable Diffusion from Stability AI, Firefly from Adobe and Midjourney. There are also newer entrants like Ideogram, which focuses specifically on improved text clarity inside images, and Runway, which targets video generation.

In a bid to combat growing concerns over copyright infringement, OpenAI has also specifically instructed this version of the model not to generate any work in the style of a living artist. Creators will be able to opt out of having their work used to train future models. “We improved safety performance in risk areas like generation of public figures and harmful biases related to visual over/under-representation,” OpenAI said.

Efforts to improve representation and remove content inspired by living artists were made in partnership with a new ‘red team’ – domain experts who stress-tested the model and created reports to inform risk mitigation and assessment efforts.

“We understand that some content owners may not want their publicly available works used to help teach our models,” the AI lab explained on a form allowing artists to opt out of having their work used in future model training. The easiest solution, according to OpenAI, is to block its web crawler, GPTBot, but artists can also submit specific images for removal.
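Blocking GPTBot works through the standard robots.txt mechanism. Based on OpenAI's published crawler documentation, a site owner can exclude the crawler site-wide with a directive like the following (the `GPTBot` user-agent token is OpenAI's; the paths are illustrative):

```text
# robots.txt — placed at the site root
# Disallow OpenAI's training crawler from the whole site
User-agent: GPTBot
Disallow: /

# Or, alternatively, block only a specific directory, e.g. an image gallery
# User-agent: GPTBot
# Disallow: /gallery/
```

This only affects future crawling; it does not remove works already collected, which is why OpenAI also offers the per-image removal form.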

DALL-E 3 is currently in a research preview for selected testers, focused on safety and clarity, according to OpenAI. The company says it will launch in ChatGPT for Plus and Enterprise users next month and will be available via the API later this year.

Amazon gives Alexa a generative AI ‘brain’

OpenAI wasn’t the only company announcing a new model. Amazon confirmed it was putting more brain power behind its popular voice assistant, Alexa. Nearly a decade after its launch, Alexa will now be powered by a large language model (LLM).

The custom-built model is designed to give Alexa more human-like conversational qualities, including a more casual tone and the ability to answer a wider variety of questions. It will be available as a preview in the US “soon”, with no date given for the rest of the world.

During the announcement, Amazon declared its intention to outperform platforms like ChatGPT by integrating “real-time info” into the Alexa LLM. ChatGPT’s built-in knowledge is only accurate up to 2021, when the underlying model finished training; accessing real-time information requires additional plugins.

“To be truly useful, Alexa has to be able to take action in the real world, which has been one of the unsolved challenges with LLMs – how to integrate APIs at scale and reliably invoke them to take the right actions,” Amazon said in a statement. “This new Alexa LLM will be connected to hundreds of thousands of real-world devices and services via APIs. It also enhances Alexa’s ability to process nuance and ambiguity—much like a person would—and intelligently take action.”

Read more: Intel pitches AI PCs, but is it already late to the party?