ChatGPT update: OpenAI chatbot can 'see, hear and speak'

OpenAI has revealed a major new update of ChatGPT, giving the AI chatbot additional voice and image capabilities. It means users will be able to communicate with the chatbot using their voices and show it images as part of a question.

ChatGPT is getting a major update. (Photo by Vitor Miranda/Shutterstock)

The update was revealed by the AI lab earlier today. The new features will initially be rolled out to the company’s ChatGPT Enterprise and Plus subscribers over the next two weeks, with access for other user groups to follow.

ChatGPT sparked an artificial intelligence boom when it launched last year, with millions of people signing up to use the technology. Though its popularity has subsequently dipped, OpenAI is trying to cash in on its success and recently launched its enterprise tier for business users. But it operates in what is an increasingly crowded market, with the likes of Google’s Bard and Anthropic’s Claude offering alternatives to companies.

ChatGPT update mimics Alexa and Siri

The voice function will mean users can ask ChatGPT questions via a microphone, much as they would question Amazon’s personal assistant, Alexa, or Apple’s Siri. Users have to enable the functionality, which also allows ChatGPT to respond using an AI-generated voice.

“The new voice capability is powered by a new text-to-speech model, capable of generating human-like audio from just text and a few seconds of sample speech,” OpenAI’s blog post on the update says. “We collaborated with professional voice actors to create each of the voices. We also use Whisper, our open-source speech recognition system, to transcribe your spoken words into text.”

The company says its new voice technology “opens doors to many creative and accessibility-focused applications”. But, it warns, “these capabilities also present new risks, such as the potential for malicious actors to impersonate public figures or commit fraud.” OpenAI says it is using the technology only for this specific use case, voice chat, but it was also announced today that music streaming service Spotify is deploying it to translate podcasts into additional languages, using AI-generated voices that sound like those of the presenters.

OpenAI puts ChatGPT in the picture

Elsewhere, users can input one or more images as part of their query. OpenAI says this will enable you to “troubleshoot why your grill won’t start, explore the contents of your fridge to plan a meal or analyse a complex graph for work-related data”.

The system’s ability to understand images is powered by the company’s multimodal large language models, GPT-3.5 and GPT-4. The company admits this could create safety issues for users. Vision-based models “present new challenges, ranging from hallucinations about people to relying on the model’s interpretation of images in high-stakes domains”, the blog post says. “Prior to broader deployment, we tested the model with red teamers for risk in domains such as extremism and scientific proficiency and a diverse set of alpha testers.

“Our research enabled us to align on a few key details for responsible usage.”

It adds that it has taken “technical measures” to significantly limit ChatGPT’s ability to analyse and make direct statements about people “since ChatGPT is not always accurate and these systems should respect individuals’ privacy.”