September 25, 2023 (updated 26 Sep 2023 9:34am)

ChatGPT update will help OpenAI’s chatbot ‘see, hear and speak’

Users will be able to ask questions using their microphones and submit images as part of questions.

By Matthew Gooding

OpenAI has revealed a major new update to ChatGPT, giving the AI chatbot additional voice and image capabilities. It means users will be able to communicate with the chatbot using their voices and show it images as part of a question.

ChatGPT is getting a major update. (Photo by Vitor Miranda/Shutterstock)

The update was revealed by the AI lab earlier today. The new features will initially be rolled out to ChatGPT Plus and Enterprise subscribers over the next two weeks, with access for other user groups to follow.

ChatGPT sparked an artificial intelligence boom when it launched last year, with millions of people signing up to use the technology. Though its popularity has subsequently dipped, OpenAI is trying to cash in on its success and recently launched its enterprise tier for business users. But it operates in what is an increasingly crowded market, with the likes of Google’s Bard and Anthropic’s Claude offering alternatives to companies.

ChatGPT update mimics Alexa and Siri

The voice function will let users ask ChatGPT questions via a microphone, much as they would question Amazon’s personal assistant, Alexa, or Apple’s Siri. Users have to enable the functionality, which also allows ChatGPT to respond using an AI-generated voice.

“The new voice capability is powered by a new text-to-speech model, capable of generating human-like audio from just text and a few seconds of sample speech,” OpenAI’s blog post on the update says. “We collaborated with professional voice actors to create each of the voices. We also use Whisper, our open-source speech recognition system, to transcribe your spoken words into text.”

The company says its new voice technology “opens doors to many creative and accessibility-focused applications”. But, it warns, “these capabilities also present new risks, such as the potential for malicious actors to impersonate public figures or commit fraud.” It says it is only using the technology for this specific use case, although it was also announced today that music streaming service Spotify is deploying it to translate podcasts into additional languages, using AI-generated voices that sound like those of the presenters.

OpenAI puts ChatGPT in the picture

Elsewhere, users can input one or more images as part of their query. OpenAI says this will enable you to “troubleshoot why your grill won’t start, explore the contents of your fridge to plan a meal or analyse a complex graph for work-related data”.


The system’s ability to understand images is powered by the company’s multimodal large language models, GPT-3.5 and GPT-4. The company admits this could create safety issues for users. Vision-based models “present new challenges, ranging from hallucinations about people to relying on the model’s interpretation of images in high-stakes domains”, the blog post says. “Prior to broader deployment, we tested the model with red teamers for risk in domains such as extremism and scientific proficiency and a diverse set of alpha testers.

“Our research enabled us to align on a few key details for responsible usage.”

It adds that it has taken “technical measures” to significantly limit ChatGPT’s ability to analyse and make direct statements about people “since ChatGPT is not always accurate and these systems should respect individuals’ privacy.”

