November 25, 2016, updated 20 Jul 2022 11:42am

Google’s DeepMind AI masters lip-reading

DeepMind's AI produced lip-reading software more accurate than a professional lip reader.

By Hannah Williams

Researchers from Google’s DeepMind and the University of Oxford have collaborated to create a highly accurate lip-reading software using artificial intelligence.

The AI system was trained on almost 5,000 hours of TV footage from the BBC, containing a total of 118,000 sentences.

The key contribution detailed in the report is a ‘Watch, Listen, Attend and Spell’ (WLAS) network, which learns to transcribe videos of mouth motion into characters.
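The overall data flow of such a watch–attend–spell pipeline can be sketched in a few lines: a frame encoder, an attention step over the encoded frames, and a greedy character decoder. This is a toy illustration only, not DeepMind's actual network — the weights are random, and all names and dimensions below are invented for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

CHARS = list("abcdefghijklmnopqrstuvwxyz ")  # toy character vocabulary
FEAT_DIM = 16  # dimensionality of per-frame features (illustrative value)

def watch(video_frames):
    """'Watch': encode each mouth-region frame into a feature vector.
    A fixed random projection stands in for the paper's CNN encoder."""
    W = rng.standard_normal((video_frames.shape[1], FEAT_DIM))
    return np.tanh(video_frames @ W)

def attend(query, encoder_states):
    """'Attend': dot-product attention over the encoded frames,
    returning a weighted context vector."""
    scores = encoder_states @ query
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ encoder_states

def spell(encoder_states, max_len=10):
    """'Spell': greedily emit one character per step, each conditioned on
    an attention context. With random weights the output is gibberish --
    the point is the structure, not the transcription quality."""
    W_out = rng.standard_normal((FEAT_DIM, len(CHARS)))
    query = encoder_states.mean(axis=0)
    out = []
    for _ in range(max_len):
        context = attend(query, encoder_states)
        idx = int(np.argmax(context @ W_out))
        out.append(CHARS[idx])
        query = context  # next step attends relative to the last context
    return "".join(out)

# 30 frames of 32-dimensional raw mouth-region features
frames = rng.standard_normal((30, 32))
states = watch(frames)
transcript = spell(states)  # a 10-character string over CHARS
```

In the real system the encoder is learned jointly with the decoder, and a parallel ‘Listen’ branch attends over audio features when sound is available.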

The DeepMind researchers explained that the aim of the study “is to recognise phrases and sentences being spoken by a talking face, with or without the audio.

“Unlike previous works that have focussed on recognising a limited number of words or phrases, we tackle lip reading as an open-world problem – unconstrained natural language sentences, and in the wild videos.”

The AI was trained on shows which aired between January 2010 and December 2015, and its performance was later tested on programmes broadcast between March and September 2016.

The system's performance was compared to that of humans: a professional lip-reading company was asked to decipher a random sample of 200 videos.

The professional lip reader was able to decipher less than a quarter of the spoken words, while the WLAS model deciphered half of them.
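To make the comparison concrete, a crude per-word accuracy can be computed as the fraction of reference words matched at the same position. The sentences below are invented for illustration, and the study's own scoring protocol may well differ.

```python
def word_accuracy(predicted: str, reference: str) -> float:
    """Fraction of reference words that the prediction gets right,
    position by position -- a crude stand-in for formal word accuracy."""
    ref, pred = reference.split(), predicted.split()
    hits = sum(p == r for p, r in zip(pred, ref))
    return hits / len(ref)

reference = "unconstrained natural language sentences in the wild"
human_guess = "unconstrained neutral language sentence in the wild"
model_guess = "unconstrained natural language sentences in the wild"

human_score = word_accuracy(human_guess, reference)  # 5 of 7 words match
model_score = word_accuracy(model_guess, reference)  # all 7 words match
```

On the study's real test set the gap ran the other way round: the human professional scored under 0.25 and the WLAS model around 0.5.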

In an interview with New Scientist, Ziheng Zhou at the University of Oulu, Finland, said: “It’s a big step for developing fully automatic lip-reading systems. Without the huge data set, it’s very difficult for us to verify new technologies like deep learning.”

DeepMind researchers believe that the program could have a host of applications, such as helping hearing-impaired people understand conversations.

It may also be used to annotate silent films and assist with the control of digital assistants like Siri or Amazon’s Alexa.
