November 25, 2016 (updated 20 Jul 2022 11:42am)

Google’s DeepMind AI masters lip-reading

DeepMind's AI has produced lip-reading software that outperforms a professional human lip reader.


Researchers from Google’s DeepMind and the University of Oxford have collaborated to create highly accurate lip-reading software using artificial intelligence.

The AI system was trained on almost 5,000 hours of BBC television footage, which contained a total of 118,000 sentences.

The key contributions detailed in the researchers' paper include a ‘Watch, Listen, Attend and Spell’ (WLAS) network, which learns to transcribe videos of mouth motion into characters.

Explaining the work, the DeepMind researchers said that the aim of the study “is to recognise phrases and sentences being spoken by a talking face, with or without the audio.

“Unlike previous works that have focussed on recognising a limited number of words or phrases, we tackle lip reading as an open-world problem: unconstrained natural language sentences, and in the wild videos.”

The AI was trained on shows that aired between January 2010 and December 2015, and its performance was then tested on programmes broadcast between March and September 2016.

To compare the system's performance with that of humans, a professional lip-reading company was asked to decipher a random sample of 200 videos.

The professional lip reader correctly deciphered fewer than one-quarter of the spoken words, whereas the WLAS model deciphered half of them.


In an interview with New Scientist, Ziheng Zhou of the University of Oulu, Finland, said: “It’s a big step for developing fully automatic lip-reading systems. Without the huge data set, it’s very difficult for us to verify new technologies like deep learning.”

DeepMind researchers believe that the program could have a host of applications, such as helping hearing-impaired people to follow conversations.

It may also be used to annotate silent films and to help control digital assistants such as Apple's Siri or Amazon’s Alexa.
