September 9, 2016

Google’s DeepMind has learnt how to talk like a human

Artificial Intelligence learns a new skill.

By James Nunns

Anyone concerned about computers taking over should look away now, because they are a step closer to sounding just like humans.

Researchers in the UK at Google’s DeepMind unit have been working on making computer-generated speech sound as “natural” as humans.

The technology, called WaveNet, focuses on speech synthesis, or text-to-speech, and was found to sound more natural than any of Google’s existing products.

However, this was only achieved after the WaveNet artificial neural network was trained to produce English and Chinese speech, a process that required copious amounts of computing power, so the technology probably won’t hit the mainstream any time soon.

WaveNet is built on a convolutional neural network, a deep learning architecture that is trained on data and can then both make inferences about new data and generate new data of its own.
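The key property of the convolutions WaveNet uses, according to the paper, is that they are causal and dilated: each output sample depends only on the current and earlier input samples, spaced increasingly far apart. A toy sketch of one such layer (the function name and two-tap filter are illustrative, not DeepMind's code) looks like this:

```python
import numpy as np

def causal_dilated_conv(signal, weights, dilation):
    """Toy 1-D causal dilated convolution: each output sample depends only
    on the current input sample and earlier ones, spaced `dilation` apart."""
    pad = dilation * (len(weights) - 1)
    padded = np.concatenate([np.zeros(pad), signal])   # left-pad so no future samples leak in
    out = np.zeros_like(signal, dtype=float)
    for t in range(len(signal)):
        taps = padded[t : t + pad + 1 : dilation]       # oldest tap first, current sample last
        out[t] = np.dot(taps, weights)
    return out

signal = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
weights = np.array([0.5, 0.5])                          # simple 2-tap averaging filter
print(causal_dilated_conv(signal, weights, dilation=2))  # → [0.5 1.  2.  3.  4. ]
```

Stacking such layers with growing dilations lets the network see a long stretch of past audio while generating the next sample, which is what makes sample-level audio modelling feasible at all.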

Training WaveNet required Google’s North American English and Mandarin TTS data from professional female speakers. Once trained, it was pitted against a parametric system, which relies on a hidden Markov model, and a concatenative system, which relies on a long short-term memory recurrent neural network, the research said.

WaveNet was found to have performed “significantly better” than the other systems, though listeners still did not rate it as natural as recordings of actual human speech.


The underlying problem is the sheer amount of data involved: the model has to generate at least 16,000 samples of the waveform every second, which currently makes mass deployment unlikely.
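To put that sampling rate in perspective, a quick back-of-the-envelope calculation shows how many individual predictions the model must make, one sample at a time, for a modest amount of audio:

```python
sample_rate = 16_000            # samples per second, as cited in the article
seconds_per_hour = 60 * 60

samples_per_hour = sample_rate * seconds_per_hour
print(samples_per_hour)         # → 57600000 samples for one hour of audio
```

Nearly 58 million sequential predictions per hour of speech is why the researchers note that generating timestep by timestep at 16kHz working at all is surprising.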

Human speech isn’t the only thing the researchers have been testing the technology on; it was also trained on solo piano music from YouTube to produce new music.

Examples of the music and of the speech tests can be found here. The speech results do sound suspiciously human and the music it created is certainly worth a listen.

The researchers wrote: “WaveNets open up a lot of possibilities for TTS, music generation and audio modelling in general. The fact that directly generating timestep per timestep with deep neural networks works at all for 16kHz audio is really surprising, let alone that it outperforms state-of-the-art TTS systems. We are excited to see what we can do with them next.”

The paper can be found here.
