Mozilla’s “Common Voice” Project Goes Multilingual

Comes amid ongoing efforts by Mozilla to develop an open source speech-to-text engine

By CBR Staff Writer

Mozilla’s “Common Voice” project, which asks users to donate their voices to build a bank of speech data for training machine learning algorithms, has gone multilingual.

It is now accepting donations in German, French and Welsh, with acceptance of speech samples in languages ranging from Cornish to Tamil, Uzbek to Sakha all pending.

The open source giant wants the project to be “a tool for any community to make speech technology available in their own language”.

Multiplicity of Data + Machine Learning = Better Speech Technology

Mozilla is building out a bank of data by asking users from around the globe to donate their voices via its voice contribution platform. The firm knows that the more speech data it has, the more sophisticated the speech-powered applications that can be built.

The team, which is rumoured to be working on a speech-powered browser, said on its flagship site: “We believe that large and publicly available voice datasets foster innovation and healthy commercial competition in machine-learning based speech technology”.

The Innovation Penalty

The Common Voice project comes as it gets ever simpler to create production-quality speech-to-text (STT) and text-to-speech (TTS) engines.

As Mozilla’s Kelly Davis put it in an earlier blog, powerful tools like artificial intelligence and machine learning, combined with today’s more advanced speech algorithms, have changed our traditional approach to development.

“Programmers no longer need to build phoneme dictionaries or hand-design processing pipelines or custom components. Instead, speech engines can use deep learning techniques to handle varied speech patterns, accents and background noise – and deliver better-than-ever accuracy.”

Yet as Davis emphasised, there are barriers to innovation in the sector: developers who want to implement STT on the web are working with a fractured set of APIs and support. Creating a speech interface for a web application that works across all browsers leaves developers with two options. One is to write code that works across discrete browser APIs (they are starkly different for Chrome, Safari etc.).

Alternatively they can purchase access to a non-browser-based API from Google, IBM or Nuance. Davis notes: “Fees for this can cost roughly one cent per invocation. If you go this route, then you get one stable API to write to. But at one cent per utterance, those fees can add up quickly, especially if your app is wildly popular and millions of people want to use it. This option has a success penalty built into it, so it’s not a solid foundation for any business that wants to grow and scale.”
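To see how quickly that “success penalty” grows, here is a minimal sketch of the arithmetic. It assumes a flat fee of one cent per invocation, as in Davis’s rough figure; the function name, the user counts and the 30 utterances per user per month are hypothetical values chosen for illustration, not numbers from Mozilla or any vendor.

```python
# Assumption: a flat fee of one cent per invocation, per the rough figure quoted above.
FEE_CENTS_PER_INVOCATION = 1


def monthly_cost_dollars(users: int, utterances_per_user: int) -> float:
    """Return the monthly API bill in dollars for a flat per-invocation fee."""
    total_cents = users * utterances_per_user * FEE_CENTS_PER_INVOCATION
    return total_cents / 100


if __name__ == "__main__":
    # Hypothetical user counts, each user speaking 30 times a month.
    for users in (1_000, 100_000, 2_000_000):
        cost = monthly_cost_dollars(users, 30)
        print(f"{users:>9,} users -> ${cost:,.2f} per month")
```

Under these assumptions the bill scales linearly with usage, so an app that grows a thousandfold pays a thousandfold more.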

Why Do They Need Donated Voice Samples Again?

That is the context for Mozilla’s efforts to develop an open source STT engine, which would bring STT to the Firefox browser and hand the toolkit to the speech developer community with no access or usage fees. But why the voice samples?

Language is incredibly complex. People ask about something as simple as the weather in over 10,000 ways (“our favourite: ‘Will it be cats and dogs today?’”), as Google noted in a blog post published earlier this year, when it stunned observers with the capabilities of its AI voice assistant, Duplex. Duplex has been programmed to match expectations around latency, and it sounds eerily human.

Those wanting to democratise this process and gain access to similar skillsets could do worse than visit https://voice.mozilla.org/en/speak and read out one of the samples, which include: “Vermicelli A trio, or musical piece for three voices or instruments.”

Hopefully that trips off the tongue.

 

 
