February 24, 1987

IBM CLOSE TO MACHINE RECOGNITION OF CONTINUOUS SPEECH

By CBR Staff Writer

IBM is claiming a great leap forward in computer recognition of human speech after successfully demonstrating a 20,000-word vocabulary on the experimental system at its T J Watson Research Center in Yorktown Heights, New York. The company says that reaching 20,000 words in a desk-top system marks another leap forward for the project, which it claims is the world's most advanced.

The 20,000-word vocabulary covers 97% of all the words a speaker is likely to use in business. Speech spoken into a small microphone, with brief pauses between words, appears almost instantly on the screen. Documents can then be edited by voice or keyboard, stored, printed or transmitted. The system needs only 20 minutes of training on the individual user's voice, during which he or she reads a special document that the system uses to characterise and store that individual's unique way of speaking.

Trials of the system are planned for IBM offices, as IBM speeds up development of a system that can recognise continuous speech, without pauses between words.

IBM claims its approach to speech recognition is unique. It is based on two statistical models. The first is built during the speaker's training session, which establishes 200 sound patterns that characterise the speaker; it produces a selection of candidate words, drawn from the 20,000-word vocabulary and described in terms of those sound patterns. The candidate words are then matched against the second model, built from a database of 25m words of IBM office correspondence, which narrows the list by determining which candidates are most likely to follow the two previous words. The system makes its final choice of word only once it has determined that analysing subsequent words will not change that choice. This use of context lets the system distinguish between homophones such as to, two and too.
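Choosing among candidates by "which are most likely to follow the two previous words" is what is now called a trigram language model. The sketch below illustrates that idea only, under stated assumptions: the toy corpus stands in for IBM's 25m-word database, the acoustic scores are hypothetical placeholders for the first model's output, and add-one smoothing is our choice, not a detail given in the article. It also omits the delayed final decision described above.

```python
from collections import Counter

def train_trigrams(words):
    """Count every trigram (w1, w2, w3) and every two-word history (w1, w2)."""
    trigrams, histories = Counter(), Counter()
    for w1, w2, w3 in zip(words, words[1:], words[2:]):
        trigrams[(w1, w2, w3)] += 1
        histories[(w1, w2)] += 1
    return trigrams, histories

def trigram_prob(trigrams, histories, vocab_size, w1, w2, w3, alpha=1.0):
    """Add-alpha smoothed estimate of P(w3 | w1, w2), so unseen trigrams are not zero."""
    return (trigrams[(w1, w2, w3)] + alpha) / (histories[(w1, w2)] + alpha * vocab_size)

def pick_word(candidates, w1, w2, trigrams, histories, vocab_size):
    """Pick the candidate maximising acoustic score x trigram probability.

    `candidates` maps each word to a hypothetical acoustic score from the first model.
    """
    return max(candidates,
               key=lambda w: candidates[w] * trigram_prob(trigrams, histories, vocab_size, w1, w2, w))

# Toy stand-in for the 25m-word correspondence database (hypothetical text).
corpus = ("please send two copies to the office "
          "please send two letters to the branch "
          "i want to go home i want to send it").split()
trigrams, histories = train_trigrams(corpus)
vocab_size = len(set(corpus))

# Homophones sound alike, so the acoustic model gives them equal scores here.
homophones = {"to": 1.0, "two": 1.0, "too": 1.0}
print(pick_word(homophones, "please", "send", trigrams, histories, vocab_size))  # two
print(pick_word(homophones, "i", "want", trigrams, histories, vocab_size))       # to
```

Because the three homophones score identically on sound alone, the two preceding words decide the choice, which is the contextual effect the article describes.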
The Personal Computer-based system uses two high-speed subsystems, each built around an IBM signal-processor chip: the first encodes the speaker's words as a sequence of labels, and the second does the pattern matching.
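The article does not say how speech is turned into labels. One common approach in statistical recognisers is vector quantisation: each short frame of acoustic features is replaced by the label of the nearest stored prototype, here standing in for the 200 speaker-specific sound patterns. The sketch below assumes that approach; the prototype names, the 2-D features and the Euclidean distance are all hypothetical simplifications.

```python
import math

def nearest_label(frame, prototypes):
    """Return the label whose prototype vector is closest (Euclidean) to one frame."""
    return min(prototypes, key=lambda label: math.dist(frame, prototypes[label]))

def label_utterance(frames, prototypes):
    """Encode a sequence of acoustic feature frames as a sequence of labels."""
    return [nearest_label(frame, prototypes) for frame in frames]

# Three hypothetical 2-D prototypes standing in for the 200 sound patterns.
prototypes = {"L1": (0.0, 0.0), "L2": (10.0, 0.0), "L3": (0.0, 10.0)}
frames = [(1.0, 1.0), (9.0, 1.0), (1.0, 8.0)]
print(label_utterance(frames, prototypes))  # ['L1', 'L2', 'L3']
```

The resulting label string is far more compact than the raw signal, which is what makes fast pattern matching in a second stage practical.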
