Over the next five to 10 years, voice recognition is expected to make the breakthrough from niche applications to the mass market. Advances in microprocessor and software technology and lower prices mean that it is becoming feasible to add speech processing and recognition features to a wide range of electronic equipment. The biggest initial markets for voice recognition will be in telephone switchboards, call centers, mobile phones and business PCs. According to US speech recognition consultancy TMA Associates, worldwide speech recognition licensing revenues will grow from $1.93bn in 1997 to $7.88bn in 2000, while the number of speech recognition units sold will increase from 54 million to 428 million in the same period. Since the mid-1990s, telephone call center and switchboard (PBX) manufacturers have incorporated facilities such as interactive voice response (IVR), voice-mail and voice-activated dialing in their systems.

Automatic applications

Traditional IVR systems require callers to key in a response to voice prompts from their touch-tone phones, but the latest generation uses voice recognition technology to accept verbal responses. Switchboards that use speech recognition to direct calls to a named person are now available from companies such as Vocalis, the Cambridge-based speech recognition specialist. Voice recognition technology is also used in call centers to automate applications such as telephone banking, directory enquiries, holiday booking and ticket ordering. Since 1995, Abbey National, the UK financial services group, has offered its telephone banking customers an interactive voice response system using Vocalis’ speech recognition software.

The first generation of speech recognition systems was unable to cope with continuous speech and required users to leave a ten-second gap between each word. Such discrete systems are now very effective, with practiced users achieving up to 98% accuracy with them. But since much of the point of a speech interface is to create a more natural method of working with a computer, discrete voice recognition – however successful – was never likely to become more than a niche market. Now vendors such as IBM, Dragon Systems and Philips have brought out systems that can recognize continuous speech and store large vocabularies. From the user’s perspective, the move from discrete to continuous recognition is a major breakthrough. Applications that improve personal productivity, such as speech-to-text for dictating letters or email messages to a PC, are expected to be particularly popular in the business market. Continuous speech recognition packages such as IBM’s ViaVoice Gold dictation and voice command system are now available for under $200.

Rapid progression

At the same time, natural language technology is progressing rapidly. Natural language is the capability of a computer to decipher the meaning in ordinary, everyday speech, rather than requiring users to speak in prescribed patterns. A very limited application of the technology, which relies on the computer to decipher meaning from keywords, allows users of IBM’s ViaVoice Gold package to format Word documents.

“1998 is going to be the year when the flood gate opens,” says Ken Landoline of US market research company Giga Information Group. “Within the next few years, speech input will become commonplace. By 2001 we’ll see around 30% of PC users using speech recognition for some aspect of their daily work.” “Speech recognition is hot,” says William Meisel, president of TMA Associates. “Nobody can say any longer that it doesn’t work.” Mr Meisel predicts that the market will surge from tens of thousands of speech-recognition-enabled telephone lines today to hundreds of thousands of lines by the end of 1998.

Voice recognition is now attracting major interest from global high-tech companies such as IBM, Philips, Northern Telecom, Ericsson and Microsoft. Ericsson signed agreements last October with Vocalis and with Brite Voice Systems, the US call center developer, to integrate voice-activated dialing into its GSM digital mobile network offerings. This will enable mobile phone subscribers to give verbal instructions such as “phone home” or “call boss” to their handsets. Ericsson believes that the European and Far East market for mobile voice-activated dialing will be worth over $500m by 2002. There are already mobile phones on the market which can store or recognize a limited number of words. For example, Northern Telecom and Matra offer a mobile phone which can store 30 numbers. Uniden, a leading Japanese telephone handset maker, is about to market, through US retail chains such as Sears, Roebuck, a $150 phone that will respond to spoken names rather than button presses.
The company claims that the telephone works about 19 out of 20 times; when it misses, it asks the caller to speak closer to the mouthpiece and repeat the name. Nevertheless, network-based voice-dialing is likely to prove cheaper than handset-based services. On a network, new features can be added without subscribers having to change their handsets.

The internet could also be a gold mine for speech recognition technologies and services. According to US consultancy Probe Research, speech recognition, voice mail, and interactive voice response (IVR) will be the second phase in the development of internet telephony. The use of the internet to transmit voice is expected to grow from 640 million minutes of calls in 1997 to 3.1 billion minutes in 1998.

With Microsoft planning to add speech recognition to the next version of its Windows operating system, the day is coming when users will be able to give verbal instructions to their PCs instead of having to type in commands. The same technology will also be used in handheld computers, digital TV set-top boxes and in cars. In January, Microsoft announced that it was working with 15 car manufacturers and electronics companies to develop voice-activated car computers offering navigation, paging, traffic alerts, email, AM/FM radio and an audio CD player. Microsoft claims that within five years the majority of cars will be equipped with these so-called AutoPCs. Intel has a similar initiative, based on its Pentium chip, which it calls the Connected Car PC. This will offer voice-activated GPS (the satellite-based global positioning system), a cellular phone, internet access and a CD-ROM/DVD-ROM player. Microsoft is making major investments in voice recognition research because its chairman Bill Gates considers speech the natural interface for personal computing. Last October, Microsoft paid $45m for an 8% stake in Lernout & Hauspie Speech Products, Europe’s leading developer of speech recognition products for the PC.

Speech recognition is set to become a battleground in the next round of business software suites. Corel has formed a strategic alliance with US speech recognition software developer Dragon Systems to integrate Dragon’s NaturallySpeaking dictation package into the WordPerfect Suite. Rival business software publisher Lotus already offers a speech recognition update for Word Pro 97, which is based on IBM’s ViaVoice Gold system. Lotus plans to incorporate ViaVoice into all the applications in the next release of SmartSuite, while Microsoft is expected to add a voice interface to the next version of its Office suite.