If separate voice recognition R&D projects at Acorn and Lernout & Hauspie prove successful, technophobes should soon be able to program their VCRs and digital TV set-top boxes by voice command. The technology could also prove a breakthrough in helping novice users navigate the web from their TV sets. According to Jupiter Research, by the year 2000, 16% of internet users will be logging on from a device other than a PC, and the TV is likely to be the dominant non-PC internet access device. By 2002, internet TV hardware shipments will have risen to 7.7 million units from 1 million units this year, the US consultancy says.
By Robin Arnfield
While Acorn has opted for a software solution running on a StrongARM RISC processor, Lernout & Hauspie (L&H) is going down the hardware path. The Belgian company is working with Microsoft and Israeli start-up Creator to develop a dedicated voice recognition chipset for a new generation of Windows CE-based digital set-top boxes. The chipset will enable users to select TV channels, call up web pages and download multimedia content from the internet by talking into a microphone. According to an article in the Financial Times, the voice-enabled set-top boxes may be ready to ship by 2000.
Acorn has teamed up with the DERA Speech Recognition Laboratory, part of the UK government-owned Defence Evaluation and Research Agency, to develop a voice recognition software module that can be incorporated into any consumer appliance. Because Acorn is now focusing on the digital TV and internet appliance market, the initial application for the software is expected to be in set-top boxes rather than mobile phones or cars, says Steve France, senior marketing manager, information appliances. The Cambridge, UK-based company no longer manufactures set-top boxes itself but instead licenses its software and hardware reference designs to consumer electronics companies.
Not a black box
In its voice recognition plans, Acorn is consciously avoiding the black-box approach that many technology companies take when they develop a new application. “We think it is better to develop customized versions for specific set-top box vendors,” says France. DERA has authorized Acorn to demonstrate the prototype to drum up interest, he explains. “We have a verbal agreement which allows us to talk to potential customers such as JVC and capture their requirements. We would then need to get DERA to do the custom engineering.”
At last month’s Cable & Satellite exhibition in London, Acorn demonstrated how its software can be used to program a VCR or navigate an electronic program guide (EPG) by voice. Although the technology is still at the prototype stage, requiring a headset and microphone to eliminate background noise, it is able to recognize spoken commands after they have been repeated several times. The software on display at Cable & Satellite was running on a set-top box containing a 200MHz StrongARM 100 chip and used half of the chip’s processing power. With the new StrongARM 1500 chip, there will be enough horsepower to decode two MPEG-2 data streams as well as handle voice recognition, France says.
Realizing that most users will balk at wearing a headset, Acorn envisages building the software into a TV remote control, with commands transmitted over the existing infrared link to the set-top box. “Ultimately, our vision is for a system which does not need a remote control but simply lets users talk to the TV,” he says. “The advantage of a software solution is that it is much more flexible than a dedicated voice recognition chip, which has only limited storage capacity for new vocabulary and has to be trained very specifically for different languages. With our software, you just need a powerful microprocessor, and then you can have a large vocabulary which can be tailored very easily. With a chipset, re-editing the vocabulary is a major task.”
Designing voice recognition software for TV users is considerably more complicated than designing it for PCs. “As so many different people are going to switch on any one TV set, the system has to be able to work with anyone and can’t rely on training sessions,” France says. “Most PCs are dedicated to one user who can train his machine to recognize his voice.” Acorn’s solution is to incorporate continuous natural speech recognition, so that users do not have to artificially break their speech into individual words. Words are programmed into the system through phonetic descriptions, which means that individual applications can be created with different word sets. Because of the continuous speech facility, the software monitors everything it hears but responds only to the keywords that have been programmed into it. To save computation time, Acorn envisages adding a button to the TV remote control to switch speech recognition on and off.
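The keyword-spotting behaviour described above can be illustrated with a toy sketch. It assumes an upstream recognizer has already turned continuous audio into a stream of phoneme symbols, which is the genuinely hard part and is not modelled here. The ARPAbet-style spellings, the `KeywordSpotter` class and its `toggle` button are hypothetical illustrations, not Acorn's or DERA's design. The sketch shows only three properties the article mentions: commands are defined by phonetic descriptions rather than by training on one speaker, everything outside the programmed vocabulary is ignored, and an on/off button lets the box skip processing entirely.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Hypothetical command vocabulary (approximate ARPAbet-style spellings).
# Each keyword is a phonetic description, not a recording of one user's
# voice, so adding a word here needs no per-speaker training step.
LEXICON: Dict[str, List[str]] = {
    "record": ["R", "IH", "K", "AO", "R", "D"],
    "stop": ["S", "T", "AA", "P"],
    "guide": ["G", "AY", "D"],
    "channel": ["CH", "AE", "N", "AH", "L"],
}


@dataclass
class KeywordSpotter:
    lexicon: Dict[str, List[str]]
    listening: bool = False  # toggled by a hypothetical remote-control button
    _buffer: List[str] = field(default_factory=list)

    def toggle(self) -> None:
        """Switch recognition on or off and discard any buffered sounds."""
        self.listening = not self.listening
        self._buffer.clear()

    def feed(self, phoneme: str) -> Optional[str]:
        """Consume one phoneme from a continuous stream.

        Returns a keyword when the most recent phonemes spell one;
        all other speech is ignored.
        """
        if not self.listening:
            return None  # button off: no matching work is done at all
        self._buffer.append(phoneme)
        longest = max(len(p) for p in self.lexicon.values())
        del self._buffer[:-longest]  # keep just enough history for the longest word
        for word, phones in self.lexicon.items():
            if self._buffer[-len(phones):] == phones:
                self._buffer.clear()  # avoid re-triggering on the same sounds
                return word
        return None


# Usage: the viewer presses the talk button, then says "please record the ..."
spotter = KeywordSpotter(LEXICON)
spotter.toggle()
stream = ["P", "L", "IY", "Z", "R", "IH", "K", "AO", "R", "D", "DH", "AH"]
heard = [w for p in stream if (w := spotter.feed(p))]
print(heard)  # ['record']
```

Matching exact phoneme strings is a deliberate simplification; a real continuous-speech system would score competing hypotheses probabilistically rather than compare symbols for equality.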