The PowerPC triumvirate of IBM Corp, Apple Computer Inc and Motorola Inc has always made much of the PowerPC 604 chip’s floating point performance, and of its consequent suitability for multimedia applications, but a dearth of software has made the claim difficult to prove. This changed somewhat at PC Expo in New York last month, when IBM demonstrated its Sensory Suite software for Windows NT, which includes software-based MPEG decompression, speech-to-text and speech-controlled ‘agents’. The Sensory Suite, like too many of IBM’s PowerPC-related software projects, is running late. Soft-MPEG and MIDI software is shipping bundled with NT boxes, but customers will have to wait until around October for the human-centred components to be sent to them.

Snoop

The finished version of OS/2 for the PowerPC is now delayed until the fourth quarter of this year, but the Power Personal Systems division is committed to shipping Sensory Suite for that operating system 90 days after it arrives. The human-centred and allied multimedia technologies are vital to the success of IBM’s machines: it needs them to show that sheer processor performance can be converted into a tangible selling point. Software-based MPEG is an obvious example: by cutting out the need for an add-on board, the company hopes to show that the total price of building a multimedia system is lower with PowerPC.

The company has pulled the same trick with its dictation software. Its VoiceType dictation software has been around for some time on iAPX-86-based boxes, and has generally won plaudits for its accuracy. But until now the software has always required a separate $500 add-in board to cope with the digital signal processing required. Now this has been moved onto the main processor. The downside of the approach is that trying to run too many of these applications simultaneously will push the processor too far. Toby Maners, programme director for human-centred computing at IBM, admitted that trying to play an MPEG movie in one window while carrying out dictation in another is going to degrade the performance of one of them. Still, no doubt IBM will continue to crank up the clock speed on the processor.

But more than any cost saving, the company is hoping to use its human-centred technology to change the way that people interact with their computers. At the simplest level this is straightforward voice control of existing applications; one of the components to be bundled with the ‘Human Centre’ is Navigator. Put Navigator into learn mode and it will snoop the pull-down menus of the running application. Thereafter, telling your machine to ‘File Open’ and so on will have the required effect. So far, so dull: there’s not much of a productivity gain there, unless you are physically disabled, or want to open files while standing at the coffee machine.

More interesting are the specific speech-enabled applets, or ‘conversational agents’, that the company is providing. One example is the electronic mail agent, which is implemented in such a way that it can be used while it sits in the background. This means that the stressed executive can shout ‘Do I have any new electronic mail?’ at the machine without pausing from bashing out the latest, most important memo on the word processor. If the disembodied voice answers yes, then he or she can further command ‘read it to me’. It’s using speech as a separate, concurrent input mechanism to widen the applicability of multi-tasking. The other sample application included is a telephone-book package that accepts voice input to make phone calls to the nearest and dearest. The package can be configured to hold family relationships alongside regular names and addresses, making spoken commands such as ‘phone my husband’ a reality. Ms Maners said that, unlike the simple Navigator, the voice-activated applications can make use of a complex language parser built into the system software.

So whether you ask ‘have I got any new mail?’ or ‘is there any new mail for me?’, the answer should always be the same. Each speech-enabled application will come with its own particular grammar built in. The downside is that, as with the full dictation package, the system has to be taught to react to particular speech patterns. This means that our aforementioned harassed executive is going to have to take a couple of hours out to talk to the box and put it through a training programme. Once that is done, IBM claims about a 97% correct recognition rate.

Ms Maners admitted that some of this stuff requires a different mind-set from users, but she is convinced that the technology is more than a gimmick. Certainly ‘Charlie’, the speaking head that can be used to front voice-controlled operations, looks like it might be a gimmick; the thing, implemented using IBM’s OpenGL three-dimensional image-construction tool kit, responds to the user’s requests: its lips move, its eyes blink and then rest. But Ms Maners said that while advanced, techno-centric users, software developers and the like, do away with Charlie, some business users reported that it felt strange talking to a computer, and that a bit of anthropomorphism cleared up the problem. Moreover, these users can use Charlie’s lips to gain visual cues as to what the speech synthesiser is actually saying. But hang on: wouldn’t it be simpler just to print the text on the screen so that the user could read it, rather than mucking about with lip-reading? ‘Well yes, we do that too,’ said Ms Maners, ‘the idea is to give the user a choice.’

These applets are mainly there to whet the appetite of both users and application developers, so when should we expect to see third-party developers take advantage of these capabilities? ‘The biggest hurdle is that we haven’t put out a tool kit yet,’ Ms Maners admitted. One isn’t being promised until sometime next year. The company intends to hide the complexity of the speech recognition and parsing from developers, who instead will be presented with a high-level interface that will enable them, for example, to ‘put a smile on the face, and say the following words…’ said Ms Maners. Whether users will have a smile on their faces once they see the mammoth memory requirements for the system remains to be seen. The Power Series boxes come with 16Mb of memory as standard, but the full human-centred suite will require another 16Mb to run. That’s not outrageous, Ms Maners claims, considering that the dictation vocabulary alone takes 8Mb. Still, the price of memory chips remains high, and it seems that very soon the uptake of advanced multimedia applications will be limited more by the cost of memory chips than by the availability of cheap MIPSage on the desktop.
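
To make the idea concrete, what follows is a minimal sketch, written in modern Python rather than anything IBM shipped, of how a per-application grammar might map several spoken phrasings onto a single command in the way described above. All of the names here (MAIL_GRAMMAR, parse_utterance and the command strings) are hypothetical illustrations, not part of IBM’s forthcoming tool kit.

import re

# Hypothetical per-application grammar: each spoken pattern on the left is
# normalised to a single command on the right.
MAIL_GRAMMAR = {
    r"(do i have|have i got|is there) any new (electronic )?mail( for me)?": "CHECK_NEW_MAIL",
    r"read (it|my (new )?mail) to me": "READ_MAIL_ALOUD",
}

def parse_utterance(utterance, grammar):
    """Return the normalised command for an utterance, or None if the
    utterance falls outside the application's grammar."""
    cleaned = utterance.lower().strip(" ?!.")
    for pattern, command in grammar.items():
        if re.fullmatch(pattern, cleaned):
            return command
    return None

# Both phrasings quoted in the article resolve to the same command.
assert parse_utterance("Have I got any new mail?", MAIL_GRAMMAR) == "CHECK_NEW_MAIL"
assert parse_utterance("Is there any new mail for me?", MAIL_GRAMMAR) == "CHECK_NEW_MAIL"

The point of the sketch is simply that the recogniser and parser, not the application, absorb the variation in phrasing; the application only ever sees the normalised command.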

Enough power

Another question, and one that seemed to take Ms Maners by surprise, is whether she will make these tools available for Windows NT running on other systems. If the human-centred technology proves as popular as she hopes on IBM’s PowerPC machines, third-party developers won’t want to limit their software development investment to just one processor architecture. In particular, they will want to get their applications running on iAPX-86 machines, which still account for more than 90% of all Windows NT shipments. Intel Corp-based machines are also likely to remain the biggest platform for OS/2 for some time too. Ms Maners’ initial response was to worry about whether the alternative systems will have enough power. The claim, coming from the Power Personal Systems division, is that at the same clock speed a PowerPC-based machine will generate double the real-world processing power of a Pentium, or, to put it another way, that a Power Series machine of a given processing power will cost half as much as a Pentium machine giving the same performance. Ms Maners said she hadn’t given the matter much thought, but after a little reflection decided there was no reason, in principle, why she wouldn’t be willing to have her human-centred work running on NT on other processors, ‘but we would always do the new stuff first on the PowerPC’.

By Chris Rose