By Gary Flood
We’ve all heard those text-to-speech voice synthesizers that make every computer sound like the cosmologist Stephen Hawking (who is unable to speak naturally owing to the effects of Lou Gehrig’s disease). And then there are the snippets of pre-recorded tape one hears on IVR (interactive voice response) systems: “When you have finished recording, press or say 1 for more options.” The first approach is kludgy and sounds very artificial, but it does allow flexibility, since the computer can speak any arbitrary text. The second makes the system sound like a person, but it is obviously just canned, or pre-recorded, speech. Now we have the possibility of letting our electronic systems speak to us in ways indistinguishable from humans, but we’ll have to think very carefully about whether we want them to.
This is the implication of a piece in last month’s New York Times on electronic impersonators: computer systems that can speak in a particular person’s voice. The idea is to take a recording of, say, you, or of James Earl Jones, the actor who voiced Darth Vader, break it down into individual phonemes, store them, and manipulate them. That, at least, is the claim of a Kyoto, Japan-based research facility, Advanced Telecommunications Research Laboratories (ATR), which has developed a prototype that can even make you speak in a language you don’t know.
Nice spin-offs: a movie in which Marilyn Monroe lives again, since we could generate her image with CGI (computer-generated imagery) and give her back her real voice; Sir Laurence Olivier reading your e-mail to you while you’re in the car; books on tape read out in their entirety by their authors.
Nastier implications: jokes in which a famous person says something embarrassing in a faked interview; endless scams involving Elvis calling in to radio stations; and voice fraud, a potentially rich new area for criminal enterprise (“Is there anything else you need after we wire all the money to that account in Miami, Mr Trump?”).
ATR’s system, Chatr, is still some way off being that powerful, mind. It needs an hour of you reading the phone book before it can do anything; apparently it is still easy to tell that a machine is talking when the output comes out of the speaker; and it has to crunch the data for a good long time before it can order a pizza for you. The naturalness of its intonation is reported to be very good, though there are usually some tell-tale pauses. There is also the point that, as with all things digital, the system only knows what you tell it: if your sample really was you reading the phone book, it might be a stretch for it to deliver a natural-sounding soliloquy from Hamlet on your behalf.
And the aims of the lab, which is jointly supported by the Japanese government and by Japanese industry, are modest enough: the goal is not so much electronic impersonators as a telephone set-up that could translate your English words into Japanese on the fly, and vice versa. You can play with a demo at the lab’s web site, https://www.atr.co.jp/.