Seemingly all of a sudden, language translation tools, both web-based and otherwise, have become flavor of the month. Long before its recent acquisition by Compaq Computer Corp, Digital Equipment Corp had been looking for ways to differentiate its once-neglected but still hugely popular AltaVista search engine site, and it has now launched its World Index service, enabling translations from western to non-western languages. The company licensed technology from language translation maven Systran Inc, which helped with the back-end re-engineering in Unicode. And despite DEC’s recent noise about the subject, most translation companies have been in business since before many of the people who have started internet companies were even born.

As far as computers are concerned, western languages conform to the ISO-Latin-1 standard encoding, but because that standard is an eight-bit encoding it provides only 256 possible characters, too few to include the characters needed to represent many non-western languages. There are other standards, for instance ISO-8859-2, which includes all the characters needed to write the Latin-script Slavic and central European languages, including German, Czech and Hungarian. Unicode, though, claims to be a universal character representation technology that incorporates characters from both western and non-western languages, including Chinese, Japanese, Korean, Greek, Hebrew, Arabic, Turkish and Cyrillic-based languages.

This is, claims DEC, the first time there has been a one-stop shop for all language translation on the web: individual indices exist for numerous languages, but not for all of them in the same place. The system works by first identifying which language and encoding a document uses (there is often more than one type of encoding for the same language), then converting the source text to Unicode and storing it in the index in that form. The query is then processed and all the results are converted into Unicode.
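As a rough illustration of the encoding points above (the 256-character ceiling of an eight-bit standard, several encodings for one language, and conversion of source text to Unicode), here is a minimal Python sketch. It is not DEC’s or Systran’s code: the helper names are invented for illustration, Python’s built-in codecs stand in for whatever conversion layer the service uses, and the source encoding is assumed to be already known rather than detected.

```python
# Illustrative sketch only. The function names are hypothetical and the
# source encoding is assumed to be known; no language detection is attempted.

def to_unicode(raw: bytes, encoding: str) -> str:
    """Decode bytes held in a known source encoding into a Unicode string."""
    return raw.decode(encoding)


def from_unicode(text: str, encoding: str) -> bytes:
    """Encode a Unicode string into a chosen target encoding."""
    return text.encode(encoding)


def fits(text: str, encoding: str) -> bool:
    """Report whether every character of `text` exists in `encoding`."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


if __name__ == "__main__":
    # An eight-bit encoding has room for at most 256 distinct characters.
    print(len(bytes(range(256)).decode("latin-1")))

    # A Czech name needs a character that ISO-Latin-1 lacks but
    # ISO-8859-2 provides.
    print(fits("Dvořák", "latin-1"), fits("Dvořák", "iso8859_2"))

    # Two common Japanese encodings give different bytes for the same word,
    # yet both decode to the same Unicode string, so a single index key
    # can serve documents stored in either encoding.
    word = "日本語"
    sjis = from_unicode(word, "shift_jis")
    euc = from_unicode(word, "euc_jp")
    print(sjis != euc)
    print(to_unicode(sjis, "shift_jis") == to_unicode(euc, "euc_jp"))
```

Storing one Unicode form per document is what lets a single index cover pages written in different legacy encodings; the cost is that the source encoding must be identified correctly before decoding.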
The results are then converted back into the original encoding selected for the query and displayed. However, DEC warns that users need to understand how to use encoding and how to set it in their browsers and operating systems, because whatever encoding is set, the keys pressed are interpreted strictly in accordance with the computer’s encoding setting (see http://altavista.digital.com/av/oneweb for a font and encoding test for Chinese, Japanese and Korean).

DEC had been telling us a few months back that it was planning to sell a version of the machine translation (i.e. not using humans) software as a standalone product, but it appears that its partner, the veteran Systran, has beaten it to it. Yesterday Systran launched Systran Personal, a tool for converting to and from English, French, Italian, Spanish and Portuguese. The San Diego, California company was surprisingly reticent for a company making its money from multi-lingual communications and did not want to talk about any possible OEM deals with DEC, or anything else really. The tool costs $30 for unidirectional translations and $50 for bi-directional. It is available from the Systran and AltaVista web sites.

Moving on, another veteran company, Alis Technologies Inc, which is based in Montreal, thinks the machine translation companies do not go nearly far enough. Alis has been in the translation business since 1981, and produced Arabic and Hebrew versions of DOS even before Microsoft owned the operating system. It also produced the first multi-lingual browser, Tango, based on Spyglass Inc’s Mosaic. Tango is still available, while the core of Mosaic is, of course, owned by Microsoft. Alis has, until now, mainly produced tools for companies to translate their own proprietary content, but is now moving to the internet with a mixture of services and technologies. Its first major project is with the Los Angeles Times, with which Alis is working on translating certain sections, such as restaurant reviews.
The pilot began at the end of last year. The company is planning multi-lingual versions but notes that things like the names of streets and restaurants present a special problem because they often don’t make sense if they are ‘translated’ along with the rest of the text. Steve Allan, Alis’ senior product director, also says news is more difficult to translate and you could insult people trying. Once again, the company is looking at packaging the internet technology to license it to content providers to translate their own web sites. Alis also has an agreement with Verity Inc, which has produced a multi-lingual search and retrieval engine.

Allan reckons many machine translation (MT) technologies fall down because they ignore the various formats text can come in before translation. For instance, with the LA Times project, Alis had to deal with the ‘drop cap’ at the start of each story, which was usually a .gif file and not text; it had to be converted into text before translation, otherwise the first word and sentence would be nonsensical. Alis also has tools to convert text back to the original format once translated and to put back accents that have been removed when translating French and German, a markup language for adding tags that delimit areas that should not be translated, and multi-lingual dictionaries. Allan believes the web will be the first medium for this technology, followed by email. And it is these pre- and post-translation editing tools, plus its human translation services (available if MT proves inadequate), that Alis reckons give it the edge at the moment.
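The idea of tagging areas that should not be translated, such as street and restaurant names, can be sketched in a few lines of Python. This is only an assumption-laden illustration, not Alis’ markup language: the `<notranslate>` tag, the function names and the placeholder scheme are all invented here, and the caller supplies a `translate` function standing in for a real MT engine.

```python
import re
from typing import Callable, List, Tuple

# Hypothetical tag name; the real Alis markup is not described in the text.
NO_TRANSLATE = re.compile(r"<notranslate>(.*?)</notranslate>", re.DOTALL)
# Numbered placeholders that a translator is unlikely to alter.
PLACEHOLDER = re.compile("\u27e6(\\d+)\u27e7")


def protect(text: str) -> Tuple[str, List[str]]:
    """Swap each tagged span for a numbered placeholder and save the span."""
    saved: List[str] = []

    def stash(match: "re.Match[str]") -> str:
        saved.append(match.group(1))
        return f"\u27e6{len(saved) - 1}\u27e7"

    return NO_TRANSLATE.sub(stash, text), saved


def restore(text: str, saved: List[str]) -> str:
    """Put the saved spans back where their placeholders sit."""
    return PLACEHOLDER.sub(lambda m: saved[int(m.group(1))], text)


def translate_protected(text: str, translate: Callable[[str], str]) -> str:
    """Pre-edit, translate, then post-edit, leaving tagged spans untouched."""
    masked, saved = protect(text)
    return restore(translate(masked), saved)


if __name__ == "__main__":
    review = (
        "Try the soup at <notranslate>The Blue Door</notranslate> "
        "on <notranslate>Main Street</notranslate>."
    )
    # str.upper is a stand-in for a translation engine: it changes the
    # ordinary words but has no effect on the placeholders.
    print(translate_protected(review, str.upper))
```

The sketch relies on the translator passing the placeholders through unchanged, which a real engine may not guarantee; that is one reason pre- and post-translation editing of this kind is a separate step from the translation itself.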