Advanced Micro Devices Inc unveiled the most impressive threat yet to Intel Corp’s Pentium last week, in the shape of its K5 processor. The company claims that the new chip will be around 30% faster than Intel’s chip at the same clock speed, but confirmation will be some time coming: originally it had hoped to say that the chip had reached tape-out stage in September, but at the Microprocessor Forum, Advanced Micro director Mike Johnson amended this to the next couple of days. At its heart, the K5 isn’t an Intel-compatible chip at all – it’s a RISC processor somewhat akin to Advanced Micro’s superscalar AM29000 chips.
Variable length
However, the RISC core and its instruction set (called ROPs – pronounced ar-ops) are hidden from the end user. Instead, iAPX-86 instructions are converted into these RISC operations, which are then handled by six parallel execution units: one floating point unit, two integer units, two load-store units and a branch unit. It is an approach similar to NexGen Microsystems Inc’s Nx586, but Advanced Micro has included some extra technology at the beginning of the translation process, which the company claims will enable up to four iAPX-86 instructions to be dispatched concurrently. This is pushing it a bit, since only the very simplest iAPX-86 instructions, such as register-to-register adds, will map directly onto a single ROP. Most operations take two or three.
The iAPX-86 instruction set, like the whole complex tribe, presents a couple of problems for the designer bent on producing an iAPX-86-RISC hybrid. The first big one is that iAPX-86 instructions are of variable length, meaning that the processor has to search the instruction byte stream as it comes in from memory or cache, looking for the start of each instruction. Advanced Micro uses an innovative approach to overcome this – the iAPX-86 instructions are partially decoded as they are pulled into cache. It might be thought that this process would slow the rate at which the processor pulls instructions from memory – but the firm points out that memory accesses are comparatively sluggish anyway, and says that it can hide the time needed to pre-decode within this bigger lag.
As instructions are pulled from the cache, the processor translates them into the appropriate ROPs, which are placed four per clock into a byte queue ready for dispatch. The queue will always attempt to dispatch four ROPs, irrespective of the instruction boundaries of the original iAPX-86 instructions.
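The four-ROPs-per-clock dispatch described above can be illustrated with a rough Python sketch. It is a toy model, not Advanced Micro's design: the function names and the example ROP counts are made up for illustration, and it ignores execution-unit availability, data dependencies and out-of-order issue. All it shows is that fixed groups of four ROPs need not line up with the boundaries of the original instructions.

```python
from fractions import Fraction
from typing import List, Tuple

DISPATCH_WIDTH = 4  # ROPs the queue tries to dispatch per clock, per the article


def expand_to_rops(rop_counts: List[int]) -> List[Tuple[int, int]]:
    """Flatten instructions into (instruction_index, rop_index) pairs.

    rop_counts[i] is the (hypothetical) number of ROPs that
    instruction i translates into.
    """
    return [(i, r) for i, n in enumerate(rop_counts) for r in range(n)]


def dispatch_groups(rop_counts: List[int],
                    width: int = DISPATCH_WIDTH) -> List[List[Tuple[int, int]]]:
    """Cut the ROP stream into groups of `width`, ignoring instruction boundaries."""
    rops = expand_to_rops(rop_counts)
    return [rops[k:k + width] for k in range(0, len(rops), width)]


def instructions_per_group(rop_counts: List[int],
                           width: int = DISPATCH_WIDTH) -> List[Fraction]:
    """Fraction of original instructions covered by each dispatch group.

    Each ROP counts as 1/n of its parent instruction, where n is the
    number of ROPs that instruction expands to.
    """
    return [
        sum(Fraction(1, rop_counts[i]) for i, _ in group)
        for group in dispatch_groups(rop_counts, width)
    ]


if __name__ == "__main__":
    # Illustrative mix: two three-ROP instructions followed by a two-ROP one.
    mix = [3, 3, 2]
    for clock, group in enumerate(dispatch_groups(mix)):
        print(clock, group)
    print(instructions_per_group(mix))
```

With this made-up mix, the first group carries all three ROPs of the first instruction plus one ROP of the second, so a single clock's dispatch covers one and a third of the original instructions.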
Dispatching across instruction boundaries leads to the curious situation where the processor can actually be executing, say, 1-and-a-bit Intel instructions. The RISC core offers full out-of-order issue and completion, and so it appears that Advanced Micro has more or less overcome many of the nasty issues previously thought to hamper parallel execution of variable-length Intel instructions. It looks so simple that the average cynical journalist might start looking for the smoke and mirrors. Still, no-one stood on their chairs and cried foul at, or after, the Forum presentation – and they are the experts.
What does remain to be seen is how the processor behaves with real-life code; if it turns out users’ applications are rich in iAPX-86 instructions that map onto three ROPs – well, Advanced Micro will kiss its claimed 30% advantage over Pentium good-bye. Not surprisingly, it has been doing its own simulation, and reports that typical 16-bit iAPX-86 applications have an instruction mix that works out at 1.9 ROPs per instruction. It believes 32-bit code brings this down to 1.3 ROPs per instruction. In other words, as 32-bit applications and operating systems become more popular, the K5 should begin to shine in Intel benchmarks.
Intel is promising systems based on the P6 by the end of next year – and that is said to have a RISC core. What’s the betting that, internally, Intel’s own Pentium killer turns out looking a bit like Advanced Micro’s and NexGen’s?
– Chris Rose (C) PowerPC News