Geoff Conrad reports on some of the hotter properties at the International Solid State Circuits Conference last month.
In February, the world’s top chip designers, silicon architects and device devisers gathered in New York to strut and boast and gaze into their crystal balls to frighten the opposition with their plans for the future. The pilgrims to the annual International Solid State Circuits Conference were given previews of a whole range of spectacular chips as the semiconductor industry showed just how far it had managed to stretch the limits of technology in the past year.
Staid old DEC
To show the speed of developments: last year the star of the show was a 1M-bit dynamic random access memory; this year five 4M-bit devices were on show, but the star came from Japan’s NTT Electric Communications Laboratories – a 16M-bit dynamic RAM holding 2M-bytes on a single chip.
Of the processors, the design that was really over the top came from, of all people, Digital Equipment Corp, the staid DEC. This was for an array of 262,144 processors – 8,192 chips each with 32 processors – and 384 companion router chips, each with 64 data inputs and 64 data outputs, which allow each element to communicate with any other in the array. The massively parallel architecture of the full-scale array is claimed to pump out 10 Gflops (10,000m floating point operations per second), or 2,600,000,000,000 4-bit operations per second. The processing element chip has 242,000 transistors, and each of the 32 individual processing elements has 1K of static random access memory, two shift registers of programmable size, a 4-bit adder, an arithmetic-logic unit, two 1-bit registers, and neighbour and router communication paths. A 4-bit operation in each processor takes 100ns to execute, allowing the whole chip to handle 320m 4-bit operations per second. Or, by expanding in nibble-serial fashion, it can handle 40m 32-bit operations per second. Each processor has the logic to connect to the memory of three adjacent memory chips to give it access to 4K of memory.
Hewlett-Packard, which already has a reduced instruction set computer at the heart of its Spectrum Precision Architecture commercial and scientific machines, showed two other RISC chips at the ISSCC. One was a 30MHz, 15 MIPS, 32-bit chip designed to implement a set of 140 instructions using direct hardwired decoding and execution.
No fewer than seven internal 32-bit buses are used to link the various on-chip go-faster features: 25 control registers; a shift/merge unit; a five-stage, three-word-deep instruction pipeline; and a 32-bit arithmetic logic unit. The chip includes logic for decoding and prioritising traps and interrupts and has a special interface to support data transfers among the cache, the CPU and the co-processors, which handles the copy-in and copy-back traffic between the cache and the main memory.
Hewlett-Packard’s other 32-bit part was a stripped-down basic RISC chip with 164,000 transistors and a peak performance of 8 MIPS. Virtually all instructions requiring more than one clock cycle to execute – apart from load-store, co-processor and branch instructions – have been eliminated, and the total number of instructions has been reduced to an absolute minimum. The Hewlett-Packard microprocessor uses a common multiplexed data and address bus and a five-stage pipeline.
One of the universities that continued work in RISC technology after its start at IBM – Stanford University’s Centre For Integrated Computing – unveiled its third generation of 32-bit RISC chips: the MIPS-X. The Stanford researchers have simplified even further the instruction set for the chip, using a simple instruction format that can be decoded very quickly, allowing an instruction to be decoded every cycle. MIPS-X uses the conventional RISC load-store architecture similar to the earlier MIPS Computer chips and most other RISC machines, but the number of instructions has been pared down to a very basic 37, each 32 bits long. But, it is claimed, the key to its speed and high throughput is the large 2K-byte on-chip instruction cache and the ability to fetch two words per cycle, reducing the off-chip instruction bandwidth. The 150,000-transistor double-metal n-well CMOS CPU has a peak operating frequency of 20MHz.
AT&T Bell Laboratories presented two papers on its high-speed, 32-bit Crisp – CMOS reduced instruction set processor – that can execute instructions at up to 16 MIPS with a 16MHz clock. The 172,000-transistor Crisp is a memory-to-memory registerless machine with only 25 instructions and four addressing modes. Crisp gets its speed by being organised into two logically separate machines: a prefetch decode unit and an execution unit, each with a three-stage pipeline. It contains seven static random access memory arrays, totalling 13K-bytes. Unlike other RISC machines it has no hardwired address or data stacks. Instead, 32 internal stack-cache registers are allocated and mapped onto the on-chip statics, allowing the physical cache to be changed without affecting the control software.
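The throughput figures quoted in this report for the DEC array and for two of the RISC chips can be cross-checked with a few lines of arithmetic. The sketch below is only a sanity check built from numbers given in the text; the function and variable names are illustrative and do not come from any of the papers.

```python
# Cross-check of throughput figures quoted in the report.
# Every input is a number from the article; the names are made up for this sketch.

CHIPS = 8_192               # processing element chips in the full DEC array
PES_PER_CHIP = 32           # processing elements on each chip
OP_TIME_NS = 100            # time for one 4-bit operation in one processing element
NIBBLES_PER_WORD = 32 // 4  # a 32-bit operation done nibble-serially takes 8 steps


def dec_array_figures():
    """Derive per-chip and whole-array rates from the per-element timing."""
    processors = CHIPS * PES_PER_CHIP
    ops_per_pe = 1_000_000_000 // OP_TIME_NS      # 4-bit ops per second per element
    chip_4bit = ops_per_pe * PES_PER_CHIP         # 4-bit ops per second per chip
    chip_32bit = chip_4bit // NIBBLES_PER_WORD    # 32-bit ops per second per chip
    array_4bit = chip_4bit * CHIPS                # 4-bit ops per second, full array
    return {
        "processors": processors,
        "chip_4bit_ops_per_s": chip_4bit,
        "chip_32bit_ops_per_s": chip_32bit,
        "array_4bit_ops_per_s": array_4bit,
    }


def mips_per_mhz(mips, mhz):
    """Average instructions completed per clock cycle."""
    return mips / mhz


if __name__ == "__main__":
    for name, value in dec_array_figures().items():
        print(f"{name}: {value:,}")
    print("HP 30MHz chip, instructions per clock:", mips_per_mhz(15, 30))
    print("AT&T Crisp, instructions per clock:", mips_per_mhz(16, 16))
```

On these inputs the per-chip rates come out at 320m 4-bit and 40m 32-bit operations per second, matching the text, and the array total of about 2.6 trillion 4-bit operations per second agrees with the rounded figure quoted for the full array.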
32-bit VAX-compatible
ISSCC was a hardware show, but the development of a RISC machine involves at least as much development work on an associated smart compiler to take advantage of the architecture’s on-chip registers, pipelines and other quirks of the design. Yet apart from AT&T’s mention of its development of the technique of branch folding (where branches are executed along with other non-branching instructions), the software support for the RISC chips was ignored.
Meanwhile, even more work is going into further developing the conventional CISC (Complex Instruction Set Computer) machines to take advantage of the new developments and higher densities now available. DEC, for example, described a VAX-compatible 32-bit single-chip microprocessor with a host of advanced architectural features: an on-chip 1K-byte instruction and data cache with tag and data parity, pipelined microinstruction execution, overlapped instruction prefetching, parallel instruction decoding, and on-chip memory management. The 180,000-transistor part uses a set of 304 instructions and runs at 25MHz.
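The branch folding mentioned earlier in connection with AT&T's Crisp, where a branch is executed along with a non-branching instruction, can be illustrated with a toy model. The sketch below assumes a made-up three-operation instruction set and a decoded form in which every instruction carries the address of its successor; it handles only unconditional jumps and is not a description of AT&T's actual design.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Instr:
    op: str                        # hypothetical ops: "add", "sub", "mul", "jmp", "halt"
    target: Optional[int] = None   # jump target address, used only by "jmp"


@dataclass
class Decoded:
    op: str
    next_pc: int                   # address of the instruction to execute next


def decode(program: List[Instr], fold: bool) -> List[Decoded]:
    """Decode a program; with fold=True, a jump that directly follows a
    non-branch instruction is absorbed into that instruction's next_pc."""
    decoded = []
    for pc, ins in enumerate(program):
        if ins.op == "jmp":
            next_pc = ins.target
        elif fold and pc + 1 < len(program) and program[pc + 1].op == "jmp":
            next_pc = program[pc + 1].target   # skip the jump entirely
        else:
            next_pc = pc + 1
        decoded.append(Decoded(ins.op, next_pc))
    return decoded


def run(decoded: List[Decoded], max_steps: int = 1000) -> List[str]:
    """Return the ops that occupy an execution slot, in order, until "halt"."""
    trace = []
    pc = 0
    for _ in range(max_steps):
        d = decoded[pc]
        trace.append(d.op)
        if d.op == "halt":
            break
        pc = d.next_pc
    return trace


if __name__ == "__main__":
    prog = [Instr("add"), Instr("jmp", 3), Instr("sub"), Instr("mul"), Instr("halt")]
    print("unfolded:", run(decode(prog, fold=False)))
    print("folded:  ", run(decode(prog, fold=True)))
```

In the folded run the jump never occupies an execution slot of its own, which is the effect the parenthetical in the text describes. Conditional branches, which a real design would also have to deal with, are deliberately left out of this sketch.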