Sequent Computer Systems’ Symmetry machine, which promises a peak performance of 80 MIPS, has struck production problems in implementing its cache in VLSI. The cache design is the part of the machine that gives it its improved performance, and the Portland, Oregon company insists that there is no design problem and that the production problem will be solved over the next few months. Symmetry machines with the new caching method are being used internally, but their reliability cannot be guaranteed because of the production problems. The first Symmetries will ship in November with the old caching mechanism, which supports only the 10-processor configuration; the new one supports up to 30 CPUs. An upgrade path for those who take the intermediate route is being worked out.

Linear rise

The Symmetry ties together up to 30 Intel 80386 32-bit microprocessors and gets a linear rise in power as processors are added by providing each with its own cache memory, a clever memory management scheme, and bus hardware assists for the Unix operating system. Sequent entered the parallel processing market in 1984 with the Balance, based on National Semiconductor 32-bit microprocessors and an efficient, custom-designed, low-cost system bus. Having sold 164 systems, it is established in the technical and engineering markets and hopes its new range will take it into the commercial market.

The Symmetry systems consist of central processing unit boards, global memory, dual-channel disk controllers and Multibus adaptor boards, all tied together with the system bus. Each processor is built up from a 16MHz 80386, an 80387 floating point unit, an optional floating point accelerator, a two-way set-associative cache, and bus interface logic. Two complete processors are packed onto a 12.5 by 14 card: an eight-layer printed circuit board that holds the processors, a 64Kb cache and three VLSI chips to manage the cache and oversee the bus.
CMOS gate arrays are used for the cache-memory controller and the bus interface controller, while the bus data-path controller is fabricated from CMOS standard cells. As the system expands from two to 30 processors, the memory expands to keep pace: each memory controller board can handle up to 40Mb of memory using 1M-bit chips. A full system uses six controllers to give 240Mb of global memory. The cards have also been designed to cope with the forthcoming 4M-bit memory chips: when these are available the system will support almost a gigabyte of global memory, together with all the logic for error detection and correction, automatic initialising and interleaving between memory controllers.

Each processor delivers 3 MIPS and operates with zero wait states when it has a cache hit, that is, when the data it requires is in the cache memory. And this is the problem. Microprocessors are now so powerful that designers must either use expensive 50nS to 60nS static random access memory chips, or sacrifice performance by using commercially priced 100nS to 120nS dynamic memory chips, which introduce wait states: the microprocessor has to idle for a few cycles every time it requests data from memory. Or they can use a cache and devise a complex cache management scheme to ensure that when one processor alters the data in its cache, another processor does not call in the unaltered data from the global memory.

The Symmetry designers chose the last option, partly because they could integrate the cache-management logic with that necessary to prevent contention for memory between the multiple microprocessors, and partly because they had decided to use an upgraded version of the bus developed for the Balance machines. The data highway they developed is a 10MHz synchronous bus with a 64-bit data path and a 32-bit address path multiplexed with the data on the lower 32 lines of the bus. It has an overall sustained bandwidth of 53.2Mbps, which means that bus traffic had to be kept to a minimum.
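The memory and bus figures quoted here can be checked with a few lines of arithmetic. The following Python sketch is illustrative only: the function and constant names are invented for this example, "Mb" is assumed to mean megabytes, and the bus figure is a raw upper bound that assumes one full-width data transfer per clock cycle and ignores address cycles and protocol overhead.

```python
# Figures come from the article; all names below are hypothetical.
CONTROLLERS = 6                 # memory controller boards in a full system
MB_PER_CONTROLLER = 40          # per board with 1M-bit chips (assumed megabytes)
DENSITY_RATIO_4MBIT = 4         # a 4M-bit chip holds four times a 1M-bit chip

BUS_CLOCK_HZ = 10_000_000       # 10MHz synchronous bus
DATA_PATH_BITS = 64             # 64-bit data path


def global_memory_mb(controllers, mb_per_controller, density_ratio=1):
    """Total global memory in Mb for a given chip density."""
    return controllers * mb_per_controller * density_ratio


def raw_bus_bytes_per_second(clock_hz, width_bits):
    """Upper bound: one full-width transfer on every clock (an assumption)."""
    return clock_hz * width_bits // 8


print(global_memory_mb(CONTROLLERS, MB_PER_CONTROLLER))                       # 240
print(global_memory_mb(CONTROLLERS, MB_PER_CONTROLLER, DENSITY_RATIO_4MBIT))  # 960
print(raw_bus_bytes_per_second(BUS_CLOCK_HZ, DATA_PATH_BITS))                 # 80000000
```

The 960Mb result is consistent with "almost a gigabyte", and the sustained bandwidth the article quotes sits below the raw bound, as one would expect for a bus that multiplexes addresses onto its data lines.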
The cache management scheme has to guarantee that the data in the cache (a copy of portions of the global memory) remains current, or coherent; ensure that any changes made to data in the caches are reflected in the main memory when another processor or input-output device wants to access the data; and react correctly when more than one CPU tries to write to the same memory location. And it must do all this while holding down bus traffic to avoid congestion, contention and idle processors waiting to use the bus.

Most multiprocessors use the write-through technique: this guarantees cache coherence by following every write to a cache with a write to the corresponding main memory location. Every other cache watches the bus with a snoop to read the addresses of all the writes: if a cache contains a copy of the data held in those locations, the snoop signals the processor to invalidate it. Although this guarantees coherence, it generates a lot of unnecessary bus traffic, as the memory is updated every time the cache is written to, even if no other processor is using the data.

Sequent uses a copy-back scheme, which it claims cuts the number of writes to the global memory by 50%. The blocks of data in the cache are all tagged either private, modified, shared, or invalid. When there is a cache miss (a processor requesting data that is not in the cache), the data is read from the main memory and, if no other copies exist, tagged as private. The processor can then write over the data as many times as it likes without updating the global memory and generating bus traffic; it simply changes the tag to modified. The data in the main memory is now stale, and if another processor requests that data, the snoop monitoring the bus for the cache that owns the data intercepts the request.

Invalid

It then copies the modified data to the requesting processor and updates the main memory with a single bus transaction, and the data in both caches is tagged as shared. This is all done by the cache memory management and bus interface hardware without stealing time from or slowing up the processors.
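The tag transitions the article describes can be followed more easily in a toy model. The Python sketch below is not Sequent's implementation: the class and method names are hypothetical, each block is a single value, and a write to a shared block is assumed to cost one bus transaction that invalidates the other copies.

```python
from enum import Enum


class Tag(Enum):
    INVALID = "invalid"
    PRIVATE = "private"
    SHARED = "shared"
    MODIFIED = "modified"


class Machine:
    """Toy copy-back model: one value per address, one cache per CPU."""

    def __init__(self, n_cpus):
        self.memory = {}
        self.caches = [{} for _ in range(n_cpus)]  # addr -> [tag, value]
        self.bus_transactions = 0

    def _other_holders(self, cpu, addr):
        """Indices of other caches holding a valid copy of addr."""
        return [i for i, c in enumerate(self.caches)
                if i != cpu and addr in c and c[addr][0] is not Tag.INVALID]

    def read(self, cpu, addr):
        line = self.caches[cpu].get(addr)
        if line and line[0] is not Tag.INVALID:
            return line[1]                    # cache hit: no bus traffic
        self.bus_transactions += 1            # cache miss: one bus read
        others = self._other_holders(cpu, addr)
        for i in others:
            owner = self.caches[i][addr]
            if owner[0] is Tag.MODIFIED:
                # The owner's snoop supplies the data and main memory is
                # refreshed as part of the same bus transaction.
                self.memory[addr] = owner[1]
            owner[0] = Tag.SHARED
        value = self.memory.get(addr, 0)
        tag = Tag.SHARED if others else Tag.PRIVATE
        self.caches[cpu][addr] = [tag, value]
        return value

    def write(self, cpu, addr, value):
        line = self.caches[cpu].get(addr)
        if line is None or line[0] is Tag.INVALID:
            self.read(cpu, addr)              # bring the block in first
            line = self.caches[cpu][addr]
        if line[0] is Tag.SHARED:
            self.bus_transactions += 1        # tell other caches to invalidate
            for i in self._other_holders(cpu, addr):
                self.caches[i][addr][0] = Tag.INVALID
        line[0] = Tag.MODIFIED                # private or modified: no bus use
        line[1] = value
```

In this model a processor that reads a block nobody else holds and then writes it repeatedly generates one bus transaction in total, and main memory stays stale until another processor asks for the data.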
The first time one of the processors writes over the shared data in its cache, it changes the tag to modified and, with a single bus transaction, tells the other caches sharing the data to change their tags to invalid. It can then repeatedly write over the data without using the bus.

The designers also exploit the bus characteristics to cut the bus traffic still further: for example, the bus protocol is much more efficient at reading than at writing, which occupies the bus for far more cycles. So, if a processor tries to write over data in the global memory that is not held in the cache, the memory management system turns it into a read from main memory and then overwrites the data that is read into the cache, tagging it modified. Also, the bus itself can handle and queue multiple requests to the memory and input-output devices (three reads, two writes and an input-output request) and will not accept requests from processors when the queue is full, thus cutting down on a lot of bus traffic in which a request is made only to be denied.

Sequent claims a price per MIPS of between $8,000 and $14,000 for its own machines; $90,000 to $100,000 for the DEC VAX; and $130,000 to $170,000 for an IBM 3090.
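The bus request queue described in the article (three reads, two writes and one input-output request) can be pictured as a small admission check. This Python sketch is an illustration under assumptions: the names are hypothetical, and each request type is assumed to have its own independent limit, which the article does not spell out.

```python
from collections import Counter

# Queue depths quoted in the article; independent per-type limits are assumed.
LIMITS = {"read": 3, "write": 2, "io": 1}


class BusQueue:
    """Admission control: a request is only accepted if there is room for it."""

    def __init__(self, limits=LIMITS):
        self.limits = dict(limits)
        self.pending = Counter()

    def can_accept(self, kind):
        return self.pending[kind] < self.limits[kind]

    def submit(self, kind):
        """Return True if queued, False if the processor must hold the request."""
        if not self.can_accept(kind):
            return False
        self.pending[kind] += 1
        return True

    def complete(self, kind):
        """Mark one pending request of this kind as finished."""
        if self.pending[kind] == 0:
            raise ValueError(f"no pending {kind} request")
        self.pending[kind] -= 1
```

Because a full queue is visible before a request is issued, a processor in this model simply waits rather than spending bus cycles on a request that would be turned away.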