By William Fellows
IBM Corp has a two-pronged plan to introduce distributed shared memory into its RS/6000 Unix server lines, ComputerWire has learned. An extended version of simple cache-only memory architecture (S-COMA), which it has code-named LA ccNUMA (for Local Access), is being developed to enable main memory to be shared between a cluster of SP parallel systems, while maintaining some traditional programming APIs. First to market, though, will be a classic ccNUMA architecture that will enable, say, four SMP RS/6000s to share a single copy of AIX and main memory, albeit with some of the latency ccNUMA implies.
The need for new distributed shared memory models is being driven by the inability of computer main memory development to keep pace with huge increases in processor performance. Current memory chips cannot keep gigahertz-class processors supplied with a constant stream of instructions, which means that valuable CPU cycles are wasted while the chip idles. Vendors have developed very large symmetric multiprocessing (SMP) systems with memory caches situated close to the CPU in an attempt to minimize the problem, but today's SMP systems, now with as many as 64 CPUs, are bumping into the physical limits of SMP design. At this point adding CPUs no longer produces a corresponding linear improvement in system performance.
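To make the bottleneck concrete, the short C program below (a generic illustration of our own, not IBM code) walks a linked list scattered randomly across a working set far larger than any cache, so nearly every load misses and the processor sits idle waiting on main memory; the node count and padding are arbitrary assumptions chosen only to defeat caching and prefetching.

/* Pointer-chasing sketch of the "memory wall": every hop is a cache miss. */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define NODES (1 << 21)                    /* ~2M nodes * 64 bytes = 128MB working set */

struct node { struct node *next; char pad[56]; };   /* one node per 64-byte cache line */

int main(void)
{
    struct node *pool = malloc(sizeof(struct node) * NODES);
    size_t *order = malloc(sizeof(size_t) * NODES);
    for (size_t i = 0; i < NODES; i++) order[i] = i;
    for (size_t i = NODES - 1; i > 0; i--) {         /* shuffle so prefetchers cannot help */
        size_t j = rand() % (i + 1);
        size_t t = order[i]; order[i] = order[j]; order[j] = t;
    }
    for (size_t i = 0; i + 1 < NODES; i++)
        pool[order[i]].next = &pool[order[i + 1]];
    pool[order[NODES - 1]].next = NULL;

    clock_t start = clock();
    volatile struct node *p = &pool[order[0]];
    while (p->next) p = p->next;                     /* each step stalls on main memory */
    printf("traversal took %.2f seconds\n", (double)(clock() - start) / CLOCKS_PER_SEC);
    free(order);
    free(pool);
    return 0;
}

On a fast CPU the loop spends most of its time waiting, which is exactly the idle time that bigger caches, SMP designs and, ultimately, NUMA-style architectures try to claw back.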
Memory is the real bottleneck, says IBM Fellow and director of server design Jim Rymarczyk, who says that distributed shared memory is the natural extension of symmetric multiprocessing architectures in which multiple CPUs share a single pool of memory, a uniform memory access (UMA) architecture. It's not a bug or a speed bump in SMP, he said; ccNUMA [cache coherent Non-Uniform Memory Architecture] will become prevalent and software will have to guarantee affinity and proximity to cache.
In some ways it is surprising that IBM has hung its hat on distributed shared memory so quickly, as evidenced by its $800m acquisition of Intel-based ccNUMA vendor Sequent Computer Systems Inc. It took years for IBM to take SMP by the horns. Its RISC CPU designs have, since the days of the Rios 1, emphasized single-engine speed; Rios did not even support SMP. With the 24-way S80 Condor announced last week, IBM has finally got SMP religion, says Rymarczyk.
The ccNUMA technology for its RS/6000s will use the same source code as Sequent, but run on PowerPC instead of Intel. It will also use a switched fabric interconnect to be introduced in the next generation of SP parallel servers rather than Sequent's scalable coherent interconnect (SCI) mechanism. By the same token, Sequent's next-generation systems will inherit some of IBM's work, while the AIX-based Monterey64 Unix that IBM, Sequent and Santa Cruz Operation Inc are working on will include elements of both.
Further out, IBM will offer LA ccNUMA as a way of providing a conventional shared memory interface, which is more familiar to developers, on a distributed memory or message passing system, with no special hardware support.
LA ccNUMA extends classic distributed virtual shared memory, or S-COMA, which uses a portion of cheap main memory to maintain a directory of where data is stored. Data is migrated or replicated at each of these caches. IBM says the scheme does not require a new node memory controller to manage the cache; instead, a small piece of code is added to the Virtual Memory Manager. From the application's perspective, S-COMA enables programs to share data in the same basic way as SMP systems do: the application does not need to concern itself with where shared data resides. The advantage is that it provides hardware support for coherence, enabling shared memory access with the same performance as ccNUMA systems, but it also enables the use of local memory as a large cache and does not require a global virtual memory manager layer.
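What "the same basic way as SMP systems" means in practice is that ordinary shared-memory code keeps working unchanged. The pthreads fragment below is our own generic illustration (not an IBM interface): worker threads simply read and write a shared array, leaving it to the virtual memory layer to decide whether a given page is local or a cached copy of another node's memory.

/* Plain shared-memory code: no awareness of where pages physically live. */
#include <pthread.h>
#include <stdio.h>

#define WORKERS 4
#define SLICE   1000000L

static double totals[WORKERS];             /* shared data; placement is the VMM's problem */

static void *sum_slice(void *arg)
{
    long id = (long)arg;
    double s = 0.0;
    for (long i = id * SLICE; i < (id + 1) * SLICE; i++)
        s += 1.0 / (double)(i + 1);
    totals[id] = s;                        /* an ordinary store, wherever the page resides */
    return NULL;
}

int main(void)
{
    pthread_t t[WORKERS];
    for (long i = 0; i < WORKERS; i++)
        pthread_create(&t[i], NULL, sum_slice, (void *)i);
    double grand = 0.0;
    for (long i = 0; i < WORKERS; i++) {
        pthread_join(t[i], NULL);
        grand += totals[i];
    }
    printf("partial harmonic sum: %f\n", grand);
    return 0;
}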
But in S-COMA the latency penalty for accessing remote data will be high, conventional middleware will not work, and programming models change. This approach is fine for some very high-end users, such as Lawrence Livermore Labs, which is used to changing application code to take advantage of systems faster than the ones it already has. But for commercial users the cost is too high.
So IBM 'morphed' S-COMA into LA ccNUMA by doing two things. First, it added a huge third layer of cache memory so that main memory does not need to be used as a cache. This is possible, Rymarczyk says, because by 2001 IBM will be able to sell caches as big as 128Mb, with gigabyte sizes available shortly after. Second, it wrapped the hardware with interfaces that mean commercial middleware will be able to run over it, and developers will be able to use programming environments such as the message passing interface (MPI) or the Intel-backed Virtual Interface Architecture. Instead of being based upon I/O, Rymarczyk explains, LA ccNUMA uses a software library of code doing only load and store operations. The large third-level caches mean that pages won't have to be moved around as in conventional ccNUMA systems.
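For contrast with the shared-memory example above, explicit message passing of the kind MPI supports looks like the sketch below (standard MPI calls only, nothing IBM- or LA ccNUMA-specific): data moves between nodes as messages the programmer requests, rather than through loads and stores against a shared address space.

/* Minimal MPI example: each rank contributes a value, rank 0 collects the sum. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    double local = (double)rank, sum = 0.0;
    /* Communication is explicit: the data travels as a message, not a memory access. */
    MPI_Reduce(&local, &sum, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("sum across %d ranks: %f\n", size, sum);
    MPI_Finalize();
    return 0;
}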
IBM's real goal is to bring to Unix the kind of dynamic partitioning available on mainframes, where an application doesn't need to be restarted to move it into another domain on the system. It is also building out the AS/400's Lpar partitioning to have the same capability. Rymarczyk says IBM also wants to be able to offer the same reliability, high availability and serviceability that Parallel Sysplex clustering offers mainframe users. He expects Windows NT to co-exist with these environments initially, but says it will eventually incorporate its own distributed shared memory functionality.