SGI’s biggest ally in pushing the single system image size of the Altix 3000 Linux and Origin 3000 Irix machines has been NASA’s Ames Research Center, in Moffett Field, California. The NASA facility has a 1,024-processor Origin 3800 server, and has pushed the Altix 3000 from its initial 64-processor configuration with a single shared memory, to a 128-processor setup in November 2003, to a 256-processor configuration this week.
And it will be NASA Ames that helps SGI push the Altix line to a 512-processor system by the end of the year. Because the current Madison Itanium 2 processors are considerably more powerful than the MIPS RISC processors used in the Origin line, the Altix machines offer much better price/performance. The Origins have the virtue of running Irix and its applications, which have been a part of the supercomputing world for decades.
SGI says that customers who need more oomph today with Altix machines can deploy superclusters with up to 1,024 processors: that’s four 256-ways with four distinct memory spaces clustered together like any other clustered Unix or Linux parallel server. Such a supercluster will be available for the Altix line this May.
The ability to put so many processors behind a single memory space is not new to SGI, but it does take some tweaking to make the NUMAflex architecture behind the Origin servers work with both the Itanium chips and the Linux operating system. In early 2000, NASA Ames had two 256-processor Origin 3800s, with about 410 gigaflops and 512GB of shared memory in each half of the supercluster; later that year it upgraded to two 512-processor nodes, with 1TB of shared memory in each half. In July 2002, that machine was upgraded to the 600MHz R14000A processors designed by SGI and made by NEC Corp.
That machine, with 1,024 processors, is rated at about 1.2 teraflops peak and about 853 gigaflops on the Linpack Fortran benchmark test. It is telling that an Altix 3000 with 256 Madison Itanium processors running at 1.3GHz, which sits right beside it at NASA Ames, is rated at 1.3 teraflops peak and about 1.1 teraflops on the Linpack test.
That single system image for main memory makes both the Altix and the Origin machines very efficient, compared with other parallel architectures that couple processors together more loosely and do not have a single shared memory. It is not uncommon for parallel architectures to have 40% of the aggregate flops in a cluster go up the chimney because of latencies due to slow interconnects and partitioned memories.
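To put the ratings quoted above in the same terms, the short Python sketch below divides each machine's Linpack result by its peak rating. It uses only the approximate figures given in this article, taking the Origin's Linpack result as 853 gigaflops; the dictionary layout and the function name linpack_efficiency are illustrative, not anything published by SGI or NASA.

```python
# Linpack efficiency = sustained Linpack rate / theoretical peak rate.
# All figures are the approximate ratings quoted in the article, in gigaflops.
systems = {
    "Origin 3800, 1,024 x 600MHz R14000A": {"peak_gflops": 1200.0, "linpack_gflops": 853.0},
    "Altix 3000, 256 x 1.3GHz Itanium 2": {"peak_gflops": 1300.0, "linpack_gflops": 1100.0},
}


def linpack_efficiency(peak_gflops, linpack_gflops):
    """Return the fraction of peak performance sustained on Linpack."""
    return linpack_gflops / peak_gflops


for name, rating in systems.items():
    eff = linpack_efficiency(rating["peak_gflops"], rating["linpack_gflops"])
    # The remainder is the share of peak flops that is not delivered.
    print(f"{name}: {eff:.0%} of peak sustained, {1 - eff:.0%} lost")
```

On these rounded numbers the Origin sustains roughly 71% of peak and the Altix roughly 85%, so both lose well under the 40% cited for loosely coupled clusters.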
This article is based on material originally published by ComputerWire.