SGI has made a few forays into the Intel-based workstation and server markets, and has not had an easy time competing there against Dell, HP and IBM. But by sticking to its HPC knitting and delivering what is arguably the most powerful and elegant Linux HPC machine on the market, SGI seems poised to carve out a real niche for itself in technical computing with the Altix 3000 machines.

Under the NUMAflex cache-coherent, non-uniform memory architecture, each processor runs its own instance of an operating system, but like processors in a symmetric multiprocessing (SMP) cluster (a clustering method that is common in generic servers and workstations today), all of the processors in a NUMAflex cluster can access the same shared pool of memory. Applications run in memory, and any time processors have a single memory space to play with, programming gets simpler, and bigger programs and more programs can run more efficiently on a given machine than is possible on a loosely coupled machine like a Beowulf Linux cluster.
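The programming difference between a single memory space and a loosely coupled cluster can be illustrated with a toy sketch. It is not SGI code and says nothing about performance: Python threads stand in for processors, and the function names, the worker count and the use of queues as a stand-in for a cluster interconnect are all assumptions made for illustration.

```python
import queue
import threading


def shared_memory_sum(data, workers=4):
    """Shared-memory style: every worker reads the one copy of `data` in place."""
    partial = [0] * workers

    def work(i):
        # Worker i walks its stride of the shared list; nothing is copied or sent.
        partial[i] = sum(data[j] for j in range(i, len(data), workers))

    threads = [threading.Thread(target=work, args=(i,)) for i in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(partial)


def message_passing_sum(data, workers=4):
    """Cluster style: each worker sees only the chunk explicitly sent to it."""
    inboxes = [queue.Queue() for _ in range(workers)]
    results = queue.Queue()

    def work(inbox):
        chunk = inbox.get()        # wait for a message carrying the data
        results.put(sum(chunk))    # send the partial result back

    threads = [threading.Thread(target=work, args=(inbox,)) for inbox in inboxes]
    for t in threads:
        t.start()
    for i, inbox in enumerate(inboxes):
        # The slice is an explicit copy, standing in for a send over the interconnect.
        inbox.put(data[i::workers])
    for t in threads:
        t.join()
    return sum(results.get() for _ in range(workers))


numbers = list(range(1000))
print(shared_memory_sum(numbers), message_passing_sum(numbers))
```

Both functions return the same total, but the second has to partition, copy and ship the data before any work starts, which is the extra bookkeeping a shared memory space removes.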

In a sense, the Altix and Origin supercomputers are Linux and Unix clusters that replace the external switches used in clusters with a NUMA crossbar switch similar to the crossbar switches in SMP machines. The nearly ubiquitous Myrinet switches used in Linux clusters, for instance, have a 10 microsecond latency to memory, but the shared memory space of a NUMAflex machine can make the same memory access in 50 nanoseconds, a factor of 200 improvement. This is why, according to Jan Silverman, senior vice president of marketing at SGI, the company will target the large installed base of Intel-based and Linux-based clusters in academia and in the research arms of governments and corporations with the Altix 3000s. These organizations will be able to pick up their applications from Linux clusters and drop them onto the Altix 3000s unchanged, and see a huge boost in performance and a decrease in the amount of iron they need to dedicate to number-crunching jobs.
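The factor of 200 follows directly from the two latency figures quoted above. The short sketch below reproduces it and, as a rough illustration, shows what the gap means for a workload that makes one million remote accesses back to back. That access count is a hypothetical number picked for the example, and the model ignores bandwidth, caching and overlap.

```python
# Latency figures quoted in the article, in whole nanoseconds so the
# arithmetic stays exact.
MYRINET_LATENCY_NS = 10_000   # 10 microseconds
NUMAFLEX_LATENCY_NS = 50      # 50 nanoseconds

# Hypothetical workload size, chosen only for illustration.
REMOTE_ACCESSES = 1_000_000


def total_seconds(accesses: int, latency_ns: int) -> float:
    """Seconds spent waiting if each access pays the full latency in sequence."""
    return accesses * latency_ns / 1_000_000_000


latency_ratio = MYRINET_LATENCY_NS // NUMAFLEX_LATENCY_NS
cluster_wait = total_seconds(REMOTE_ACCESSES, MYRINET_LATENCY_NS)
numaflex_wait = total_seconds(REMOTE_ACCESSES, NUMAFLEX_LATENCY_NS)

print(f"latency ratio: {latency_ratio}x")
print(f"cluster wait:  {cluster_wait:.2f} s")
print(f"NUMAflex wait: {numaflex_wait:.2f} s")
```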

There are two different Altix 3000 models. The Altix 3300 is a deskside supercomputer that offers from four to twelve of Intel Corp’s Itanium 2 processors running at 900MHz, each equipped with 1.5MB of L3 cache. A base machine has a list price in the U.S. of $70,176. The Altix 3700 scales from four to 64 processors in a single node, and the machine can be equipped with 900MHz/1.5MB L3 cache or 1GHz/3MB L3 cache Itanium 2 processors. The future Madison 1.5GHz and Montecito Itanium 2 processors are due in mid-2003 and sometime in 2004, respectively. The Altix 3700’s shared memory scales up to 512GB per node. A 16-processor Altix 3700 costs $331,642, while a 64-way node costs $1.13m. SGI will also create supercluster versions of the Altix 3700, which lash together eight Altix 3700s into a giant cluster and give applications 1TB of global shared memory to play in. The NUMAflex 3 architecture, says SGI, can scale up to 2,048 processors and up to 16TB of global shared memory.
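As a back-of-the-envelope check on the Altix 3700 prices above, the sketch below divides each quoted list price by its processor count. The article does not say how much memory or which processor speed each configuration includes, so the per-processor figures are only a rough comparison, and the 64-way price is the rounded $1.13m.

```python
# U.S. list prices quoted in the article, keyed by processor count.
# The 64-way figure is rounded to $1.13m in the source.
ALTIX_3700_LIST_PRICES_USD = {
    16: 331_642,
    64: 1_130_000,
}


def price_per_processor(processors: int, list_price_usd: float) -> float:
    """List price divided evenly across the processors in the node."""
    return list_price_usd / processors


per_processor = {
    count: price_per_processor(count, price)
    for count, price in ALTIX_3700_LIST_PRICES_USD.items()
}

for count, usd in per_processor.items():
    print(f"{count}-way Altix 3700: about ${usd:,.0f} per processor")
```

On these list prices, the larger node works out cheaper per processor than the 16-way configuration.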

While the Altix 3000 machines will run Red Hat Linux 7.2 and scale up to 64-way processing without any changes to the Linux kernel, SGI has developed extensions to Linux, bundled into a collection of programs called the ProPack for Linux, that can significantly boost the performance of the Altix 3000s and their Linux operating systems. Just how much, SGI is not saying, but it did offer some clues. The differences in performance using the ProPack on real-world applications could be profound, says Silverman: the more interprocessor and memory communication an application has, the bigger the difference ProPack makes.

The ProPack includes XFS, a high-performance file system that SGI helped create with the Linux community because, as Silverman puts it, the I/O and compilers in Linux were so bad that if SGI didn’t help out, they would be limiting factors in the adoption of Linux in HPC markets. The ProPack also includes XSCSI, a SCSI interface that has more bandwidth than the SCSI interfaces used in the Red Hat Linux distribution, as well as SCSL, a set of hand-coded, assembly language math libraries that offer much higher performance than those in Linux. ProPack also includes MPT, a highly tuned version of the Message Passing Interface (MPI) used in supercomputing clusters to allow nodes in a cluster to talk to each other. The extensions also include FFIO, a memory buffering library for Linux that also boosts performance. Silverman says that SGI does not at this time plan to offer these improvements back to the Linux community, since they are core differentiators compared to the Linux-Itanium and Unix-RISC solutions that its competitors have or will soon bring to market. He says that SGI is constantly debating what to put out as open source, and when to do it.

SGI will target the Origin 3000s at its installed base of Irix customers, at those who need more scalability than the Altix 3000s offer, and at those who want a big visualization platform rather than a number-cruncher. Having picked the right machine for a target workload, SGI will then pitch either an Origin 3000 or an Altix 3000 against a competitive RISC/Unix or Lintel cluster.

The current Irix-based Origin 3000 servers, which are based on the third generation of NUMAflex clustering technologies developed by SGI, scale to 1,024 processors and 2TB of shared main memory in a single system image. In terms of memory space, this is the biggest machine that SGI makes, and this will continue to be the case. The biggest Altix 3000 machine scales only to 64 processors, but it supports 512GB of main memory. Those numbers are important for reasons that will be obvious in a minute. The Origin 3900, the flagship of the Origin 3000 series, uses the 700MHz MIPS R14000A processor designed by SGI, which delivers 1.4 gigaflops of processing capacity per processor. A 1GHz McKinley Itanium 2 processor from Intel packs a lot more number-crunching oomph at around 4 gigaflops, nearly three times as much when you do the math. That makes a 64-way Altix 3000 node capable of about 256 gigaflops for $1.13m at U.S. list price, yielding a cost of about $4,400 per gigaflops. A 128-way Origin 3000 machine should be able to hit about 180 gigaflops, but with 64GB of main memory in the box it carries a price tag of $2.94m, or about $16,300 per gigaflops. The Origin boxes can scale up to about 1.4 teraflops and they run a more scalable and feature-rich Irix Unix environment rather than off-the-shelf Red Hat Linux 7.2 with ProPack extensions, and that is why they carry a nearly four-to-one price premium on a flops-to-flops basis.
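The price/performance comparison above is simple arithmetic, and the sketch below reproduces it from the figures quoted in the article. The `System` class and its field names are made up for illustration. Peak gigaflops are taken as processor count times the per-processor rating, so the Origin comes out at 179.2 gigaflops rather than the rounded 180, and its cost per gigaflops lands slightly above the article's rounded figure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class System:
    """One configuration, using the figures quoted in the article."""
    name: str
    processors: int
    gflops_per_processor: float
    list_price_usd: float

    @property
    def peak_gflops(self) -> float:
        # Peak rating only: assumes every processor contributes its full figure.
        return self.processors * self.gflops_per_processor

    @property
    def usd_per_gflops(self) -> float:
        return self.list_price_usd / self.peak_gflops


altix = System("Altix 3000, 64-way, 1GHz Itanium 2", 64, 4.0, 1_130_000)
origin = System("Origin 3000, 128-way, 700MHz R14000A", 128, 1.4, 2_940_000)

for system in (altix, origin):
    print(f"{system.name}: {system.peak_gflops:.1f} gigaflops, "
          f"${system.usd_per_gflops:,.0f} per gigaflops")

# Ratio of the two costs per gigaflops, the "price premium" in the text.
premium = origin.usd_per_gflops / altix.usd_per_gflops
print(f"Origin premium per gigaflops: {premium:.2f}x")

# The same per-processor rating applied to a full 1,024-processor Origin.
print(f"1,024-way Origin peak: {1024 * 1.4 / 1000:.2f} teraflops")
```

The per-processor ratio (4 versus 1.4 gigaflops) and the cost ratio both come out a little short of the round numbers in the text, consistent with the "nearly three times" and "nearly four-to-one" wording.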