Back in March, as IBM was trumpeting the benefits of marrying the Power architecture to the open source Linux operating system, the company showed off a small subset of the Blue Gene/L machine that had 256 of the custom Blue Gene processors, each sporting two 500 MHz cores based on stripped-down 32-bit PowerPC 440 series CPUs. The machine that IBM tested in its Rochester labs on September 16 using the Linpack Fortran benchmark was considerably larger and used 700 MHz cores. Run for bragging rights, since LLNL has not yet received its full Blue Gene/L machine, it delivered 36.01 teraflops of sustained performance against about 45 teraflops of peak performance.
But this tested Blue Gene/L box, which filled only eight industry-standard server racks, offers only an eighth of the 360 teraflops of peak performance of the final Blue Gene/L machine that LLNL will start receiving in the first quarter of 2005. LLNL is paying IBM around $100 million to manufacture the Blue Gene/L box, and is forking over another $190 million for the ASCI Purple massively parallel AIX supercomputer, which will consist of 196 64-way p5 590 servers using 1.9 GHz Power5 cores, a whopping 12,544 of them in all.
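The peak figures come from simple multiplication: cores × clock rate × floating-point operations per clock cycle. The following Python sketch reproduces them under two assumptions that the article does not state: 1,024 dual-core nodes per rack, and 4 flops per cycle per core (two fused multiply-adds per cycle from the PowerPC 440's dual floating-point unit).

```python
def peak_teraflops(cores: int, clock_ghz: float, flops_per_cycle: int) -> float:
    """Theoretical peak in teraflops: cores x clock (GHz) x flops per cycle / 1000."""
    return cores * clock_ghz * flops_per_cycle / 1000.0

# Assumptions (not from the article): 1,024 dual-core nodes per rack,
# 4 floating-point operations per cycle per core.
NODES_PER_RACK = 1024
CORES_PER_NODE = 2
FLOPS_PER_CYCLE = 4
CLOCK_GHZ = 0.7  # 700 MHz, as in the Rochester test

tested_cores = 8 * NODES_PER_RACK * CORES_PER_NODE  # eight-rack Rochester machine
full_cores = 64 * NODES_PER_RACK * CORES_PER_NODE   # full 64-rack LLNL machine

tested_peak = peak_teraflops(tested_cores, CLOCK_GHZ, FLOPS_PER_CYCLE)
full_peak = peak_teraflops(full_cores, CLOCK_GHZ, FLOPS_PER_CYCLE)

print(f"8 racks:  {tested_cores:,} cores, {tested_peak:.1f} TF peak")
print(f"64 racks: {full_cores:,} cores, {full_peak:.1f} TF peak")
print(f"fraction tested: {tested_peak / full_peak:.3f}")
```

Under these assumptions the eight racks hold 16,384 cores with a peak of about 45.9 teraflops, and the full machine holds 131,072 cores with a peak of about 367 teraflops, in line with the roughly 45 and 360 teraflops quoted above and with the one-eighth ratio.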
Aside from raw computing power, even the partially completed Blue Gene/L super that was tested in IBM’s labs has some serious advantages over NEC’s Earth Simulator, which has been the reigning champion of the supercomputer performance wars for three years. Earth Simulator is a parallel cluster of 5,120 vector processors (of the kind used in NEC’s SX line of supercomputers) running at 500 MHz. Occupying 34,000 square feet of floor space, consuming 5 megawatts of power, and costing $350 million, it is a monster by any measure. But it can run a lot of existing code, and it proved the benefits of a hybrid vector-cluster architecture. More importantly, Earth Simulator has spurred the politically and nationalistically inclined server makers (mainly IBM, but also Hewlett-Packard and Sun Microsystems) to push their technology envelopes in high-performance computing.
Blue Gene dates back to late 1999 as a concept machine: IBM initially promised to spend $100 million of its own money building a 1 petaflops supercomputer out of 1 million extremely minimalist processors. What spurred IBM to get practical with the concept was pressure from both ends of the market: Earth Simulator, which was relatively easy to build, at the high end, and cheap yet inefficient Lintel clusters at the low end. IBM launched the Blue Gene/L derivative based on the PowerPC core, put a stripped-down Linux kernel on it, and, most importantly, created a supercomputer that organizations like LLNL would actually pay money for.
“We’re hopeful that this helps government policy makers and the supercomputing industry to understand that the US industry is capable of innovation,” said Dave Turek, IBM’s vice president of deep computing, as he made the test results public. IBM’s main US rival these days, Cray, would certainly echo that sentiment, as would Hewlett-Packard and SGI, which have tended to focus less on hitting the top spot on the Top500 supercomputer list and more on selling supercomputers to companies and organizations that until now could not have afforded the luxury. Flops are pretty cheap out there today, but a machine that uses flops efficiently is not.
So why is Blue Gene/L exciting? The machine that just bested Earth Simulator by a nose on the Linpack tests occupies 320 square feet of floor space, about 1/100th the space of Earth Simulator and about the size of a studio apartment in NYC. Blue Gene/L also delivers almost the same efficiency (sustained performance divided by peak theoretical performance) as Earth Simulator, which has a stunning 88% efficiency, arguably the best rating any parallel supercomputer architecture has ever achieved. The piece of Blue Gene/L that IBM tested in Rochester runs at 80% efficiency, delivering about 36 teraflops of sustained performance out of about 45 teraflops of peak. That is pretty good, too, considering that it spans 16,384 PowerPC processor cores. The question now is whether IBM can deliver that high efficiency on a machine with 131,072 cores.
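Efficiency here is just sustained Linpack performance divided by theoretical peak, the Rmax/Rpeak ratio used on the Top500 list. This short Python sketch applies it to the article’s Blue Gene/L numbers. For Earth Simulator it uses that machine’s published Top500 figures of 35.86 teraflops sustained and 40.96 teraflops peak, which are not given in this article. It also checks the floor-space ratio.

```python
def efficiency(sustained_tf: float, peak_tf: float) -> float:
    """Linpack efficiency: sustained (Rmax) divided by theoretical peak (Rpeak)."""
    if peak_tf <= 0:
        raise ValueError("peak performance must be positive")
    return sustained_tf / peak_tf

# Blue Gene/L figures from the article: 36.01 TF sustained, about 45 TF peak.
bgl = efficiency(36.01, 45.0)
# Earth Simulator: Top500-published Rmax and Rpeak (not stated in the article).
es = efficiency(35.86, 40.96)

# Floor space from the article: 34,000 sq ft versus 320 sq ft.
space_ratio = 34_000 / 320

print(f"Blue Gene/L efficiency: {bgl:.1%}")
print(f"Earth Simulator efficiency: {es:.1%}")
print(f"Earth Simulator uses {space_ratio:.0f}x the floor space")
```

This gives about 80.0% for Blue Gene/L and 87.5% for Earth Simulator, and a floor-space ratio of roughly 106 to 1. Using a more precise peak for the eight racks (about 45.9 teraflops, assuming 4 flops per cycle per core) would put Blue Gene/L nearer 78.5%.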
The piece of Blue Gene/L tested in the Rochester factory consumed 216 kilowatts, compared to 6 megawatts for Earth Simulator. Eric Kronstadt, who leads the Blue Gene project at IBM’s T.J. Watson Research Center in Yorktown Heights, New York, says that at 11 cents per kilowatt-hour it costs $5 million a year just to power Earth Simulator, compared to about $180,000 for a Blue Gene/L system of the same raw number-crunching throughput. And every watt that goes in to do computation has to come back out as heat, which also takes energy to remove from the system and its controlled environment.
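The yearly electricity bill is power draw × hours per year × price per kilowatt-hour. The following Python sketch shows that arithmetic. It assumes the machine runs around the clock at a constant draw, uses a 365-day year, and ignores the cooling overhead mentioned above. The rate and power figures are the ones quoted in this paragraph.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours; assumes continuous operation, no leap day

def annual_power_cost(kilowatts: float, dollars_per_kwh: float) -> float:
    """Cost of running a constant electrical load for one year (excludes cooling)."""
    return kilowatts * HOURS_PER_YEAR * dollars_per_kwh

RATE = 0.11  # 11 cents per kilowatt-hour, as quoted

earth_simulator = annual_power_cost(6_000, RATE)  # 6 MW
bgl_tested = annual_power_cost(216, RATE)         # 216 kW Rochester test machine

print(f"Earth Simulator: ${earth_simulator:,.0f} per year")
print(f"Blue Gene/L (8 racks): ${bgl_tested:,.0f} per year")
```

On these assumptions Earth Simulator comes to about $5.8 million a year at 6 MW, or about $4.8 million at the 5 MW figure cited earlier. The 216 kW test machine comes to about $208,000, somewhat above the roughly $180,000 Kronstadt quotes.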
While raw performance and savings on floor space and electricity are important, so is the ability to run applications, which is why IBM opted for 32-bit Linux as the main development environment, plus the open source GNU C, C++, and Fortran compilers. Mr Turek says that Blue Gene/L will not be suited to every application, of course, which is why LLNL also has a giant Linux-Itanium cluster (currently number 2 on the Top500 list, soon to be number 3) as well as the future ASCI Purple parallel Unix box. The processor, memory, I/O, and switch interconnect architectures of these three machines differ, and each is suited to particular workloads. “It is an untenable position to believe that one architecture can meet all needs,” said Mr Turek. He added that Blue Gene/L will be suited to workloads in which the main determinant of the computation is having a large number of computing elements with a modest amount of interconnect.
This means molecular modeling or the integration and processing of sensor array data (one application where Blue Gene/L is being tested right now). Lintel clusters will still have a natural affinity for financial applications, and big SMP RISC/Unix clusters will still be preferred for design and simulation in the automotive and aerospace industries. Mr Kronstadt says that IBM has tested Blue Gene/L on some three dozen Linux supercomputer applications and found that most of them scaled pretty well. LLNL will use its Blue Gene/L super for materials science research, Monte Carlo simulations, and various atomic-scale simulations, all of which are expected to make full use of the 65,536 processors in the full 64-rack Blue Gene/L machine when it is completed at LLNL around May 2005.