According to the latest list, released yesterday at the International Supercomputer Conference in Heidelberg, Germany, 242 of the machines on the Top 500 list of the world's most powerful supercomputers deliver at least 1 teraflops of computing power.
While the Top 500 list, which is based on the Linpack Fortran benchmark, is by no means the only or the best way of measuring supercomputer performance, Linpack was the first widely accepted means of gauging the power of workstations, clusters, vector machines, parallel supercomputers, and a whole host of hybrid architectures. As such, the test has brought a certain amount of order to the supercomputing field, it has arguably helped set and push various trends in computing, and it provides a snapshot of the cutting edge of computing. The Linpack benchmark was created decades ago by Jack Dongarra of the University of Tennessee, and the Top 500 list is compiled by Dongarra along with Hans Meuer of the University of Mannheim, Germany, and Erich Strohmaier and Horst Simon of NERSC/Lawrence Berkeley National Laboratory.
The current list has Japan’s Earth Simulator, a massively parallel machine built from NEC’s vector supercomputers, still ranked as the most powerful machine in the world, with a performance rating of 35.8 teraflops on the Linpack test. Earth Simulator has led the list since June 2002, and only six months ago this one machine accounted for 6.7% of the list’s 528-teraflop aggregate. But the proliferation of new architectures and cheaper clusters is allowing machines in Earth Simulator’s power class to be built relatively easily.
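That share figure checks out against the numbers quoted above. The following is a back-of-the-envelope sketch using only the article's figures, not part of the Top 500 methodology:

```python
# Sanity-check the share figure: Earth Simulator's 35.8 Linpack teraflops
# against the 528-teraflop aggregate of the previous (November) list.
earth_simulator_tf = 35.8
prior_list_aggregate_tf = 528.0

share_pct = earth_simulator_tf / prior_list_aggregate_tf * 100
print(f"Earth Simulator's share of the prior aggregate: {share_pct:.1f}%")
```

That works out to roughly 6.8%, in line with the 6.7% figure cited.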
For instance, number two on the list is the Thunder cluster at Lawrence Livermore National Laboratory, which links 1,024 four-way Itanium 2 servers with a Quadrics interconnect and delivers 19.9 teraflops of processing power. Earth Simulator has 5,120 vector processors to Thunder’s 4,096 Itaniums, so the vector approach still yields efficiencies. The choice of architecture – parallel vector, clustered x86, or other options – depends on national interests and the code base the buyer has to support, not on any mythical right answer. No supercomputer runs all code well.
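The efficiency point can be made concrete by dividing each machine's Linpack rating by its processor count. This is a rough illustration from the figures above, ignoring peak-versus-sustained subtleties:

```python
# Rough per-processor Linpack yield, from the figures in the article.
# Gigaflops per processor = (teraflops * 1000) / processor count.
es_gflops_per_proc = 35.8 * 1000 / 5120       # Earth Simulator (vector)
thunder_gflops_per_proc = 19.9 * 1000 / 4096  # Thunder (Itanium 2 cluster)

print(f"Earth Simulator: {es_gflops_per_proc:.1f} GF/processor")
print(f"Thunder:         {thunder_gflops_per_proc:.1f} GF/processor")
```

That is roughly 7.0 gigaflops per vector processor versus about 4.9 per Itanium 2 – about a 1.4x per-processor advantage for the vector machine on this benchmark.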
That’s why the two BlueGene/L prototypes that also entered the Top 500 list, at numbers four and eight, are important, too. BlueGene/L embodies a different approach to building a Linux supercluster than simply slapping together a pile of x86 iron with fast (yet commodity) interconnects. The two prototype BlueGene/L machines installed at IBM’s own research center are based on a stripped-down 32-bit PowerPC 440 processor and a minimalist Linux kernel. Clock speeds are low, and more in sync with the on-chip I/O and with memory speeds. While RISC and x86 processors are hitting the 2GHz to 3GHz range, the BlueGene/L DD1 machine has processors that run at 500MHz, while the DD2 machine uses faster 700MHz processors. The DD1 machine is rated at 11.7 teraflops across 8,192 processors, which can be packed densely precisely because their low clock speeds keep heat in check; the machine will be boosted to about 64 peak teraflops of power as it is installed at LLNL in 2005.
While Big Blue is playing around with BlueGene/L, it is still selling a lot of Power and x86 clusters, and it dominates the list with 407 teraflops of aggregate capacity (out of a total of 813 teraflops), 224 systems (45% of the list), 68 of the top 100 machines, and the greatest number of Linux clusters (150). Hewlett-Packard Co is number two on the Top 500 list with 140 systems and 150 teraflops of aggregate computing capacity. Every other vendor has a very small slice of the market; no other player accounts for more than 6% of systems or teraflops.
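IBM's dominance can be read off the totals quoted above. A simple sketch using only the numbers in this article:

```python
# IBM's share of the current list, per the figures above.
ibm_tf, total_tf = 407, 813
ibm_systems, total_systems = 224, 500

tf_share = ibm_tf / total_tf * 100                 # share of aggregate teraflops
system_share = ibm_systems / total_systems * 100   # share of systems
print(f"IBM: {tf_share:.1f}% of teraflops, {system_share:.1f}% of systems")
```

In other words, IBM holds just over half the list's aggregate computing capacity with a bit under half of its systems.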
In terms of architecture, 287 of the 500 machines on the list are based on extended x86 architectures; six months ago that number was 189, and a year ago it was only 119. IBM has 75 systems on the list using one of its variants of the Power processor; another 57 machines are based on HP’s PA-RISC processors (mostly HyperPlexed Superdome machines), and another 34 are based on Advanced Micro Devices’ Opteron processors. In fact, the number ten machine is an Opteron cluster at the Shanghai Supercomputer Center that marks the first appearance of both China and Opteron in the top ten. That machine is based on 2,560 2.2GHz Opteron 248 processors, and it has been rated at 8.1 teraflops on the Linpack test. While the line between clusters and single-system-image machines (like Cray X1s) is a fuzzy one, the computer scientists behind the Top 500 list say that 291 of the machines (58%) are clusters.
This Top 500 list is interesting for another reason: more and more companies are asking not to be identified by name next to the feeds and speeds of their supercomputers, much as the spook arms of the world’s governments have done for years when their machines are listed. Such censoring is a bad idea in a free society. Given the strategic value of computing capacity to oil companies and digital media companies, the desire for secrecy is understandable, even if it is wrong. A list like the Top 500 should go out of its way to make such information available, not to hide it. This sort of thing can get way out of hand.