Whether you are a business or a major government organisation, if you want to crunch a lot of data and do a lot of heavy compute, at some point you need to work your way down to the very bottom of your stack and assess the hardware powering it. That means peering down to the silicon level, and choosing your chips right.
The US Department of Energy (DoE) on Wednesday opted for architectures from a resurgent AMD, the Santa Clara-based semiconductor firm, for both the CPUs and GPUs in what will be the world’s fastest supercomputer, “El Capitan”. It is arguably the most high-profile contract the firm has won yet, with HPE also central to the project.
El Capitan will, stunningly, be more powerful than the world’s 200 fastest supercomputers combined, with tests showing that its two-exaflop performance is 30 percent higher than the estimates projected seven months ago. That equates to two quintillion calculations per second.
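For readers who want to check the arithmetic, here is a minimal Python sketch of those figures. It assumes only the two numbers quoted above (two exaflops, and a 30 percent uplift); the "implied earlier projection" is derived from them for illustration and is not a figure quoted by the DoE.

```python
# Back-of-the-envelope check of the figures quoted above.
EXA = 10**18  # SI prefix "exa"; one quintillion on the short scale

peak_exaflops = 2   # performance figure quoted in the article
uplift = 0.30       # "30 percent higher" than the earlier projection

# Two exaflops expressed as plain calculations per second.
ops_per_second = peak_exaflops * EXA

# Assumption: "30 percent higher" means new = old * 1.30, so old = new / 1.30.
implied_earlier_projection = peak_exaflops / (1 + uplift)

print(f"{ops_per_second:,} calculations per second")
print(f"Implied earlier projection: about {implied_earlier_projection:.2f} exaflops")
```

On that reading, the earlier projection works out to roughly one and a half exaflops.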
(Missing from the contract announcement, meanwhile, were potential contributors IBM, Intel, and NVIDIA. Intel CFO George Davis admitted late on Wednesday that, despite being “undoubtedly in the 10nm era,” the company felt that it would not reach process parity with “competitors” until it produces the 7nm node at the tail end of 2021.)
Not a surprise, but considering where AMD was a couple of years ago, this is a remarkable turnaround.
— HPC Guru (@HPC_Guru) March 4, 2020
Bronis de Supinski, CTO of the Lawrence Livermore National Laboratory (LLNL) division that architects DoE supercomputers, said LLNL “uses best value procurements, and our decision was based on evaluating the options that were available in the timeframe that we needed… There were others”, he added, “[but] based on the performance that we expect the AMD processors to deliver to our actual workload, our decision was that they would provide by far the best value to the government.”
The contract was the latest indication that AMD has real wind in its sales. It comes 10 days after internet architecture firm Cloudflare announced that it was opting for AMD CPUs over Intel Xeon architectures across its sites in 200 cities globally.
El Capitan will conduct nuclear modelling/national security work for the Lawrence Livermore National Laboratory (LLNL) and be live by 2023. Among its tasks: running simulations of nuclear weapon explosions “in the absence of underground testing” and supporting “stockpile stewardship applications”.
LLNL said the machine will also perform “secondary national security missions, including nuclear nonproliferation and counterterrorism.”
El Capitan will be powered by next-generation “Zen 4” AMD EPYC processors, code-named “Genoa”, AMD Radeon Instinct GPUs based on a new architecture optimised for High Performance Computing (HPC) and AI workloads, and the AMD Radeon Open Compute platform (ROCm) heterogeneous computing software, the DoE said.
HPE and AMD jointly designed new technologies to support the deal:
- Streamlined communication between HPE’s Cray Slingshot interconnect, a specialized HPC networking solution, and the new Radeon Instinct GPUs.
- High-density compute blades powered by “Genoa” CPUs.
- A new approach using accelerator-centric compute blades (in a 4:1 GPU to CPU ratio, connected by the AMD Infinity Architecture) that help offload processing from the CPU to the GPU for particularly computationally intensive tasks.
Other performance enhancements include flash-based local storage systems, designed specifically for the new system’s performance needs, which will provide a buffer to balance existing on-board memory and data-tiering to automate data movement.
The DoE confirmed El Capitan will be reserved for classified U.S. interests, but a smaller “clone” system, still more powerful than the Summit supercomputer (currently the number one machine on the Top500 list), will be available for general science.