April 6, 2023 (updated 18 Oct 2023 11:40pm)

Why Nvidia won’t be worried by Google’s AI supercomputer breakthrough

Google says it can trump Nvidia when it comes to AI supercomputing prowess. But the news is unlikely to trouble the market leader.

By Matthew Gooding

Google claimed a breakthrough in processor speed this week when it released research showing an AI supercomputer powered by its in-house tensor processing units (TPUs) offers improved performance and better energy efficiency than an equivalent machine running on Nvidia A100 GPUs. Nvidia has cashed in on the generative AI boom, with demand for the A100, the chip used to train large language models like OpenAI’s GPT-4, going through the roof. But with a new GPU, the H100, ready to hit the market, the company is unlikely to be worried by Google’s achievement.

Google’s supercomputer in Oklahoma, powered by TPU v4. (Photo by Google Cloud)

The research paper, published on Tuesday, shows that Google has strung together 4,000 of its fourth-generation TPUs to make a supercomputer. It says the machine is up to 1.7 times faster than an equivalent machine running on Nvidia A100 GPUs, and up to 1.9 times more power-efficient.

Why Google’s TPUs are more efficient than Nvidia A100

In the scientific paper, Google’s researchers explain how they connected the 4,000 TPUs using optical circuit switches developed in-house. Google has been using TPU v4 in its own systems since 2020, and made the chips available to customers of its Google Cloud platform last year. The company’s biggest LLM, PaLM, was trained using two 4,000-TPU supercomputers.

“Circuit switching makes it easy to route around failed components,” Google fellow Norm Jouppi and Google distinguished engineer David Patterson explained in a blog post about the system. “This flexibility even allows us to change the topology of the supercomputer interconnect to accelerate the performance of a machine learning model.”

This switching system was key to helping Google achieve a performance bump, says Mike Orme, who covers the semiconductor market for GlobalData. “Although each TPU didn’t match the processing speed of the best Nvidia AI chips, Google’s optical switching technology for connecting the chips and passing the data between them made up the performance difference and more,” he explains.

Nvidia’s technology has become the gold standard for training AI models, with Big Tech companies buying thousands of A100s as they attempt to outdo each other in the AI arms race. The OpenAI supercomputer used to train GPT-4 features 10,000 of the Nvidia GPUs, which retail at $10,000 each.
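
As a rough back-of-the-envelope check, those figures imply a GPU bill of about $100m for that machine. The sketch below assumes every chip is bought at the quoted $10,000 retail price, with no volume discount and no networking, power or hosting costs; the function name is purely illustrative.

```python
def gpu_hardware_cost(num_gpus: int, unit_price_usd: int) -> int:
    """Naive list-price cost of the GPUs alone.

    Assumption: no volume discounts, and no networking, power or hosting costs.
    """
    return num_gpus * unit_price_usd


# Figures quoted in the article: 10,000 A100s at $10,000 each.
cost = gpu_hardware_cost(10_000, 10_000)
print(f"${cost:,}")  # prints "$100,000,000"
```

The real cost of such a system is likely to differ substantially, since large buyers rarely pay list price and the GPUs are only one part of the build.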

But the A100 is about to be usurped by the company’s latest model, the H100. The recently launched chip topped the pile for power and efficiency in inference benchmarking tests released today by MLPerf, an open AI engineering consortium which tracks processor performance. Inference is the process of running an AI system to carry out a task once it is trained, and the benchmarks measure how quickly a chip can do so.


“Nvidia claims [the H100] is nine times faster than the A100s involved in the Google comparison,” Orme says. “That speed premium would eliminate the edge Google’s optical interconnect technology provides. The battle of odious comparisons intensifies.”
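
Taken at face value, the two vendors' headline figures can be combined into a rough ratio. The sketch below assumes, purely for illustration, that Google's 1.7x figure and Nvidia's claimed 9x figure are both measured against the same A100 baseline on comparable workloads, which vendor benchmarks do not guarantee; the function name is hypothetical.

```python
def implied_speedup(h100_vs_a100: float, tpu_vs_a100: float) -> float:
    """Ratio of two speedups that share the same baseline.

    Assumption: both figures are relative to the same A100 setup and
    the same kind of workload, which may not hold in practice.
    """
    return h100_vs_a100 / tpu_vs_a100


# 9x is Nvidia's claim for the H100; 1.7x is Google's figure for TPU v4.
ratio = implied_speedup(9.0, 1.7)
print(f"{ratio:.1f}x")  # prints "5.3x"
```

On these assumptions an H100 system would be roughly five times faster than the TPU v4 machine, which is the sense in which the claimed speed premium would wipe out Google's edge.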

What are Google’s ambitions in AI chips?

Google says it uses TPUs for 90% of its AI work, but despite the capabilities of the chips, Orme does not expect the tech giant to market them to third parties.

“There is no ambition on Google’s part to compete with Nvidia chips in the merchant market for AI chips,” he says. “The proprietary TPUs will not make it out of the Google data centre or its AI supercomputers if they were ever intended to do so.”

He adds that very few people outside the company will get to utilise the technology, as Google Cloud is a relatively minor player in the public cloud market. It holds 11% of the market according to figures from Synergy Research Group, trailing in the wake of its hyperscaler rivals Amazon’s AWS and Microsoft Azure, which have 34% and 21% shares respectively.

Google has also done a deal with Nvidia which will see the H100 made available to Google Cloud customers, and Orme says this reflects the fact that Nvidia’s place as the market leader will remain secure for some time to come.

“Nvidia is likely to remain the AI chip kingpin in a market that will reflect the feverish excitement surrounding generative AI as spending soars on training and inference capacity,” he adds.

