Google Cloud has unveiled what it describes as the “world’s largest publicly available machine learning mega cluster”, delivering nine exaflops of aggregate compute power to users of Google Cloud Platform. Artificial intelligence (AI) workloads are increasingly important for many cloud users, and Google Cloud will be hoping its new cluster proves appealing to customers.
Revealed at the company’s Google I/O developer conference, the machine learning cluster is powered by the fourth generation (v4) of Google’s in-house Tensor Processing Units (TPUs), chips designed to run its cloud services as well as other Google platforms such as YouTube.
Google Cloud’s machine learning cluster
“Google Cloud’s ML cluster enables researchers and developers to make breakthroughs at the forefront of AI, allowing them to train increasingly sophisticated models to power workloads such as large-scale natural language processing (NLP), recommendation systems, and computer vision algorithms,” said Sachin Gupta, vice president and general manager for infrastructure, and Max Sapozhnikov, product manager for Cloud TPU, in a joint statement. “At nine exaflops of peak aggregate performance, we believe our cluster of Cloud TPU v4 Pods is the world’s largest publicly available ML hub in terms of cumulative computing power.”
Based at one of the company’s data centres in Oklahoma, the ML cluster is built from Cloud TPU v4 pods, each of which connects 4,096 chips over an ultra-fast interconnect. Google says each pod has “industry leading” bandwidth of six terabits per second, enabling it to move data rapidly and train large AI models. The Oklahoma data centre operates on 90% renewable energy.
The machine learning cluster is available in preview from today.
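For developers wondering how such a cluster is actually consumed, workloads typically target a slice of a TPU pod through a framework such as JAX, which Google supports on Cloud TPU. Below is a minimal, illustrative sketch, not taken from the announcement, of code that could run on a Cloud TPU VM to confirm the chips are visible and execute a trivial computation across all of them; it assumes a TPU slice with JAX already installed.

```python
# Illustrative only: a minimal JAX sketch, assuming a Cloud TPU VM with JAX
# installed. None of this code comes from Google's announcement.
import jax
import jax.numpy as jnp

# On a TPU slice, device_count() reports every chip in the slice, while
# local_device_count() reports the chips attached to this particular host.
print("TPU chips visible:", jax.device_count())

# pmap replicates a function across all local chips, mapping over the
# leading axis of its input - one scalar per chip here.
def scale(x):
    return x * 2.0

xs = jnp.arange(jax.local_device_count(), dtype=jnp.float32)
print(jax.pmap(scale)(xs))
```

In practice, large models shard parameters and data across the pod’s interconnect rather than simply replicating identical work on every chip, but the programming model is the same.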
Why AI and machine learning are important to cloud providers
Increasing numbers of businesses are turning to AI and ML to help improve efficiency and digitise their operations. According to a McKinsey study, ‘The State of AI in 2021’, published last year, 56% of more than 1,800 organisations polled around the world said they had adopted AI in at least one function of their business, up from 50% in 2020.
The role of cloud computing in successful AI adoption was also highlighted in the McKinsey study, which identifies companies it describes as “AI high performers”: businesses that attribute at least 20% of their earnings to their use of AI. This high-performing group runs an average of 64% of its AI workloads in the cloud, compared with 44% for other respondents, suggesting cloud-based AI can offer better returns than on-premises systems.
The high-performing group “is also accessing a wider range of AI capabilities and techniques on a public cloud,” the McKinsey study says. “For example, they are twice as likely as the rest to say they tap the cloud for natural-language-speech understanding and facial-recognition capabilities.”
AR, VR and database innovations at Google I/O
Google Cloud made several other announcements at I/O, including the launch of AlloyDB, a new PostgreSQL-compatible database service aimed at highly demanding workloads. Google claims AlloyDB delivers twice the transactional-workload performance of the comparable service from Amazon’s AWS, the world’s leading cloud platform.
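Because AlloyDB is PostgreSQL-compatible, existing drivers and tooling should work unchanged. As a rough illustration, the snippet below connects with psycopg2, a standard Python PostgreSQL driver; the host address, database name, and credentials are placeholders rather than details from the announcement.

```python
# Hypothetical connection details - placeholders, not values from Google.
import psycopg2

conn = psycopg2.connect(
    host="10.0.0.5",      # private IP of an AlloyDB instance (placeholder)
    dbname="appdb",       # placeholder database name
    user="app_user",      # placeholder credentials
    password="change-me",
)
with conn, conn.cursor() as cur:
    # AlloyDB speaks the PostgreSQL wire protocol, so ordinary SQL applies.
    cur.execute("SELECT version();")
    print(cur.fetchone()[0])
conn.close()
```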
Also available to Google Cloud Platform users from today is Immersive Stream, a new service that renders 3D and augmented reality experiences and streams them to mobile devices.
On the networking front, Google Cloud introduced Network Analyzer, a new module in the platform’s Network Intelligence Center that helps developers detect network failures and prevent downtime by pinpointing problems such as accidental misconfigurations and over-utilisation of services.