ChatGPT: Will compute power become bottleneck to AI growth?

OpenAI has been forced to introduce a queuing system and other traffic-shaping measures due to demand for ChatGPT (Photo: Tada Images/Shutterstock)

In less than a week, ChatGPT, OpenAI’s first chatbot tool, went viral, with billions of requests made to put the much-hyped system through its paces. Interest was so high that the company had to introduce traffic management measures, including a queuing system and throttled queries, to cope with demand. The incident highlights the vast amounts of compute power needed to sustain large language models like GPT-3, the model family on which ChatGPT is built.


As this and other types of advanced AI become more commonplace and are put to use by businesses and consumers, the challenge will be to maintain enough compute capacity to support them. This is easier said than done, and one expert told Tech Monitor that a shortage of compute power is already creating a bottleneck that is holding back AI development. Supercomputers, or entirely new hardware architectures, could offer potential solutions.

The scale of compute power required to run ChatGPT

A large language model such as GPT-3 requires a significant amount of energy and computing power for its initial training. This is partly because even the largest GPUs used to train these systems have limited memory, so multiple processors must run in parallel.
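A back-of-envelope calculation shows why a single GPU cannot hold a model of this size. The sketch below is illustrative only: the bytes-per-parameter figures (2 for 16-bit weights alone, roughly 16 for mixed-precision training with the Adam optimiser) and the 80 GB of memory per GPU are assumptions, not figures from the article, and the calculation ignores activation memory, which adds considerably more during training. `memory_gb` and `min_gpus` are hypothetical helper names.

```python
import math

# Assumed bytes per parameter (illustrative, not from the article):
BYTES_FP16_WEIGHTS = 2           # 16-bit weights only, e.g. for serving
BYTES_MIXED_PRECISION_ADAM = 16  # fp16 weights + fp16 grads + fp32 master weights, momentum, variance


def memory_gb(num_params: float, bytes_per_param: int) -> float:
    """Memory needed for model state alone, in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


def min_gpus(num_params: float, bytes_per_param: int, gpu_memory_gb: float) -> int:
    """Lower bound on GPUs needed just to hold model state (activations ignored)."""
    return math.ceil(memory_gb(num_params, bytes_per_param) / gpu_memory_gb)


if __name__ == "__main__":
    gpt3_params = 175e9  # parameter count cited in the article
    gpu_gb = 80          # assumed memory of a single large GPU
    print(memory_gb(gpt3_params, BYTES_FP16_WEIGHTS))                 # 350.0
    print(memory_gb(gpt3_params, BYTES_MIXED_PRECISION_ADAM))         # 2800.0
    print(min_gpus(gpt3_params, BYTES_FP16_WEIGHTS, gpu_gb))          # 5
    print(min_gpus(gpt3_params, BYTES_MIXED_PRECISION_ADAM, gpu_gb))  # 35
```

Even under these optimistic assumptions, training state for a 175bn-parameter model has to be spread across dozens of GPUs before any speed-up from parallelism is considered.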

Even querying the model through ChatGPT requires multi-core CPUs if it is done in real time. As a result, processing power has become a major barrier to how advanced an AI model can become.

GPT-3 is one of the largest models ever created, with 175bn parameters. According to a research paper by Nvidia and Microsoft Research, “even if we are able to fit the model in a single GPU, the high number of compute operations required can result in unrealistically long training times”: training GPT-3 would take an estimated 288 years on a single Nvidia V100 GPU.
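The sketch below does not reproduce the paper’s calculation, but it shows the kind of arithmetic behind estimates like the 288-year figure. It assumes the widely used approximation of about 6 floating-point operations per parameter per training token (closer to 8 if activations are recomputed), a training set of 300bn tokens, and hypothetical sustained per-GPU throughputs. It also assumes perfect scaling across GPUs, which real clusters do not achieve. The function names are illustrative.

```python
SECONDS_PER_DAY = 86_400
SECONDS_PER_YEAR = 365.25 * SECONDS_PER_DAY


def training_flops(params: float, tokens: float, flops_per_param_token: float = 6.0) -> float:
    """Approximate total training compute in floating-point operations.

    Assumes ~6 FLOPs per parameter per token (forward plus backward pass);
    pass 8.0 to allow for recomputing activations.
    """
    return flops_per_param_token * params * tokens


def training_seconds(params: float, tokens: float, n_gpus: int,
                     sustained_flops_per_gpu: float,
                     flops_per_param_token: float = 6.0) -> float:
    """Wall-clock estimate, assuming perfect scaling across n_gpus (optimistic)."""
    total = training_flops(params, tokens, flops_per_param_token)
    return total / (n_gpus * sustained_flops_per_gpu)


if __name__ == "__main__":
    params, tokens = 175e9, 300e9  # GPT-3 size; token count is an assumption here
    # Hypothetical: a single GPU sustaining 30 TFLOP/s.
    one_gpu = training_seconds(params, tokens, 1, 30e12)
    print(f"1 GPU: {one_gpu / SECONDS_PER_YEAR:.0f} years")  # 1 GPU: 333 years
    # Hypothetical: 1,024 GPUs each sustaining 140 TFLOP/s, with recomputation.
    cluster = training_seconds(params, tokens, 1024, 140e12, flops_per_param_token=8.0)
    print(f"1,024 GPUs: {cluster / SECONDS_PER_DAY:.0f} days")  # 1,024 GPUs: 34 days
```

The single-GPU result lands in the same order of magnitude as the paper’s estimate; the exact number depends heavily on the throughput assumed.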

Running processors in parallel is the most common way to speed things up, but it has its limits. Beyond a certain number of GPUs, the per-GPU batch size becomes too small, so adding more GPUs brings diminishing returns while costs keep rising.
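If the total (global) batch size is held fixed, each additional GPU gets a smaller share of it. The sketch below makes that arithmetic concrete. The global batch of 1,536 sequences is an assumed example value, and `per_gpu_batch` and `max_useful_gpus` are hypothetical helpers, not part of any library.

```python
def per_gpu_batch(global_batch: int, n_gpus: int) -> float:
    """Samples each GPU handles per step when a fixed global batch is split evenly."""
    return global_batch / n_gpus


def max_useful_gpus(global_batch: int, min_samples_per_gpu: int = 1) -> int:
    """Largest GPU count that still gives every GPU at least min_samples_per_gpu."""
    return global_batch // min_samples_per_gpu


if __name__ == "__main__":
    global_batch = 1536  # hypothetical global batch size, in sequences
    for n in (64, 256, 1024, 1536, 3072):
        print(f"{n:>5} GPUs -> {per_gpu_batch(global_batch, n):g} sequences per GPU")
    print(max_useful_gpus(global_batch))  # 1536
```

Past roughly one sample per GPU, extra GPUs in this simple scheme have nothing to do. Well before that point, each GPU spends proportionally more of every step synchronising than computing.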

Hardware has already become a bottleneck for AI

Professor Mark Parsons, director of EPCC, the supercomputing centre at the University of Edinburgh, told Tech Monitor that a realistic limit is about 1,000 GPUs, and that the most viable way to handle that many is with a dedicated AI supercomputer. The problem, he said, is that even if GPUs become faster, the bottleneck will remain, because the interconnects between GPUs, and between systems, aren’t fast enough.

“Hardware has already become a bottleneck for AI,” he declared. “After you have trained a subset of data on one of the GPUs, you have to bring the data back, share it out and do another training session on all GPUs, which takes huge amounts of network bandwidth and work off GPUs.”
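The cycle Parsons describes, in which each GPU trains on its own slice of the data and the results are gathered, combined and redistributed before the next round, resembles synchronous data-parallel training. The toy sketch below simulates it in plain Python with a one-parameter linear model. The “workers” are just list slices, and `all_reduce_mean` is a hypothetical stand-in for the collective network operation that, on a real cluster, moves gradients between GPUs and consumes interconnect bandwidth.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (input x, target y)


def local_gradient(w: float, shard: List[Sample]) -> float:
    """Gradient of mean squared error for y ~ w * x on one worker's data shard."""
    return sum(2 * (w * x - y) * x for x, y in shard) / len(shard)


def all_reduce_mean(values: List[float]) -> float:
    """Stand-in for the network step: every worker ends up with the average.
    On real hardware this is the communication that loads the interconnect."""
    return sum(values) / len(values)


def data_parallel_train(data: List[Sample], n_workers: int, steps: int, lr: float) -> float:
    # Share the data out; with equal-sized shards the averaged gradient
    # equals the full-batch gradient.
    shards = [data[i::n_workers] for i in range(n_workers)]
    w = 0.0  # every worker holds an identical replica of the model
    for _ in range(steps):
        grads = [local_gradient(w, shard) for shard in shards]  # local training work
        g = all_reduce_mean(grads)  # bring the results back and combine them
        w -= lr * g  # each replica applies the same update, then the cycle repeats
    return w


if __name__ == "__main__":
    data = [(x, 3.0 * x) for x in (0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0)]
    print(data_parallel_train(data, n_workers=4, steps=200, lr=0.05))  # converges to ~3.0
```

In this toy the communication is a single float per worker; for a model like GPT-3 it is one value per parameter on every step, which is why network bandwidth becomes the constraint.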

“GPT and other large language models are being continuously developed, and some of the shortcomings in training in parallel are being solved,” Parsons adds. “I think the big challenge is a supercomputing challenge, which is how we improve data transfer between GPU servers. This isn’t a new problem, and one that we’ve had in supercomputing for some time, but now AI developers are turning to supercomputers they are realising this issue.”

He isn’t sure how quickly interconnect speeds will catch up, as the fastest in development have a throughput of about 800 Gbps, which “is not fast enough today”.

“Computer networking speeds are improving, but they are not increasing at the speed AI people want them to, as the models are growing at a faster rate than the speed is increasing,” he says. “All people selling high-performance interconnects have roadmaps, have done designs and know where we are going in the next five years, but I don’t know if the proposed 800 Gbps will be enough to solve this problem, as the models are coming with trillions upon trillions of parameters.”

He said it won’t be a major problem as long as AI developers continue to improve the efficiency of their algorithms. If they don’t, there “will be a serious problem”, with delays until the hardware catches up with the demands of the software.
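A rough calculation shows why even an 800 Gbps link can struggle. The sketch below assumes pure data parallelism, in which every GPU exchanges a full copy of the gradients on each training step using a ring all-reduce. In that scheme each GPU sends about 2(N-1)/N times the gradient size. It also assumes 16-bit gradients for a 175bn-parameter model and 1,000 GPUs, and it ignores latency and any overlap of communication with computation. Real systems combine several forms of parallelism, so treat the result as an illustrative figure, not a measurement. The function names are hypothetical.

```python
def ring_allreduce_bytes_per_gpu(message_bytes: float, n_gpus: int) -> float:
    """Bytes each GPU sends in a ring all-reduce (reduce-scatter plus all-gather)."""
    return 2 * (n_gpus - 1) / n_gpus * message_bytes


def allreduce_seconds(message_bytes: float, n_gpus: int, link_gbps: float) -> float:
    """Bandwidth-only estimate: ignores latency and overlap with computation."""
    link_bytes_per_second = link_gbps * 1e9 / 8  # gigabits per second -> bytes per second
    return ring_allreduce_bytes_per_gpu(message_bytes, n_gpus) / link_bytes_per_second


if __name__ == "__main__":
    grad_bytes = 175e9 * 2  # assumed: 2-byte gradients for 175bn parameters
    t = allreduce_seconds(grad_bytes, n_gpus=1000, link_gbps=800)
    print(f"{t:.2f} s per gradient exchange")  # 6.99 s per gradient exchange
```

Under these assumptions each exchange takes about seven seconds. If the exchange is not overlapped with computation, the GPUs spend that time waiting. The figure grows in step with model size, which is the gap between model growth and interconnect speed that Parsons describes.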

Will new architectures be needed to cope with AI?

OpenAI’s upcoming large language model, GPT-4, is due to be released next. While it is rumoured to be an order of magnitude more powerful than GPT-3, it is also thought to be aiming to deliver this increased capability for the same server load.

Mirco Musolesi, professor of computer science at University College London, told Tech Monitor that developing large language models further will require improved software and better infrastructure. He believes a combination of the two, plus hardware not yet developed, will end the bottleneck.

“The revolution is also architectural, since the key problem is the distribution of computation in clusters and farms of computational units in the most efficient way,” Professor Musolesi says. “This should also be cost-effective in terms of power consumption and maintenance as well.

“With the current models, the need for large-scale architectures will stay there. We will need some algorithmic breakthroughs, possibly around model approximation and compression for very large models. I believe there is some serious work to be done there.”
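Model compression covers many techniques, including quantisation, pruning and distillation, and Musolesi does not say which he has in mind. As one illustration, the sketch below shows simple symmetric 8-bit quantisation, which stores each weight in one byte instead of the four used by a 32-bit float. The technique is lossy: reconstructed weights differ from the originals by up to half a quantisation step. The example weights are made up, and `quantize_int8` and `dequantize_int8` are hypothetical names.

```python
from typing import List, Tuple


def quantize_int8(weights: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor quantisation: map floats to integers in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0  # size of one quantisation step
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize_int8(q: List[int], scale: float) -> List[float]:
    """Approximate reconstruction of the original floats."""
    return [v * scale for v in q]


if __name__ == "__main__":
    weights = [0.12, -0.5, 0.33, 0.9, -0.07]  # made-up example weights
    q, scale = quantize_int8(weights)
    restored = dequantize_int8(q, scale)
    max_err = max(abs(a - b) for a, b in zip(weights, restored))
    print(q)
    print(f"max error {max_err:.5f}, half a step {scale / 2:.5f}")
```

The fourfold saving in memory and bandwidth comes at the cost of precision. Compressing very large models without that loss would be far more valuable.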

The problem, he explained, is that current computing architectures don’t serve AI well. AI models rely on particular types of computation, including tensor operations, that need specialist systems, whereas current supercomputers tend to be more general-purpose.
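One common way to spread a tensor operation across several computing units is tensor (model) parallelism. A large weight matrix is split across devices, each device computes its part of the product, and the pieces are gathered. The minimal sketch below simulates column-wise splitting of a matrix multiplication in plain Python. The “devices” are just list slices, and the sketch deliberately leaves out the real communication and fault tolerance that distributed systems need. The function names are illustrative.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def split_columns(w: Matrix, n_parts: int) -> List[Matrix]:
    """Give each simulated device a contiguous block of w's columns."""
    cols = len(w[0])
    assert cols % n_parts == 0, "columns must divide evenly across devices"
    step = cols // n_parts
    return [[row[p * step:(p + 1) * step] for row in w] for p in range(n_parts)]


def column_parallel_matmul(x: Matrix, w: Matrix, n_devices: int) -> Matrix:
    shards = split_columns(w, n_devices)               # each device stores 1/n of w
    partials = [matmul(x, shard) for shard in shards]  # computed independently
    # Gather: concatenate each row's partial outputs in device order.
    return [[v for p in partials for v in p[i]] for i in range(len(x))]


if __name__ == "__main__":
    x = [[1.0, 2.0], [3.0, 4.0]]
    w = [[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]]
    print(column_parallel_matmul(x, w, n_devices=2) == matmul(x, w))  # True
```

No single device ever holds the whole weight matrix, which is what allows a model that does not fit on one unit to run at all. The price is the gather step, which becomes network traffic on a real cluster.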

New AI supercomputers, such as the ones in development by Meta, Microsoft and Nvidia, will solve some of these problems, “but this is only one aspect of the problem,” said Musolesi. “Since the models do not fit on a single computing unit, there is the need of building parallel architectures supporting this type of specialised operations in a distributed and fault-tolerant way. The future will probably be about scaling the models further and, probably, the ‘Holy Grail’ will be about ‘lossless compression’ of these very large models.”

This will come at a huge cost. To reach the “millisecond” speed at which a search engine can deliver thousands of results, AI hardware and software will “require substantial further investment”.

He says new approaches will emerge, including new mathematical models that require types of operations not yet known. However, Musolesi added that “current investments will also steer the development of these future models, which might be designed in order to maximise the utilisation of the computational infrastructures currently under development, at least in the short term”.