Microsoft says Google is tying up content deals to train AI models

Microsoft has a significant stake in AI lab OpenAI, which built ChatGPT (Photo: Ascannio / Shutterstock)

Big Tech companies are in a growing battle to license content to train next-generation AI models, Microsoft CEO Satya Nadella has warned. Testifying in a US government antitrust case against Google, he said the search giant was tying up content deals with publishers and making it harder for other companies to compete.

Nadella said it was “Google’s web” and everyone else was just playing in it. He suggested that “there’s a new avenue to lock up – the thing that basically feeds the power of these LLMs, which is content.” He warned the search giant was in the early stages of striking deals to secure content libraries for training large language models, alongside the content on its own YouTube platform.

Future foundation AI models, including Google’s upcoming Gemini, Microsoft-backed OpenAI’s GPT-5 and Claude 3 from Amazon-backed Anthropic, are likely to be fully multi-modal. This means they will require training data across images, video, audio and text.

Nadella told the hearing that Google’s efforts to build content libraries for its AI models through deals with other providers reminded him of distribution agreements in the early days of the web. Google’s distribution deals have included paying smartphone manufacturers like Apple billions of dollars to make Google the default search engine. Those payments are also at the heart of the antitrust case currently taking place in Washington DC.

Nadella says Google’s clout in the search market, reinforced by these agreements and by its position as a major advertising broker, makes it easier for the search giant to convince companies to sign exclusive agreements over content.

AI’s need for data and compute power

The Microsoft CEO says that building AI models requires multiple components, including extensive computing power, algorithms and data to train the underlying models. While Microsoft has significant computing power through its Azure cloud platform, data can be harder to access. It needs to be licensed and authorised to meet regulatory requirements.

Microsoft says it is happy to put money into training these models, the cost of which can run to billions of dollars. But that could become problematic if certain companies lock up exclusive deals with the big content makers, including book publishers, social media platforms and news organisations. “When I am meeting with publishers now, they say Google’s going to write this check and it’s exclusive and you have to match it,” he said.

An early indicator of this is companies like Getty and Shutterstock signing licensing deals for their image libraries with companies like Adobe and OpenAI. A report in the FT earlier this year revealed that News Corp, Axel Springer, the New York Times and the Guardian had met with the leading AI labs over licensing. The FT is also in discussions with the AI labs over licensing its own content for training future models.

“Copyright is a crucial issue for all publishers,” the FT said in a statement. “As a subscription business, we need to protect the value of our journalism and our business model. Engaging in constructive dialogue with the relevant companies, as we are, is the best way to achieve that.”
