OpenAI is reportedly offering publishers up to $5m to license news content to train its large language models (LLMs), with Apple also reportedly engaged in similar talks. The news comes a week after the New York Times announced that it was suing OpenAI for copyright infringement, alleging that the AI company had used the newspaper’s articles to train its LLMs without permission.
AI developers were criticised throughout 2023 for using image and text data to train their models without considering whether or not it was copyrighted. Most of this data is scraped indiscriminately from the internet, whether through purpose-built web crawlers or open-source data providers like LAION, before it is vetted and curated. The extent to which this curation process includes the removal of copyrighted data remains unknown, though the suspicion that it does not led major news organisations including CNN, Reuters and the New York Times to block OpenAI’s web crawler from their websites in August 2023.
This was followed by the announcement last week by the Grey Lady that it would be suing OpenAI for copyright infringement, alleging that its LLMs were “built by copying and using millions of The Times’ copyrighted news articles, in-depth investigations, opinion pieces, reviews, how-to guides, and more.” At the time, an OpenAI spokesperson told the NYT that it respected “the rights of content creators and owners and [is] committed to working with them to ensure they benefit from AI technology and new revenue models”.
OpenAI’s decision to enter into licensing negotiations with major media brands, first reported by The Information, could be seen as one way of avoiding similar lawsuits in the future. The AI lab has already penned deals with the Associated Press and Axel Springer. The agreement with the latter will allow users of OpenAI’s ChatGPT service to receive summaries of content from Axel Springer’s news sites and see the model answer queries with attribution to articles in the publisher’s archive. The financial terms of these two deals remain unknown. However, according to The Information, OpenAI is offering media firms between $1m and $5m to license their content. Apple, meanwhile, is reportedly offering higher remuneration but also demanding more extensive usage rights over news content.
OpenAI also faces an increasingly uncertain regulatory environment when it comes to questions of copyright infringement. In April the EU said that LLM developers must declare when copyrighted content is used to train their models. However, other jurisdictions like the UK and Japan have declared their interest in allowing copyrighted data to train commercial AI models (though consultations remain ongoing in the latter). In the meantime, several Big Tech businesses including Microsoft, Google and Adobe have offered to indemnify any customers against copyright claims arising from the use of their AI products.
Tech Monitor has contacted Apple and OpenAI for comment.