Will 2023 be the year of the AI lawsuit?

An AI-generated representation of a sandy desert with trees. Generative AI firms like Midjourney and Stability AI are facing lawsuits from artists concerned that the models behind outputs such as this were trained on copyrighted works without permission. (Photo by Prompart / Pixexid)

Midjourney doesn’t understand fingers – strange, considering the generative AI’s knack for creating weird and wonderful images from little more than a simple text prompt. Take, for example, the assembly line of stills from movies never made by famous directors, the more unhinged outputs featuring velociraptors posing with high school graduates at their prom, or ‘King of the Hill’ reimagined as a live-action 1980s sitcom. Digits, though, seem to be the least important part of these pictures: there are often seven or eight to a hand, or they are shaved off at the knuckle, overshadowed by more dazzling and distracting visuals.


Real artists aren’t pleased. Some are troubled less by the finger crisis than by the possibility that generative platforms like Midjourney, DALL-E 2 and Stable Diffusion could debase their craft. Others, meanwhile, believe that these models are spitting out little more than cheap reformulations of existing, copyrighted work. Last month, three of those artists filed a civil lawsuit in San Francisco and London against Midjourney and Stability AI, arguing that the companies’ models used the artists’ works in training datasets without permission and that, consequently, any works produced were derivative. Stock photography service Getty Images, meanwhile, has sent a letter to both warning of future legal action, after several generated images appeared to contain ghostly reproductions of the platform’s famous watermark.

Suits like these, explains Cerys Wyn Davies, an intellectual property expert at Pinsent Masons, create immense challenges for regulators in considering the interests of the users of a new technology and those of the industry it’s set to transform. “You have to have a balance between the rights owner and freedom for people to innovate,” says Davies. “But, of course, giving that right to people – an intellectual property right – has always been there to encourage innovation.”

Generative AI lawsuits 101

Two main arguments are at play in the lawsuit against Midjourney and Stability AI: that the firms infringed the artists’ copyright by using their images without permission, and that the inclusion of these images in the training data makes the outputs effectively derivative content. On the face of it, says Dr Bahne Sievers, an intellectual property lawyer at Fieldfisher, the argument about training data is perfectly reasonable. Most jurisdictions do not allow anyone to use copyrighted data “unless you have a licence,” he explains. “But, nobody has a licence!”

There are exceptions to this rule. In the US, for example, there is a broad ‘fair use’ doctrine, which permits the use of copyrighted materials for the purposes of free expression or for a ‘transformative’ purpose. There’s much less leeway in European jurisdictions, however, with countries such as Germany and the UK permitting the use of copyrighted materials only in specific scenarios. Those would cover quoting a copyrighted work or painting a portrait in a style similar to another artist’s, say Sievers and Davies, but not necessarily hoovering up artistic images to train an AI model without the permission of the rights holders.

“For Stability AI to be able to argue a broader fair dealing defence in the UK is probably going to be more difficult” than in the US, says Gill Dennis, an intellectual property law expert at Pinsent Masons, “because they’re not necessarily falling within any of the fair dealing exceptions specifically set out in the legislation.”

It’s also odd to some lawyers that generative AI firms are being sued rather than the organisations that compiled the datasets. In the case of Midjourney, that would be the Germany-based Large-scale Artificial Intelligence Open Network (LAION). “If LAION created the dataset, then the alleged infringement occurred at that point, not once the dataset was used to train the models,” Eliana Torres, an intellectual property lawyer with the law firm Nixon Peabody, told TechCrunch last month. It’s also important to note, says Dr Andres Guadamuz, a reader in intellectual property law at the University of Sussex, that LAION doesn’t actually store copyrighted images, only links to their original locations on the internet, which, he adds, is perfectly acceptable to mine under European and German law.

Any argument that the works produced by generative AI are in some way reproductions of original works is also hard to prove, says Guadamuz. Each image, after all, is broken down into mathematical abstractions, which are then used by the model to help chart the commonalities between the aspects of a given photo, painting or rendering and written text prompts. One might compare the process to an artist committing all the pictures in the world to memory to learn how to create something new, rather than incorporating fractional chunks of each image into their latest creations.

Not every artist baulks at this process. Mario Klingemann, for one, has used AI to create images for the better part of a decade, and wouldn’t especially mind if his works appeared in a generative AI dataset. Indeed, he worries that a victory for the artists in the Stability AI suit would throw a philosophical wrench into the development of new generative services. “If you set the precedent that learning from material is something you have to ask for permission [to do],” says Klingemann, “then why does this only apply to machines, and not to humans?”

Future balances

Getty may be on stronger legal ground. Guadamuz suspects that the firm will argue that the inclusion of its images in AI training datasets is a simple violation of its website’s terms of service, an oversight that could be remedied in future with a licensing agreement between Getty and generative AI services like Midjourney. But even then, he adds, “I’m not sure if that’s going to fly, because you have to say that the robots have the capacity to enter into a contract.”

Until verdicts are reached in these suits, explains Sievers, generative AI firms are exposing themselves to significant legal risks across multiple jurisdictions. Firms might try to limit that exposure by not running their services in countries where the risk is more pronounced, or by excluding works by artists of those nationalities from their training data. But, says Sievers, while “it’s easy if you just limit your AI to five artists, when you say that you just exclude Germans, I think technically that would be a hell of a lot of work.”

Regulators, meanwhile, seem to have been caught off guard by the legal ramifications of generative AI. In the UK, for example, a sweeping consultation by the Intellectual Property Office on how copyright law should be amended to foster innovation in AI ended just before the emergence of ChatGPT. While the IPO seemed poised to recommend allowing more permissive data mining for commercial purposes, intense lobbying from the creative industries appears to have halted that move for now. As a result, says Davies, “they told us that it was likely to be watered down.”

One solution to complaints of copyright infringement in AI text and data mining may be automated filters, some of which have already emerged. GitHub, for example, rolled out a filter in 2021 that (mostly) prevented its Copilot coding assistant from suggesting anything identical or substantially similar to publicly available code on the platform, though this didn’t stop a separate lawsuit, filed in November, accusing Microsoft, GitHub and OpenAI of facilitating copyright infringement. Stability AI, too, has promised to include an opt-out provision for artists in the next iteration of Stable Diffusion. Such workarounds are set to become more common, says Guadamuz, as jurisdictions compete with each other to let generative AI gin up productivity in their respective economies.

What that could mean for the future of copyright law is hard to say. History shows, however, that its governing principles can withstand the mild panic that accompanies the introduction of new technology. “Book printing was the first time where you needed a copyright law,” says Sievers, with photography becoming the next great controversy for lawyers to solve. Over time, though, a new consensus emerged over where and when it was fair to use protected materials, one that accommodated widespread use of the new technology. That process, the lawyer argues, is just as likely to happen again.

In the meantime, though, we may see a slowdown in new generative AI applications while these lawsuits play out. “I know that the music models are as good, if not better, than the art models,” says Guadamuz, but developers are reluctant to commercialise their services for fear of courting litigation from record labels. Google is one such developer. The search giant has proven reluctant to make its MusicLM model open source, partly because its developers discovered that 1% of its outputs were direct replications of songs from its training data.

It’s unlikely, however, that the current wave of AI lawsuits will see releases grind to a halt. Now that the general principles of building such generative applications from larger foundation models have been firmly established, the future of the field is no longer contingent on the existence of players such as Midjourney, Stability AI or even OpenAI. “They could get sued out of existence,” says Guadamuz, “and artificial intelligence would continue happening as it is right now.”