OpenAI fixes ChatGPT bug that may have breached GDPR

ChatGPT generates titles automatically for each chat session that can be adapted by the user (Photo: Ascannio / Shutterstock)

OpenAI could be in breach of GDPR legislation after the titles assigned to users’ ChatGPT conversations were randomly exposed to other users without consent. The company described it as a “significant issue” with a third-party open-source library that has since been fixed. A legal expert said that any action would depend on the level of harm caused by the titles appearing in the account of another user, and what that information includes.

Co-founder and CEO Sam Altman disclosed the problem on Twitter, saying: “we feel awful about this”.

we had a significant issue in ChatGPT due to a bug in an open source library, for which a fix has now been released and we have just finished validating.

a small percentage of users were able to see the titles of other users’ conversation history.

we feel awful about this.
— Sam Altman (@sama) March 22, 2023

In ChatGPT, when a user starts a new conversation with the chatbot, a note is created in the sidebar and, as the conversation goes on, that note is given an AI-generated title. The user can change the text or delete the note. A small group of users were shown other users’ titles by mistake.

Since its launch in November 2022, ChatGPT has become one of the fastest-growing consumer apps in history, hitting 100 million unique monthly users in January alone. It has sparked a flurry of activity with companies like Microsoft, a major investor in OpenAI, and Google launching their own chatbots and integrating generative AI tools into products.

It has also sparked calls for regulation and clarity on where the technology falls within legislation such as GDPR and the upcoming EU AI Act. ChatGPT is built on top of OpenAI’s GPT-4 multi-modal large language model, which was trained on data scraped from the internet, massive datasets from the likes of Wikipedia and law libraries, and other information not disclosed by the company.

Altman says there will be a “technical postmortem” into what caused the glitch, and that information used in prompts and responses may be used in training the model, but only after personally identifiable information has been removed.

Need for regulation of AI

Countries around the world are actively exploring the impact of this type of technology, how to regulate it and how to ensure user data is protected. The UK is also working on a new task force to examine the impact of large language models on society, the economy and individuals.

Lilian Edwards, professor of law at Newcastle University, says the Information Commissioner’s Office (ICO) may examine the type of breach experienced by OpenAI to see if UK data was exposed. In the event of a breach, the regulator will most likely ask the company to ensure it doesn’t happen again rather than take any action. Tech Monitor has asked the ICO for comment.

Caroline Carruthers, CEO and co-founder of Carruthers and Jackson, says protecting user data is a core requirement of any organisation, particularly a data-rich organisation like OpenAI, and breaches such as this could erode confidence in its business. Worse, she says, it also highlights the potential data pitfalls of AI.

“Platforms like ChatGPT rely on user data to function, but acquiring that data means users have to be able to trust that their information will be secure,” Carruthers says. “This should serve as a lesson to be learned to other businesses looking to utilise AI: you need to get your data governance basics right before you can graduate on to AI and ML.”

Ali Vaziri, legal director in the data and privacy team at Lewis Silkin, said that whether the exposure of AI-generated titles to other users is a data protection issue depends on whether the original user can be identified from the titles alone. “If the only information available to those other users are the conversation history titles, unless the titles themselves contain information from which the original user can be identified, it probably won’t be a personal data breach as far as a loss of confidentiality is concerned.”

Even if the titles were to contain personally identifiable information, whether it becomes a regulatory issue would depend on the level of harm. “If harm to users is likely, then that will be the trigger for any regulatory notifications which might need to be made,” said Vaziri.

“However, data protection law also requires controllers to ensure the accuracy of personal data they process, so displaying the wrong conversation history titles to a user might amount to a breach of that principle; and since doing so may have affected the integrity of personal data in that user’s account, the incident might constitute a personal data breach on that basis,” he added.

Data privacy and control

Vlad Tushkanov, lead data scientist at Kaspersky, told Tech Monitor that users should have had “zero expectation of privacy”, as OpenAI warns that any conversation could be viewed by AI trainers and urges users not to share any sensitive information in conversations. He urged users to “treat any interaction with a chatbot (or any other service, for that matter) as a conversation with a complete stranger: you don’t know where the content will end up, so refrain from revealing any personal or sensitive information about yourself or other people.”

Despite the warnings, some users have responded to Altman on Twitter claiming they had titles that included personal and “highly sensitive” information. The bigger issue, says Edwards, is the potential for sensitive information scraped from the internet to leak out in responses.

“It is well known these models leak personal data like sieves,” she warned, adding that “their training datasets contained infinite amounts of personal and often sensitive data and it may emerge randomly in response to a prompt at any point.”