OpenAI has lifted the lid on how it builds safety measures into its AI tools. The company, developer of the powerful GPT-4 large language model (LLM) that underpins its ChatGPT chatbot and myriad other AI systems, made the disclosure as calls grow for more controls to be imposed on the development of generative AI systems.
In a blog post yesterday, OpenAI detailed what it is doing to stop its systems generating harmful content and violating data laws. While the company’s tools have spearheaded a generative AI boom around the world, recent weeks have seen regulators start to take an interest in the systems, with Italy having banned ChatGPT over potential GDPR violations.
Last week, AI experts including Elon Musk and Apple co-founder Steve Wozniak signed a letter calling for a temporary pause on the development of LLMs, and on Wednesday US president Joe Biden weighed in, telling reporters that AI companies must put safety first.
How OpenAI built GPT-4 with safety in mind
OpenAI says it spent six months refining GPT-4, its most advanced model to date, before its release last month, making the system as difficult as possible to use for nefarious purposes.
Security researchers have previously demonstrated that it is possible to circumvent security controls on ChatGPT by “tricking” the chatbot into impersonating a bad AI that generates hate speech or code for malware that can be used by cybercriminals. OpenAI says this is less likely to occur with GPT-4 than with its predecessor, the GPT-3.5 model.
“GPT-4 is 82% less likely to respond to requests for disallowed content compared to GPT-3.5 and we have established a robust system to monitor for abuse,” the company’s engineers said.
“We are also working on features that will allow developers to set stricter standards for model outputs to better support developers and users who want such functionality.”
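OpenAI has not published the internals of its abuse-monitoring system, and the stricter developer controls it mentions are still in the works. Developers can, however, already layer their own guardrails on top of the API. The sketch below is a hypothetical illustration rather than OpenAI’s own tooling, assuming the pre-1.0 `openai` Python library and an `OPENAI_API_KEY` environment variable: it screens a user request with the existing moderation endpoint and uses a system message to impose a stricter standard on the model’s output.

```python
# Hypothetical application-side guardrail, not OpenAI's internal monitoring system.
# Assumes the pre-1.0 `openai` Python library and an OPENAI_API_KEY in the environment.
import openai

def guarded_reply(user_message: str) -> str:
    # Ask the public moderation endpoint whether the request breaches the content policy.
    moderation = openai.Moderation.create(input=user_message)
    if moderation["results"][0]["flagged"]:
        return "Request declined: flagged by the moderation endpoint."

    # A system message is one lever developers already have for constraining outputs.
    response = openai.ChatCompletion.create(
        model="gpt-4",
        messages=[
            {"role": "system",
             "content": "You refuse any request for disallowed or harmful content."},
            {"role": "user", "content": user_message},
        ],
    )
    return response["choices"][0]["message"]["content"]
```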
Do OpenAI’s systems expose personal data of internet users?
The company also took the opportunity to respond to data regulators’ concerns about the way its models harvest data from the internet for training purposes. Following Italy’s ChatGPT ban, Canada has launched an investigation into the chatbot, and other European countries are considering whether to follow Italy’s lead.
OpenAI said: “Our large language models are trained on a broad corpus of text that includes publicly available content, licensed content, and content generated by human reviewers. We don’t use data for selling our services, advertising, or building profiles of people.”
It also detailed how it ensures personal data isn’t exposed during training. “While some of our training data includes personal information that is available on the public internet, we want our models to learn about the world, not private individuals,” the company said.
“So we work to remove personal information from the training dataset where feasible, fine-tune models to reject requests for personal information of private individuals, and respond to requests from individuals to delete their personal information from our systems. These steps minimise the possibility that our models might generate responses that include the personal information of private individuals.”
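OpenAI has not described the tooling behind this filtering. The snippet below is a purely illustrative toy example, not the company’s pipeline: it redacts two common kinds of personal detail, email addresses and US-style phone numbers, from a piece of text using regular expressions, the sort of simple rule that a real preprocessing stage would combine with far more sophisticated detection.

```python
import re

# Toy illustration only: redact a couple of common personal-data patterns
# from text before it would be used for training. Real PII removal relies
# on much broader detection than two regexes.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\b(?:\+?1[-.\s]?)?\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}\b")

def redact_personal_info(text: str) -> str:
    text = EMAIL_RE.sub("[EMAIL REDACTED]", text)
    text = PHONE_RE.sub("[PHONE REDACTED]", text)
    return text

print(redact_personal_info("Contact Jane at jane.doe@example.com or 555-123-4567."))
# -> Contact Jane at [EMAIL REDACTED] or [PHONE REDACTED].
```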
President Biden joins AI naysayers
OpenAI’s statement comes after President Biden told reporters that AI developers “have a responsibility to make sure their products are safe before making them public”.
Biden was speaking after meeting with his Council of Advisors on Science and Technology to discuss developments in AI. He said his administration was committed to advancing the AI Bill of Rights, introduced last October to protect individuals from the negative effects of advanced automated systems.
“Last October, we proposed a bill of rights to ensure the important protections are built into the AI systems from the start, so we don’t have to go back to do it,” Biden said. “I look forward to today’s discussion about ensuring responsible innovation and appropriate guardrails to protect America’s rights and safety, and protecting their privacy, and to address the bias and disinformation that is possible as well.”