Google’s artificial intelligence research lab DeepMind has created a framework for detecting potential hazards in an AI model before they become a problem. This “early warning system” could be used to determine the risk a model would pose if it were deployed. It comes as G7 leaders prepare to meet to discuss AI’s impact and OpenAI promises $100,000 grants to organisations working on AI governance.

Artificial intelligence models could have the ability to source weapons and mount cyberattacks, warns DeepMind. (Photo by T. Schneider/Shutterstock)

UK-based DeepMind recently became more closely integrated with parent company Google. It has been at the forefront of artificial intelligence research and is one of a handful of companies working towards creating human-level artificial general intelligence (AGI).

The team from DeepMind worked on a new threat detection framework with researchers from academia and other major AI companies such as OpenAI and Anthropic. “To pioneer responsibly at the cutting edge of artificial intelligence research, we must identify new capabilities and novel risks in our AI systems as early as possible,” DeepMind engineers declared in a technical blog on the new framework.

There are already evaluation tools in place to check powerful general-purpose models against specific risks. These benchmarks identify unwanted behaviours in AI systems before they are made widely available to the public, including misleading statements, biased decisions and the direct repetition of copyrighted content.
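In rough terms, a benchmark harness of this kind might look like the sketch below. The prompts, checks and function names are illustrative stand-ins, not DeepMind's framework or any published evaluation suite.

```python
# Minimal sketch of a benchmark-style evaluation harness.
# All names (run_benchmark, checks, etc.) are illustrative stand-ins,
# not part of DeepMind's framework or any published eval suite.

def run_benchmark(generate, prompts, checks):
    """Prompt the model and record how often each unwanted behaviour appears.

    `generate` is any callable mapping a prompt string to a model response.
    `checks` maps a behaviour name to a predicate over the response text.
    """
    failures = {name: 0 for name in checks}
    for prompt in prompts:
        response = generate(prompt)
        for name, check in checks.items():
            if check(response):
                failures[name] += 1
    # Report the failure rate per behaviour across the prompt set.
    return {name: count / len(prompts) for name, count in failures.items()}


# Crude, illustrative checks mirroring the behaviours mentioned above.
checks = {
    "misleading_statement": lambda text: "100% guaranteed" in text.lower(),
    "verbatim_copyrighted_text": lambda text: "KNOWN_PROTECTED_PASSAGE" in text,
}

if __name__ == "__main__":
    # A trivial stand-in model that simply echoes the prompt.
    rates = run_benchmark(lambda p: p, ["Is this cure 100% guaranteed?"], checks)
    print(rates)  # {'misleading_statement': 1.0, 'verbatim_copyrighted_text': 0.0}
```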

The problem comes from ever more advanced models whose capabilities go beyond simple generation, including strong skills in manipulation, deception, cyber offence and other dangerous behaviours. The new framework has been described as an “early warning system” that can be used to mitigate those risks.

DeepMind researchers say the evaluation outcomes can be embedded in governance to reduce risk. (Photo courtesy of DeepMind)

DeepMind researchers say responsible AI developers need to look beyond just the current risks and anticipate what risks might appear in the future as the models get better at thinking for themselves. “After continued progress, future general-purpose models may learn a variety of dangerous capabilities by default,” they wrote.

While such outcomes remain uncertain, the team says a future AI system that isn’t properly aligned with human interests may be able to conduct offensive cyber operations, skilfully deceive humans in dialogue, manipulate humans into carrying out harmful actions, design or acquire weapons, and fine-tune and operate other high-risk AI systems on cloud computing platforms.

Moves to improve AI governance

They may also be able to assist humans in performing these tasks, increasing the risk of terrorists accessing material and content not previously accessible to them. “Model evaluation helps us identify these risks ahead of time,” the DeepMind blog says.

The model evaluations proposed in the framework could be used to uncover when a certain model has “dangerous capabilities” that could be used to threaten security, exert influence or evade oversight. They would also allow developers to determine to what extent the model is prone to applying these capabilities to cause harm – also known as its alignment. “Alignment evaluations should confirm that the model behaves as intended even across a very wide range of scenarios, and, where possible, should examine the model’s internal workings,” the team writes.

These results could then be used to understand the level of risk and what the ingredients are that have led to that level of risk. “The AI community should treat an AI system as highly dangerous if it has a capability profile sufficient to cause extreme harm, assuming it’s misused or poorly aligned,” the researchers warned. “To deploy such a system in the real world, an AI developer would need to demonstrate an unusually high standard of safety.”
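Read together, those two ideas amount to a simple decision rule: assess what the model could do, assess how it behaves, and gate deployment on the result. A minimal sketch of that rule, using hypothetical scores, thresholds and function names that are not taken from DeepMind's paper, might look like this:

```python
from dataclasses import dataclass

# Illustrative sketch only: the scores, thresholds and function names below
# are hypothetical and not taken from DeepMind's paper or framework.

@dataclass
class EvaluationResult:
    dangerous_capability_score: float  # 0.0-1.0: could the model cause extreme harm?
    alignment_score: float             # 0.0-1.0: does it behave as intended across scenarios?

def is_highly_dangerous(result: EvaluationResult,
                        capability_threshold: float = 0.5) -> bool:
    """Treat the system as highly dangerous if its capability profile is
    sufficient to cause extreme harm, assuming misuse or poor alignment."""
    return result.dangerous_capability_score >= capability_threshold

def may_deploy(result: EvaluationResult, required_standard: float = 0.99) -> bool:
    """A highly dangerous system needs an unusually high standard of
    demonstrated safety; here that is proxied by its alignment score."""
    if not is_highly_dangerous(result):
        return True  # subject to the usual pre-release checks
    return result.alignment_score >= required_standard

# Example: a highly capable but poorly aligned model should not clear the gate.
result = EvaluationResult(dangerous_capability_score=0.8, alignment_score=0.3)
print(may_deploy(result))  # False
```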

This is where governance structures come into play. OpenAI recently announced it would award ten $100,000 grants to organisations developing AI governance systems, and the G7 group of wealthy nations is set to meet to discuss how to tackle the risks posed by AI.

DeepMind said: “If we have better tools for identifying which models are risky, companies and regulators can better ensure” that training is carried out responsibly, that deployment decisions are based on risk evaluations, that transparency is central, including the reporting of risks, and that appropriate data and information security controls are in place.

Harry Borovick, general counsel at legal AI vendor Luminance, told Tech Monitor that compliance requires consistency. “The near constant reinterpretation of regulatory regimes has created a compliance minefield for both AI companies and businesses implementing the technology in recent months,” Borovick says. “With the AI race not set to slow down any time soon, the need for clear, and most importantly consistent, regulatory guidance has never been more urgent.

“However, those in the room would do well to remember that AI technology – and the way it makes decisions – isn’t explainable. That’s why it’s so essential for the right blend of tech and AI experts to have a seat at the table when it comes to developing regulations.”

Read more: Rishi Sunak meets AI developer execs for talks on tech safety