September 19, 2019

These AI Agents Punched Holes in Their Virtual Universe While Playing Hide and Seek

Bots removed opponents' tools from the game space, and launched themselves into the air...

By CBR Staff Writer

Two teams of AI agents tasked with playing a game (or million) of hide and seek in a virtual environment developed complex strategies and counterstrategies – and exploited holes in their environment that even its creators didn’t know it had.

The game was part of an experiment by OpenAI designed to test the AI skills that emerge from multi-agent competition and standard reinforcement learning algorithms at scale. OpenAI described the outcome in a striking paper published this week.

The organisation, now heavily backed by Microsoft, said the results were further proof that “skills, far more complex than the seed game dynamics and environment, can emerge” from such training exercises.

Some of its findings are neatly captured in the video below.

In a blog post, Emergent Tool Use from Multi-agent Interaction, OpenAI noted: “These results inspire confidence that in a more open-ended and diverse environment, multi-agent dynamics could lead to extremely complex and human-relevant behavior.”

The AI hide and seek experiment, which pitted a team of hiders against a team of seekers, made use of two core techniques in AI: multi-agent learning, which uses multiple algorithms in competition or coordination, and reinforcement learning, a form of machine learning that uses reward and punishment to train algorithms.
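To make the reward-and-punishment idea concrete, here is a minimal Python sketch of how a competitive team reward for hide and seek could be scored at each step. The function name `team_rewards` and the +1/-1 values are assumptions chosen for illustration; the article does not spell out the exact reward scheme used in the experiment.

```python
from typing import List, Tuple


def team_rewards(hider_is_seen: List[bool]) -> Tuple[float, float]:
    """Return (hider_reward, seeker_reward) for one time step.

    hider_is_seen holds one flag per hider: True if any seeker can
    currently see that hider. The scheme below is an assumed zero-sum
    rule: hiders are rewarded only while every hider stays hidden,
    and seekers receive the opposite value.
    """
    any_seen = any(hider_is_seen)
    hider_reward = -1.0 if any_seen else 1.0
    seeker_reward = -hider_reward
    return hider_reward, seeker_reward


# Both hiders out of sight: hiders are rewarded, seekers are punished.
print(team_rewards([False, False]))
# One hider spotted: the signs flip.
print(team_rewards([False, True]))
```

Because one team's gain is the other's loss under a rule like this, any improvement by the hiders puts pressure on the seekers to find a counter, which is the kind of back-and-forth the article describes.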


In the game of AI hide and seek, the two opposing teams of AI agents created a range of complex hiding and seeking strategies – compellingly illustrated in a series of videos by OpenAI – that involved collaboration, tool use, and some creative pushing at the boundaries of the virtual parameters the world’s creators thought they’d set.

“Another method to learn skills in an unsupervised manner is intrinsic motivation, which incentivizes agents to explore with various metrics such as model error or state counts,” OpenAI’s researchers wrote.

“We ran count-based exploration in our environment, in which agents keep an explicit count of states they’ve visited and are incentivized to go to infrequently visited states,” they added, detailing outcomes that included the bots removing some of their opponents’ tools from the game space entirely, and launching themselves into the air for a bird’s-eye view of their hiding opponents.
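Count-based exploration as the researchers describe it can be sketched in a few lines of Python. This is an illustrative sketch only, not OpenAI’s implementation: the grid discretisation, the `CountBonus` class, the `discretize` helper and the 1/sqrt(count) bonus are all assumptions made for the example.

```python
import math
from collections import Counter
from typing import Hashable, Tuple


def discretize(x: float, y: float, cell: float = 0.5) -> Tuple[int, int]:
    """Map a continuous position to a grid cell so visits can be counted.

    The cell size is a hypothetical choice for this sketch.
    """
    return (math.floor(x / cell), math.floor(y / cell))


class CountBonus:
    """Keep an explicit count of visited states and pay a bonus that
    shrinks as a state is visited more often."""

    def __init__(self, scale: float = 1.0) -> None:
        self.counts: Counter = Counter()
        self.scale = scale

    def bonus(self, state: Hashable) -> float:
        # Record the visit, then return scale / sqrt(visit count).
        self.counts[state] += 1
        return self.scale / math.sqrt(self.counts[state])


explorer = CountBonus()
cell = discretize(1.2, 0.4)
first = explorer.bonus(cell)    # first visit earns the full bonus
second = explorer.bonus(cell)   # a repeat visit earns less
fresh = explorer.bonus(discretize(3.0, 3.0))  # a new cell earns the full bonus again
print(first, second, fresh)
```

An agent trained on a bonus like this is paid for reaching states it has rarely seen, which gives it a reason to probe unusual corners of an environment.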

As they concluded: “Building environments is not easy and it is quite often the case that agents find a way to exploit the environment you build… in an unintended way”.

