
MIT Researchers Teach AI System to Predict Acts From Key Video Frames

“That’s important for robotics applications, you want [a robot] to anticipate and forecast what will happen early on”

By CBR Staff Writer

MIT researchers have developed an add-on module for Artificial Intelligence (AI) systems that can, by analysing a few frames of a video feed, predict how objects will be changed or transformed by human action.

The module is called Temporal Relation Network (TRN) and it gives AI systems the ability to learn how objects can undergo changes at different times in a video.

The researchers at the Massachusetts Institute of Technology aim to build AI systems with better activity recognition and a deeper comprehension of what is happening in the world around them.

Artificial Intelligence Laboratory

Bolei Zhou, a former PhD student in the Computer Science and Artificial Intelligence Laboratory at MIT, commented in a blog post: “We built an artificial intelligence system to recognise the transformation of objects, rather than appearance of objects.”

“The system doesn’t go through all the frames — it picks up key frames and, using the temporal relation of frames, recognise what’s going on. That improves the efficiency of the system and makes it run in real-time accurately.”

“That’s important for robotics applications, you want [a robot] to anticipate and forecast what will happen early on, when you do a specific action,” Zhou – currently an assistant professor of computer science at the Chinese University of Hong Kong – added.
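The efficiency Zhou describes comes from sampling a handful of key frames rather than processing every frame. A minimal sketch of that idea, assuming a simple segment-based sampler (the function name and segment count are illustrative, not taken from the released code):

```python
import random

def sample_key_frames(num_frames, num_segments=8):
    """Sparse key-frame sampling: split the video into equal-length
    segments and pick one random frame index from each, so the model
    sees a few frames spread across the whole clip."""
    seg_len = num_frames / num_segments
    return [int(seg_len * i + random.random() * seg_len)
            for i in range(num_segments)]

# e.g. sample_key_frames(300) returns 8 indices spread over a 300-frame clip
```

Because the indices are drawn one per segment, they always cover the full duration of the video, which is what lets a sparse sampler still capture slow transformations.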


MIT Researchers

The researchers tested and trained the module on three crowd-sourced datasets of videos which contained footage of various activities being performed.

The first, created by the company TwentyBN, features 200,000 videos across 174 action categories; examples include a hand poking an object or knocking over a stack of cans.

The second, dubbed Jester, contains 150,000 videos showing 27 different hand gestures, while the last dataset, called Charades, teaches the module what everyday activities look like, such as playing basketball or carrying a bicycle.


According to MIT, when the TRN is fed a video it: “Simultaneously processes ordered frames in groups of two, three, and four — spaced some time apart.” It then judges whether the transformation of objects across those key frames is the result of a specific activity.

“If it processes two frames, where the later frame shows an object at the bottom of the screen and the earlier shows the object at the top, it will assign a high probability to the activity class ‘moving object down’,” MIT researchers note.
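The multi-scale mechanism MIT describes can be sketched in plain Python. This toy version uses a placeholder relation function in place of the learned networks in the actual TRN (the function names and the averaging scheme are assumptions for illustration):

```python
from itertools import combinations

def temporal_relations(frame_feats, scales=(2, 3, 4)):
    """Toy multi-scale temporal relation: for each scale k, apply a
    relation function g to every ordered group of k sampled frames,
    average the results, then sum the per-scale scores."""
    def g(frames):
        # Placeholder relation function: mean of the concatenated
        # features. The real TRN uses a small learned MLP here.
        flat = [x for f in frames for x in f]
        return sum(flat) / len(flat)

    score = 0.0
    for k in scales:
        # combinations() preserves frame order, giving ordered groups
        groups = list(combinations(frame_feats, k))
        score += sum(g(grp) for grp in groups) / len(groups)
    return score
```

The key design point is that frame order is preserved within each group, so the relation function can distinguish “object moving down” from “object moving up”, even though both involve the same pair of frames.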

Further Research

The next step for the researchers at MIT will be to integrate object recognition with the activity recognition software. Luckily, work is already well under way on training AI to identify objects in video frames.

See Also: Machine Learning vs “Eye-Balling”: MIT Research Cuts Chemo Doses by 75%

A harder task will be to train the machine to learn ‘intuitive physics’, which would give the AI a better understanding of the real-world properties that objects possess.

“Because we know a lot of the physics inside these videos, we can train module to learn such physics laws and use those in recognizing new videos. We also open source all the code and models. Activity understanding is an exciting area of artificial intelligence right now,” commented Zhou.
