IBM is bringing the world's oldest tennis tournament into the 21st century, using AI to provide 'Cognitive Highlights' for the Wimbledon Championships.
Long-time Wimbledon partner IBM first explored the concept of 'Cognitive Highlights' at the 2017 Masters Golf Tournament. The proof-of-concept combined computer vision and other AI technologies to listen to, watch and learn from a live video feed of the tournament, then automatically identified and curated the most exciting moments as segments for use in online highlight packages.
The solution for SW19, however, will go beyond selecting and curating individual segments for a video editor to choose from. Instead, AI will automatically create one-to-two-minute highlight reels of matches for the Wimbledon editorial team to use across the Wimbledon Digital Platforms.
Leveraging AI and cognitive technologies will be even more important at Wimbledon than at the Masters: the golf tournament lasts a mere four days, compared to Wimbledon's 13. Footage from the matches quickly adds up to hundreds of hours, and this sheer volume of data requires a mix of cognitive technologies and advanced engineering techniques that can integrate multiple data points and audio and visual components to search, discover and extract key scenes or moments.
Explaining exactly how the new production of highlight reels will work, Rogerio Feris of IBM Research said:
“For this year’s Championships, the production of highlight reels will rely on a number of steps and technologies. Video of the matches will be collected by IBM soon after their completion.
“Initial highlight candidates are identified using information from the on-court statistician and other sensors that provide data on, for example, the speed of the ball, the number of aces, saved break points, etc.
“The system then continues to capture segments of a match that could be predictors of an exciting moment, using a combination of audio and video AI tools that analyze crowd cheering and recognize on-court action (i.e. visuals of player behavior), alongside scoring data. Based on these different modalities, the video segments are rank-ordered and selected to produce the final highlight video for each match.”
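The rank-and-select step described above can be sketched in a few lines of Python. The modality names, weights and the 90-second target below are illustrative assumptions, not details of IBM's actual system: each candidate clip carries a per-modality excitement score, the scores are fused by a weighted sum, and the top-ranked clips are packed greedily into the reel.

```python
from dataclasses import dataclass

@dataclass
class Segment:
    """A candidate highlight clip with per-modality scores in [0, 1].

    Field names and score sources are hypothetical stand-ins for the
    article's three modalities (crowd audio, action recognition, stats).
    """
    match_id: str
    start_s: float
    end_s: float
    cheer_score: float    # from audio analysis of crowd cheering
    action_score: float   # from visual action recognition of player behavior
    stats_score: float    # from scoring data (aces, saved break points, ...)

def excitement(seg: Segment, weights=(0.4, 0.3, 0.3)) -> float:
    """Fuse the modality scores into a single ranking score (weighted sum)."""
    w_cheer, w_action, w_stats = weights
    return (w_cheer * seg.cheer_score
            + w_action * seg.action_score
            + w_stats * seg.stats_score)

def select_highlights(segments, target_s=90.0):
    """Greedily keep the top-ranked segments until the reel is ~90 seconds."""
    reel, total = [], 0.0
    for seg in sorted(segments, key=excitement, reverse=True):
        duration = seg.end_s - seg.start_s
        if total + duration <= target_s:
            reel.append(seg)
            total += duration
    # Present the chosen clips in match order, not excitement order.
    return sorted(reel, key=lambda s: s.start_s)
```

A real pipeline would add constraints (e.g. at least one clip per set), but greedy packing over a fused score captures the "rank-ordered and selected" step in the quote.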
An interesting detail of the new system is that the IBM Research team had to teach it to recognise crowd cheering and player reactions using video of matches from previous tournaments, together with contextual metadata from the on-court statistician to filter out specific content. This required state-of-the-art deep learning models that, combined with active learning techniques, can learn new classifiers from only a few manually annotated training examples.
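The active learning idea mentioned above (learning a classifier from few labels by choosing which examples an annotator should label) can be illustrated with a deliberately tiny sketch. Everything here is a toy stand-in: a one-dimensional "crowd loudness" feature replaces the deep audio/visual models, and a simple threshold classifier replaces the real network, but the loop itself, uncertainty sampling, is the standard technique.

```python
def train_threshold(labeled):
    """Fit a 1-D threshold classifier: midpoint between the class means.

    `labeled` is a list of (feature, label) pairs with label 1 = highlight.
    """
    pos = [x for x, y in labeled if y == 1]
    neg = [x for x, y in labeled if y == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2

def most_uncertain(pool, threshold):
    """Uncertainty sampling: the unlabeled point closest to the boundary."""
    return min(pool, key=lambda x: abs(x - threshold))

def active_learn(pool, oracle, seed_labeled, budget=5):
    """Iteratively query the annotator (`oracle`) for the labels that help most."""
    labeled, pool = list(seed_labeled), list(pool)
    for _ in range(budget):
        threshold = train_threshold(labeled)
        x = most_uncertain(pool, threshold)
        pool.remove(x)
        labeled.append((x, oracle(x)))  # one manual annotation per round
    return train_threshold(labeled)
```

Because each query targets the most ambiguous example, the classifier converges on a good decision boundary with far fewer manual annotations than labelling the pool at random, which is the point of using active learning when annotated tennis footage is scarce.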
The applications of this system, according to IBM, extend beyond sporting events. The technology could simplify production for media and entertainment companies that hold huge archives of footage that are not easily searched, and it could be extended to provide summarisation tools for consumer videos captured by mobile phones or wearable cameras.