Introducing Microsoft’s AI-Powered Video Categorisation Tool

Tool orchestrates multiple AI models in a building block fashion

By CBR Staff Writer

Enterprises with large media archives often struggle to turn existing video into business value, largely because content discovery at scale is hard: automated categorisation is often flawed, while manual tagging is expensive, error-prone and scales badly.

Microsoft thinks its upgraded product is the solution – and it’s a poster child for AI.

“Multi-modal topic inferencing” in Microsoft’s Video Indexer tool automates media categorisation using three signals: transcription (spoken words), OCR (visual text) and facial recognition, combined under a supervised deep learning-based model. It can even recognise moods.

What is Video Indexer?

Video Indexer is a cloud application built on a raft of Microsoft Azure tools, including Media Analytics, Search, Cognitive Services – such as the Face API, Microsoft Translator, the Computer Vision API, and Custom Speech Service – and more.

It’s designed to help business users extract insight from videos, and provides services ranging from keyframe extraction and sentiment analysis to visual content moderation (such as detecting “racy” visuals) and brand identity recognition.

Video Indexer: Shift from Keyword Extraction

Oron Nir, a senior data scientist in Microsoft’s Media AI division, said: “[This tool] orchestrates multiple AI models in a building block fashion to infer higher level concepts using robust and independent input signals from different sources.”

The technique is a step-change from Video Indexer’s previous keyword extraction model, which extracted and categorised content only according to explicitly mentioned terms.
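To make that limitation concrete, here is a minimal sketch of categorisation by explicit keyword match. It is not Video Indexer's code: the `KEYWORDS` vocabulary, the `extract_keywords` function and the sample transcript are all hypothetical, chosen only to illustrate the idea.

```python
import re

# Hypothetical keyword vocabulary; not taken from Video Indexer.
KEYWORDS = {"football", "election", "parliament", "goal"}


def extract_keywords(transcript):
    """Return the vocabulary terms that literally occur in the transcript."""
    tokens = set(re.findall(r"[a-z]+", transcript.lower()))
    return sorted(tokens & KEYWORDS)


# The clip is plainly about football, but the word itself is never spoken,
# so a purely keyword-based approach cannot surface it as a topic.
print(extract_keywords("Messi scored a late goal at Camp Nou"))  # ['goal']
```

Only terms that literally appear in the transcript can be returned, which is the gap that inferring higher-level topics is meant to close.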

Multi-modal topic inferencing, by contrast, uses a “knowledge graph” to cluster similar detected concepts together. In practice, it does this by applying two models to extract topics.

As Nir explained in a recent blog post: “The first is a deep neural network that scores and ranks the topics directly from the raw text based on a large proprietary dataset. This model maps the transcript in the video with the Video Indexer Ontology and IPTC.”

“The second model applies spectral graph algorithms on the named entities mentioned in the video. The algorithm takes input signals like the Wikipedia IDs of celebrities recognized in the video, which is structured data with signals like OCR and transcript that are unstructured by nature.”

He added: “To extract the entities mentioned in the text, we use Entity Linking Intelligent Service aka ELIS. ELIS recognizes named entities in free-form text so that from this point on we can use structured data to get the topics. We later build a graph based on the similarity of the entities’ Wikipedia pages and cluster it to capture different concepts within the video.”
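The graph-and-cluster step Nir describes can be sketched roughly as follows. This is an illustration under stated assumptions, not Microsoft's implementation: the entities and their term sets in `PAGES` are hand-written stand-ins for Wikipedia pages, Jaccard overlap is an assumed similarity measure, and a simple two-way split on the sign of the graph Laplacian's Fiedler vector stands in for whatever spectral algorithm Video Indexer actually uses.

```python
import math

# Hypothetical stand-ins for the terms found on each entity's Wikipedia page.
PAGES = {
    "Lionel Messi": {"football", "forward", "barcelona", "argentina", "goal"},
    "FC Barcelona": {"football", "club", "barcelona", "league", "goal"},
    "UEFA Champions League": {"football", "club", "league", "europe", "goal"},
    "Angela Merkel": {"chancellor", "germany", "politics", "parliament", "europe"},
    "Bundestag": {"parliament", "germany", "politics", "election", "law"},
    "European Union": {"politics", "europe", "parliament", "law", "treaty"},
}


def jaccard(a, b):
    """Jaccard similarity between two term sets (assumed similarity measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def fiedler_vector(weights, iterations=500):
    """Approximate the Laplacian eigenvector with the second-smallest eigenvalue.

    Runs power iteration on (c*I - L), where L = D - W, and removes the
    constant component at every step so the iteration converges to the
    Fiedler vector rather than the trivial all-ones eigenvector.
    """
    n = len(weights)
    degree = [sum(row) for row in weights]
    c = 2 * max(degree) + 1.0  # upper bound on the largest eigenvalue of L
    v = [i - (n - 1) / 2 for i in range(n)]  # deterministic, zero-mean start
    for _ in range(iterations):
        w = [
            (c - degree[i]) * v[i] + sum(weights[i][j] * v[j] for j in range(n))
            for i in range(n)
        ]
        mean = sum(w) / n
        w = [x - mean for x in w]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def split_entities(pages):
    """Build the similarity graph and split it in two by Fiedler-vector sign."""
    names = list(pages)
    n = len(names)
    weights = [
        [0.0 if i == j else jaccard(pages[names[i]], pages[names[j]]) for j in range(n)]
        for i in range(n)
    ]
    v = fiedler_vector(weights)
    first = {names[i] for i in range(n) if v[i] >= 0}
    second = {names[i] for i in range(n) if v[i] < 0}
    return first, second


for cluster in split_entities(PAGES):
    print(sorted(cluster))
```

With these toy inputs the football entities land in one cluster and the politics entities in the other, even though the two groups share the term "europe". A production system would need more than two clusters and far richer page representations; the sketch only shows why entity similarity, rather than literal keyword overlap in the transcript, can separate concepts.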

For facial recognition, Video Indexer can now automatically identify over 1 million celebrities – such as world leaders, actors and actresses, athletes, researchers, business and tech leaders across the globe. Now if those legacy media archives could just be got off those piles of magnetic tape in a dusty cellar somewhere…

See also: Did Amazon Just Kill Tape Storage?

