I had the privilege of delivering the keynote on Data Science & Audience Engagement at the Data Science for Media Summit, hosted by the Alan Turing Institute in Edinburgh last week. It was the institute's first media-focused event, and it highlights the increasing role that data science is playing in our industry.
This is not surprising. The proliferation of content choices and sources means that media organizations need to work harder to understand and serve their audiences, while consumers need help discovering and enjoying great content. What role does data science play today, and where will it go in the future? To answer these questions, it is useful to explore what data actually exists to work with. We can loosely group this data into two categories: human-generated data and machine-generated data. Both provide important data sets that can be leveraged by data science tools and techniques.
Human-Generated Data
There is a rich and diverse set of data created by media professionals and consumers to describe, rate and debate content. This includes material such as subtitles and editorial metadata, which have been used for many years to make TV programming more accessible and discoverable. Editorial metadata provides high-quality descriptions of shows, series and contributors that populate EPGs, TV guides and, increasingly, apps. At Ericsson alone, we create over 200,000 hours of subtitles each year and host millions of editorial metadata records.
Alongside this data created by media professionals sits an increasingly large volume of data created by audiences themselves. Twitter has become the conversational medium of choice for many TV viewers, generating very large volumes of data in the process. Facebook adds to this with its ubiquitous ‘likes’, which allow TV viewers to express and share their favourite shows. Meanwhile, user ratings on dedicated review sites and content owner platforms are crowdsourcing how much we love, or don’t love, TV shows and movies at a scale never seen before.