Google yesterday open sourced an improved version of tSNE, a machine learning algorithm for data visualisation, for its machine learning framework TensorFlow. Developed by an intern, the new approach enables interactive visual exploration of large datasets.

The tech giant posted on its Google Plus page for ‘Google AI’: “Some new research from an intern in our Zürich office shows an approach to tSNE that allows real-time interactive visualization of large, high-dimensional datasets by leveraging GPU capabilities through WebGL. Oh, and it’s open source too! Check it out”.  

tSNE visualization of animals. Credit: Gene Kogan

What is tSNE? 

Simply put, tSNE (t-distributed Stochastic Neighbour Embedding) is a prize-winning algorithm developed by researchers Laurens van der Maaten and Professor Geoffrey Hinton in 2008. It is used in exploratory data analysis of high-dimensional data (datasets with a large number of attributes), which it maps down to two or three dimensions so that the data can be plotted.
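To give a feel for the "t-distributed" part of the name, here is a minimal sketch of how similarities between points in the low-dimensional map are computed in the standard formulation of tSNE, using a Student-t kernel with one degree of freedom. The function name and the sample points are illustrative assumptions, not code from Google's release.

```python
from itertools import combinations


def low_dim_similarities(points):
    """Return a matrix q of Student-t similarities between map points.

    q[i][j] is proportional to 1 / (1 + squared distance between i and j),
    normalised so that all off-diagonal entries sum to 1.
    """
    n = len(points)
    weights = [[0.0] * n for _ in range(n)]
    # Visit every pair of points once; this all-pairs loop is quadratic in n.
    for i, j in combinations(range(n), 2):
        sq_dist = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
        w = 1.0 / (1.0 + sq_dist)
        weights[i][j] = weights[j][i] = w
    total = sum(map(sum, weights))
    return [[w / total for w in row] for row in weights]


# Hypothetical 2-D embedding: two nearby points and one far away.
embedding = [(0.0, 0.0), (0.1, 0.0), (3.0, 4.0)]
q = low_dim_similarities(embedding)
```

In this toy embedding the two nearby points receive a much larger similarity than either does with the distant third point. Note that the loop touches every pair of points, which is the kind of cost that makes naive tSNE slow on large datasets.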

What is the Improvement? 

Intern Nicola Pezzotti highlighted that the computational complexity of tSNE made it challenging to visualise large and evolving datasets.

“The computational complexity of the tSNE algorithm limits its application to relatively small datasets. While several evolutions of tSNE have been developed to address this issue (mainly focusing on the scalability of the similarity computations between data points), they have so far not been enough to provide a truly interactive experience when visualizing the evolution of the tSNE embedding for large datasets.” 

The improvement, detailed in his paper “Linear tSNE Optimization for the Web”, relies heavily on modern graphics hardware and has linear computational complexity.

This matters because linear complexity means the running time grows only in proportion to the number of data points, whereas the original tSNE formulation scales quadratically with the size of the dataset.
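As a back-of-the-envelope illustration of why linear scaling matters, the sketch below compares the number of steps a linear method takes with the number of point pairs an all-pairs (quadratic) method must consider. The function names and the `cost_per_point` constant are hypothetical, and the figures are operation counts, not measured timings of any tSNE implementation.

```python
def pairwise_interactions(n):
    """Number of distinct point pairs an all-pairs (quadratic) method visits."""
    return n * (n - 1) // 2


def linear_steps(n, cost_per_point=1):
    """Steps for a method doing a fixed amount of work per point (assumed constant)."""
    return n * cost_per_point


# Compare how the two counts grow as the dataset gets larger.
for n in (1_000, 10_000, 100_000):
    print(f"{n:>7} points: linear={linear_steps(n):>7}, "
          f"pairwise={pairwise_interactions(n):>13}")
```

Each tenfold increase in dataset size multiplies the linear count by ten but the pairwise count by roughly a hundred.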

Ultimately, the approach makes tSNE efficient enough to run in the browser (traditionally seen as too constrained for computationally heavy algorithms). This could broaden its commercial use in areas such as real-time visualisation of biometric or geographical data.

Already Shared with the Open Source Community for Use in TensorFlow

As mentioned, the library has been open sourced for use in Google’s machine learning library TensorFlow.

This will allow developers around the world to use it for a host of different use cases, and it demonstrates Google’s continued commitment to sharing cutting-edge research with the developer community for free. It is inspiring work from an intern and a reminder of the power of open source.
