Researchers at the Massachusetts Institute of Technology (MIT) have released a data visualisation tool that they claim allows users to highlight patterns and determine which data sources are responsible for them.
The new tool, dubbed DBWipes, could be used to identify aberrations, such as faulty sensors that are corrupting a regular pattern of readings or variables that are affecting a company’s sales figures, according to MIT’s Computer Science and Artificial Intelligence Laboratory.
Samuel Madden, a professor of computer science and engineering and one of the Database Group’s leaders, said: "If you look at the way people traditionally produce visualisations of any sort, they would have some big, rich data set – that has maybe hundreds of millions of data points, or records – and they would do some reduction of the set to a few hundred or thousands of records at most.
"The problem with doing that sort of reduction is that you lose information about where those output data points came from relative to the input data set. If one of these data points is crazy – is an outlier, for example – you don’t have any real ability to go back to the data set and ask, ‘Where did this come from and what were its properties?’"
Developed in collaboration with engineering graduate Eugene Wu and Professor Michael Stonebraker, DBWipes includes a "provenance tracking" system for large data sets in an effort to solve this problem.
The researchers claim the provenance-tracking system provides a compact representation of the source of the summarised data, allowing users to trace visualised data back to the source.
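As a rough illustration of the general idea, and not of how DBWipes itself is built, the sketch below summarises a set of readings while keeping a record of which input rows contributed to each summarised point. The function name `aggregate_with_provenance`, the field names and the sample readings are all hypothetical, chosen only for this example.

```python
from collections import defaultdict


def aggregate_with_provenance(records, key, value):
    """Average `value` per `key`, remembering which input rows fed each group.

    Hypothetical helper for illustration only; not part of DBWipes.
    """
    groups = defaultdict(list)
    for row_id, record in enumerate(records):
        groups[record[key]].append(row_id)

    summary = {}
    for group_key, row_ids in groups.items():
        values = [records[i][value] for i in row_ids]
        summary[group_key] = {
            "mean": sum(values) / len(values),
            "rows": row_ids,  # provenance: indices of the source records
        }
    return summary


# Made-up sensor readings; sensor "b" misbehaves in hour 1.
readings = [
    {"hour": 0, "sensor": "a", "temp": 20.0},
    {"hour": 0, "sensor": "b", "temp": 22.0},
    {"hour": 1, "sensor": "a", "temp": 21.0},
    {"hour": 1, "sensor": "b", "temp": 99.0},
]

summary = aggregate_with_provenance(readings, key="hour", value="temp")

# The hour-1 average looks odd, so trace it back to its source records.
suspect_rows = [readings[i] for i in summary[1]["rows"]]
```

Storing a list of row indices per group is the simplest possible form of provenance and would not scale to hundreds of millions of records; the compact representation the researchers describe is presumably far more economical.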
The researchers also worked on an algorithm called Scorpion, which is designed to track down the records responsible for aspects of a DBWipes visualisation and efficiently recalculate the visualisation to either exclude or emphasise the data they contain.
The idea was driven by a study at a Boston hospital, which revealed that a number of patients were incurring much higher treatment costs than the rest.
The researchers found that the difference in costs was explained by a single variable: the patients’ doctors, who prescribed more treatment than their colleagues.
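The kind of question Scorpion answers in the hospital case can be sketched with a brute-force search. This is a simplified stand-in, not the Scorpion algorithm: it tries removing every group of records that share an attribute value, recalculates the average, and ranks the groups by how close the recalculated figure comes to an expected value. The function `rank_explanations`, the attribute names and the cost figures are invented for the example.

```python
def mean(values):
    return sum(values) / len(values)


def rank_explanations(records, value, attributes, expected):
    """Rank (attribute, value) pairs by how well removing them fixes the mean.

    Brute-force illustration only; the real Scorpion algorithm is designed
    to avoid recomputing the aggregate from scratch for every candidate.
    """
    candidates = {(attr, rec[attr]) for rec in records for attr in attributes}
    scored = []
    for attr, attr_value in candidates:
        kept = [rec[value] for rec in records if rec[attr] != attr_value]
        if not kept:
            continue  # removing everything explains nothing
        distance = abs(mean(kept) - expected)
        scored.append((attr, attr_value, distance))
    # Smallest distance first; break ties deterministically by name.
    scored.sort(key=lambda item: (item[2], item[0], str(item[1])))
    return scored


# Made-up treatment costs: patients of doctor "C" cost far more.
treatments = [
    {"doctor": "A", "ward": "east", "cost": 100.0},
    {"doctor": "A", "ward": "west", "cost": 110.0},
    {"doctor": "B", "ward": "east", "cost": 105.0},
    {"doctor": "C", "ward": "west", "cost": 900.0},
    {"doctor": "C", "ward": "east", "cost": 950.0},
]

ranked = rank_explanations(
    treatments, value="cost", attributes=["doctor", "ward"], expected=105.0
)
best_attribute, best_value, _ = ranked[0]
print(best_attribute, best_value)  # the single variable that best explains the gap
```

On this toy data the top-ranked explanation is the doctor attribute rather than the ward, mirroring the single-variable finding in the hospital study.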