Big data is often cited as fundamental to building an effective artificial intelligence (AI) program.
Yet an alternative, complementary approach has steadily emerged in recent years: one that may cast that conventional wisdom into doubt.
Companies are increasingly turning to the vast array of readily available internal small data.
Data – and whether it is perceived as big or small – is generally categorised based on the so-called “three Vs” of variety, velocity, and volume.
Big data tends to encompass a broad range of structured and unstructured data types (think: voicemails, emails, and books) at volumes often measured in at least the thousands of gigabytes; and although it is created at high velocity (take, for example, the number of Google searches performed daily), it is typically pulled into a data lake on a scheduled basis.
One of the major challenges that companies face when grappling with big data is getting access to sufficient volumes: unlike tech giants such as Google – which can rely on billions of daily searches – many companies simply do not have the ability to tap a resource of this size.
“A lack of data resources… [is] the most significant barrier to deploying AI solutions,” according to an August 2019 survey by the Manufacturers Alliance for Productivity and Innovation.
This fits with a longer-term trend:
- NewVantage Partners found in a January 2020 survey that close to one-third of respondents cited “increasing data complexities and siloed data as a hindrance to AI adoption”.
- And those findings followed a November 2018 survey by McKinsey and Company in which: “Only 18% of respondents have a clear strategy in place for sourcing the data [required for] AI [to] work.”
Secondly, creating a framework that will scale as a big data initiative grows demands considerable time, money, and skill: as articulated by Stephanie Shen, then senior vice-president at research firm NPD Group, the design principles for a scalable big data initiative involve:
- designing an architecture (data sources, processing platform, and visualisation front end) appropriate to the overall lifespan of the data-driven initiative
- choosing the correct processing platform: because this choice becomes tomorrow’s legacy, an expensive rebuild may result if a program that starts small outgrows the platform’s capacity
- examining the impact of efficiencies across the pipeline: while negligible at the level of small data, these will quickly escalate if the program becomes big
- grasping that overheads (for example, partitioning, compression, aggregation) fundamental to the effectiveness of a big data initiative are prohibitively expensive when applied in a small data context.
In essence: design a framework that aligns architecture, infrastructure, and data requirements to the expected long-term growth of the initiative.
Thirdly – crucially – though big data can help enterprises identify broad trends in relatively homogeneous and simple systems, underlying wrinkles peculiar to individual or small clusters of cases are often smoothed away.
Consequently, a company that overlooks its small data may well be neglecting its own secret sauce.
A “big data approach… is valuable for problems… [with] low to moderate complexity… [as] often such problems can be simplified into if/then adaptation rules,” wrote Eric B Hekler in the July 2019 edition of BMC Medicine. However, a roadblock for big data arises in systems that are manifestly more complex, i.e. those that are either “dynamic, multi-causal”, or present in a variety of locally unique ways.
Small data: simple, actionable, pervasive
So, what is small data exactly and how can it assist enterprises to embed an insights-driven culture into the fabric of the organisation?
Put simply, small data tends to consist of one or a small number of data types, in a volume finite enough to be processed in near real time and to allow for individual inspection and comprehension.
Think, for example, of simple and rich small data sources: from simple location data collected as customers browse in an eCommerce experience, to the rich collection of identity, payment, and granular delivery data provided during the checkout process.
These types of small data allow enterprises to get up close and personal with each stage of a digital journey – whether for employees or customers – via individual, granular case studies.
Being able to investigate journey friction based on actual – rather than representative – customer data may enable companies to evaluate the impact of convenience issues relative to showstoppers: respectively, a poorly placed widget against the failure of a payment form to load.
Small data helps companies evaluate the real dollar and brand impact of journey abandonment.
And the power of small data lies in its simplicity: each singular case can form the basis for predicting a cluster within the wider population.
If an interrogation of local case data reveals that a customer in a certain postcode has purchased a generic-brand product rather than a premium promoted alternative, then a candidate prediction could be that a customer’s location has more of a causal link with the brand of product purchased than advertising spend.
Small data mining is rapid: when a candidate predictor is identified at the individual level, it can be quickly proven against a wider cluster of similar individuals and either adopted or disproven and discarded.
These simple, granular-level predictions can also be combined with other candidate predictors like the relationship between the customer’s household income and the combination of products bought to build up a cluster of buyer personae.
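The candidate-predictor cycle described above – inspect individual cases, form a hypothesis, and check it against a cluster of similar customers – can be sketched in a few lines. The following is a minimal illustration over a hypothetical handful of order records; the field names (`postcode`, `brand`) and values are invented for the example, not drawn from any real schema:

```python
from collections import Counter

# Hypothetical small-data sample: one order record per customer.
orders = [
    {"postcode": "2000", "brand": "generic"},
    {"postcode": "2000", "brand": "generic"},
    {"postcode": "2000", "brand": "premium"},
    {"postcode": "3000", "brand": "premium"},
    {"postcode": "3000", "brand": "premium"},
    {"postcode": "3000", "brand": "generic"},
]

def brand_share_by_postcode(records):
    """Return the share of generic-brand purchases per postcode cluster."""
    counts = {}
    for r in records:
        counts.setdefault(r["postcode"], Counter())[r["brand"]] += 1
    return {pc: round(c["generic"] / sum(c.values()), 2)
            for pc, c in counts.items()}

shares = brand_share_by_postcode(orders)
# A clear gap between postcodes supports the candidate predictor;
# a flat distribution would disprove it, and it would be discarded.
print(shares)  # {'2000': 0.67, '3000': 0.33}
```

Because the data set is small enough to inspect by hand, the analyst can trace any surprising share straight back to the individual orders behind it.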
Whereas big data draws insights from an aggregated dataset, small data can be used to develop a collection of specific, individual case studies that can be dynamically combined into a variety of different clusters.
This bottom-up approach contrasts with the top-down approach prevalent in big data mining, where aggregate purchasing trends are typically ascribed to a set of ideal buyer personae.
While these ideal personae may help to describe insights at a broad level, marketers often face a challenge in relating these abstract models of customer behaviour to prospective buyers in real life: without access to a concrete example in the data, it can be difficult to articulate the exact motivations of these buyers.
Small data provides the necessary granularity upon which to prioritise workflow improvements, more accurately model target customers, and therefore improve the effectiveness of personalisation initiatives.
Or, to put it another way: small data is simple, actionable, and pervasive.
Small data enables alternative machine learning models
A happy side-effect of investing in small data is the alternative machine learning options it enables.
Deep learning methods – so-called for their hidden layers of interconnected trainable nodes, each tuning the weights assigned to input signals to better predict a known outcome in a narrow domain – are asserted to perform best when trained on vast amounts of data. Small data, by contrast, gives rise to a set of alternative learning techniques.
These techniques appear to further the long-held tradition of modelling machine learning approaches on organic systems.
Specifically, collaborative machine learning techniques that leverage small data tend to outperform single agents or homogeneous teams of learning agents – analogous to how multi-disciplinary human teams tend to deliver better results than their more homogeneous counterparts.
These techniques can include:
- few-shot learning
- zero-shot learning
- collective learning
- transfer learning.
Few-shot learning (alternatively: low-shot or one-shot learning) is fundamentally a way to train a machine learning algorithm using small data that is specifically relevant to the training problem. Coupled with meta-learning (applying principles previously used to learn a different task) to generalise, it teaches the algorithm to optimise its predictions on this limited, specific data set.
In few-shot learning, the emphasis is on the quality rather than quantity of the training data.
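One common few-shot approach is prototype-based classification: each class is summarised by the mean of its handful of labelled examples, and a query is assigned to the nearest class prototype. The sketch below uses invented two-dimensional feature vectors and class names purely for illustration:

```python
import math

# Hypothetical support set: three "shots" per class, as 2-D feature vectors.
support = {
    "cat": [(0.9, 0.1), (0.8, 0.2), (0.85, 0.15)],
    "dog": [(0.1, 0.9), (0.2, 0.8), (0.15, 0.85)],
}

def prototype(points):
    """Average the few labelled examples into one class prototype."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(2))

def classify(query):
    """Assign the query to the class with the nearest prototype."""
    protos = {label: prototype(pts) for label, pts in support.items()}
    return min(protos, key=lambda label: math.dist(query, protos[label]))

print(classify((0.7, 0.3)))  # cat
```

With only three examples per class, the quality and relevance of each example dominates the outcome – exactly the quality-over-quantity emphasis described above.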
By contrast, zero-shot learning refers to a training model in which – having seen only a minimum of labelled data during the training phase – a machine learns how to recognise a class of objects in the domain without previously seeing labelled examples of that class.
Often, this method is cited as learning on the fly: the aim of zero-shot learning is to reduce the training phase requirement for masses of slightly different permutations and rely more on inference.
The inference stage in zero-shot learning is crucial: during this stage, the algorithm attempts to predict and categorise classes of unseen data using an analysis of its labelled data predictions to map which of the underlying attributes have the greatest likelihood of describing the data in general.
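That attribute-mapping step can be illustrated with a toy example: classes the model has never seen labelled examples of are described by attribute signatures, and a query's predicted attributes are matched against those descriptions. The class names, attributes, and upstream attribute detector below are all hypothetical:

```python
# Attribute signatures describing classes, including classes with no
# labelled training examples.
class_attributes = {
    "zebra": {"stripes": 1, "hooves": 1, "wings": 0},
    "horse": {"stripes": 0, "hooves": 1, "wings": 0},
    "eagle": {"stripes": 0, "hooves": 0, "wings": 1},
}

def infer_class(predicted_attrs):
    """Pick the class whose attribute signature best matches the prediction."""
    def overlap(attrs):
        return sum(1 for k, v in attrs.items() if predicted_attrs.get(k) == v)
    return max(class_attributes, key=lambda c: overlap(class_attributes[c]))

# Suppose an upstream detector (trained on other classes) predicts these
# attributes for an image of an unseen animal:
print(infer_class({"stripes": 1, "hooves": 1, "wings": 0}))  # zebra
```

The heavy lifting happens at inference: no labelled zebra images were needed, only a description of zebras in terms of attributes the model already understands.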
In collective learning – a best-of-breed machine learning strategy – many local artificial agents collaborate to deliver faster, better learning outcomes.
This strategy relies on the heterogeneity of the system (according to Gifford 2009, Farina 2019, and others): one or more of the local agents pursue different learning models, with local data and architecture obscured from the wider system, each sharing knowledge with neighbouring agents to improve the predictive capacity of the overall system.
To share knowledge, local agents also tune the weightings given to inputs from neighbouring agents as each proves successful (or fails) in predicting outcomes.
“The goal of collective learning is a single predictive model that is more accurate than the sum of its parts,” wrote Professor Bryan Low in January 2019. By fusing multiple heterogeneous, essentially black-box models, collective learning delivers better predictability while maintaining the privacy of the underlying data.
Collective learning therefore has three crucial benefits: speed, performance, and privacy.
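A minimal sketch of that weight-tuning behaviour follows. The three agent models, the data, and the update rule are all invented for illustration; in a real system each agent would hold its own private data and architecture and expose only its predictions:

```python
# Three heterogeneous local agents; the true rule in this toy domain
# is "x > 5", which only one agent happens to model well.
agents = {
    "threshold_model": lambda x: 1 if x > 5 else 0,
    "parity_model":    lambda x: 1 if x % 2 else 0,
    "always_one":      lambda x: 1,
}
weights = {name: 1.0 for name in agents}

def collective_predict(x):
    """Weighted vote across agents; only predictions are shared."""
    votes = {0: 0.0, 1: 0.0}
    for name, model in agents.items():
        votes[model(x)] += weights[name]
    return max(votes, key=votes.get)

def update_weights(x, truth, rate=0.5):
    """Reward agents that predicted well; down-weight those that failed."""
    for name, model in agents.items():
        if model(x) == truth:
            weights[name] += rate
        else:
            weights[name] *= (1 - rate)

# A handful of labelled points is enough to separate the agents:
for x, y in [(8, 1), (2, 0), (7, 1), (3, 0)]:
    update_weights(x, y)

print(collective_predict(1))  # 0: unreliable agents have been down-weighted
```

The ensemble improves even though no agent ever reveals its data or internals – the privacy property the quotation above highlights.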
Finally, transfer learning combines a previously learned model with domain-specific small data, aiming to get a fast start in the new learning domain and produce models with better inductive performance.
“Transfer learning… is usually concerned with improving the speed with which a model is learned, or with improving its… inductive capability,” according to Torrey and Shavlik in the Handbook of Research on Machine Learning Applications and Trends: Algorithms, Methods and Techniques (2010). “Inductive transfer can not only… improve learning…, but [also acts] as a way to offset the difficulties posed by… small data sets.”
Transfer learning therefore uses prior knowledge to achieve a fast start: by augmenting the corpus of training data, the resulting model may well achieve higher performance than without.
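The fast-start idea can be sketched with a deliberately tiny example: word weights "pre-trained" on a large general corpus are reused as the starting point in a new domain, then fine-tuned on a handful of domain-specific examples. The weights, vocabulary, and update rule are hypothetical stand-ins for a real pre-trained model:

```python
# Word weights learned on a large, general source corpus ("pre-trained").
source_weights = {"great": 1.0, "poor": -1.0, "fast": 0.5}

def score(weights, text):
    """Sum the sentiment weights of known words; unknown words score 0."""
    return sum(weights.get(w, 0.0) for w in text.split())

def fine_tune(weights, examples, rate=0.4):
    """Nudge word weights toward the labels of a small target-domain set."""
    tuned = dict(weights)
    for text, label in examples:  # label: +1 positive, -1 negative
        for w in text.split():
            tuned[w] = tuned.get(w, 0.0) + rate * label
    return tuned

# Small data from the new domain, where "sick" is positive slang:
domain_examples = [("sick beat", 1), ("sick drop", 1), ("poor mix", -1)]
tuned = fine_tune(source_weights, domain_examples)

print(score(tuned, "sick track") > 0)  # True: the transferred model adapted
```

The source model supplies the bulk of the knowledge; the small domain set only has to teach the differences, which is why so little target data is needed.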
In sum: firms can rely on transfer, collective, and low-shot learning with small data to improve predictive outcomes, accelerate AI adoption, and achieve more general model applicability with fewer data-sourcing requirements.
Combining small data learning with deployable AI
Ultimately the value of a complete insights-generation strategy – using big and small data – is measured in improvements to decision making, customer retention, and operational effectiveness.
Consequently, a wide range of commercial and open-source AI solutions is emerging:
- The scikit-learn package for Python.
- The Cognitive Toolkit (CNTK) from Microsoft.
- H2O.ai, a distributable, in-memory AI solution.
- IBM Cloud Pak for Data, for data science and AI.
- SageMaker Neo from Amazon.
- The TensorFlow framework and a Cloud AI platform from Google.
- The OpenAI API, built to generate meaning from natural-language queries.
Many other packages of this nature are available.
So, what is at stake when evaluating the benefits of a small data program?
Enabled by small data, these types of deployable AI solutions may well enable enterprises to amplify the intrinsic benefits of their institutional knowledge.
By putting the power of insight generation into the hands of decision makers, small data enables firms to make the most of their secret sauce.