It’s expected that by 2020, there will be six connected devices for each person on the planet. In line with this growth, the Internet of Things (IoT) has been moving from an over-used buzzword to a very real possibility. Alongside this, the opportunities to use the data produced for business and consumer benefit are only set to increase as organisations become more adept at gleaning insights from IoT.
The use cases already exist — from the manufacturing industry where data is collected on faulty machinery and used to calculate potential future defects, to agricultural projects using connected farm equipment to establish a field’s microclimate. In fact, Beecham Research recently claimed that IoT could increase food production by 70%, helping to feed the 9.6 billion global population expected by 2050.
One issue often overlooked is how IoT can actually be implemented in data processing terms, and how quickly businesses can gain insight from the data being produced. The answer lies in selecting the appropriate platform to manage the entire data process.
Difficulties with the Data
The volume of data already generated by connected devices is vast and needs a "big data" approach. Soon, the 50 billion connected devices predicted to exist by 2020 will be delivering a constant stream of information.
To make sense of all this data, it’s vital that businesses properly store all of it to build up historical references and establish a depth of data from which to detect and understand trends. To do so and handle huge numbers of small individual files, a highly scalable IoT processing platform is essential. Can the platform scale to millions, billions, or even to trillions of files? The answer has to be "yes" every time.
When it comes to speed, it’s no longer enough to simply store information in a data warehouse and report back on it days later. With IoT use cases, this simply isn’t effective. Instead, data needs to be processed on a streaming basis with the ability to identify and act on interesting information quickly and effectively.
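To make the streaming idea concrete, here is a minimal sketch in plain Python of the kind of logic a streaming pipeline applies: acting on interesting readings as they arrive rather than reporting on them days later. The device names, window size, and threshold are illustrative assumptions, not part of any particular product.

```python
from collections import deque

# Hypothetical sensor stream: (device_id, temperature) tuples arriving in order.
def detect_spikes(readings, window=3, threshold=5.0):
    """Flag a reading as interesting when it exceeds the rolling
    mean of the previous `window` readings by more than `threshold`."""
    recent = deque(maxlen=window)
    alerts = []
    for device_id, temp in readings:
        if len(recent) == window:
            mean = sum(recent) / window
            if temp - mean > threshold:
                alerts.append((device_id, temp))
        recent.append(temp)
    return alerts

stream = [("s1", 20.0), ("s1", 20.5), ("s1", 21.0), ("s1", 30.0), ("s1", 21.0)]
print(detect_spikes(stream))  # the 30.0 reading stands out from its neighbours
```

A real deployment would run this logic continuously over an unbounded feed (for example via a stream processing engine) rather than over an in-memory list, but the per-reading decision is the same.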
Vast quantities of both unstructured and structured data are produced as a result of IoT, so a flexible platform is needed to properly store and process all types of data, regardless of its form. It should also support stream processing from the outset and have the capability to deal with low-latency queries against semi-structured data items, at scale.
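As a small illustration of the "schema-on-read" flexibility described above, the sketch below normalises semi-structured JSON payloads whose fields vary by device, then runs a simple query against one of those fields. The message shapes and field names are invented for the example.

```python
import json

# Hypothetical payloads: semi-structured JSON where fields vary by device type.
raw_messages = [
    '{"device": "pump-1", "temp_c": 41.2}',
    '{"device": "gate-7", "open": true, "battery": 0.83}',
    '{"device": "pump-2", "temp_c": 39.9, "vibration": 0.02}',
]

def normalise(raw):
    """Flatten each message into a common record, keeping unknown
    fields rather than discarding them (schema-on-read)."""
    msg = json.loads(raw)
    device = msg.pop("device")
    return {"device": device, "fields": msg}

records = [normalise(m) for m in raw_messages]

# A query over a semi-structured field: which devices report temp_c above 40?
hot = [r["device"] for r in records if r["fields"].get("temp_c", 0) > 40]
print(hot)  # ['pump-1']
```

The point is that messages with different shapes can be stored and queried without agreeing a rigid schema up front; a platform doing this at scale adds indexing and distribution on top of the same idea.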
Architecture of Things
One of the challenges of the current IoT landscape is that there are no widely accepted reference architectures. However, those that have been proposed have one common theme — polyglot processing. By combining various processing modes within a platform, it is possible to deal with many different formats.
Privacy and security are additional capabilities that need to be covered by an IoT data processing platform. From data masking to provenance support and encryption, protecting the privacy of users is key.
Simply collecting data is only part of the solution. What’s key is the ability to combine deep predictive analytics from historical data with real-time events. Therefore, an integrated database is essential.
The natural choice of platform for such a data challenge is Apache Hadoop, which is built for large-scale, data-intensive deployments. Hadoop processes huge amounts of data by connecting many commodity computers together to work in parallel. Using the MapReduce or the Apache Spark execution engine, Hadoop can take a query over a dataset, divide it, and run it in parallel over multiple nodes.
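The divide-and-run-in-parallel idea can be shown with a toy version of the MapReduce pattern in plain Python. This is only a single-process sketch of the programming model, not Hadoop itself: the dataset is split into chunks (standing in for nodes), each chunk is mapped independently, and the results are merged in a reduce step.

```python
from collections import defaultdict

# Toy dataset of (device, temperature) readings.
readings = [("pump-1", 41.2), ("gate-7", 12.0), ("pump-1", 39.9), ("gate-7", 11.5)]

def map_phase(chunk):
    """Emit (key, (value, count)) pairs from one chunk of the data."""
    return [(device, (temp, 1)) for device, temp in chunk]

def reduce_phase(mapped):
    """Merge all mapped pairs into a mean temperature per device."""
    totals = defaultdict(lambda: [0.0, 0])
    for device, (temp, count) in mapped:
        totals[device][0] += temp
        totals[device][1] += count
    return {device: total / n for device, (total, n) in totals.items()}

# Split across two "nodes", map each chunk independently, then reduce.
chunks = [readings[:2], readings[2:]]
mapped = [pair for chunk in chunks for pair in map_phase(chunk)]
averages = reduce_phase(mapped)
print(averages)  # {'pump-1': 40.55, 'gate-7': 11.75}
```

Because each map call touches only its own chunk, the map phase is what Hadoop distributes across the cluster; only the comparatively small mapped output has to be brought together for the reduce.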
This capability makes it ideal as a data processing platform for the volume of data being produced by the different components of the IoT landscape. As the volumes of data created are only going to grow, the key is to put an effective data processing platform in place at the start of operation, in order to ensure benefits are delivered not only quickly, but indefinitely.
Preparing for IoT
There are already areas where IoT is starting to have a real and demonstrable impact, with plans in place that will dramatically increase the volume of data being produced from IoT initiatives such as smart cities. For example, Bristol is set to be the UK’s first "programmable city" this year. From March 2015, the city’s data on air quality, temperature, humidity, traffic movement and traffic signal patterns will be collected and analysed, helping the council make decisions on areas such as spending and legislation.
It’s clear the opportunity for IoT is huge and will continue to gain momentum over the next decade. What’s key to this development is that organisations select the right data processing platform and support from the very start in order to meet the requirements the accompanying IoT data will bring. If this is done effectively, then the possibilities for IoT innovation are endless.