According to IDC’s Digital Universe study, global data volumes will grow from 4.4 zettabytes in 2013 to 44 zettabytes by 2020, a tenfold increase in just seven years. Managing all of this new data presents businesses with opportunities, but also with significant challenges. After all, it is not so much the volume of data as the increasing variety of data sources and formats that presents a problem. With mobile apps, machine data, on-premises applications and SaaS all flourishing, we are witnessing the rise of an increasingly complicated information value-chain ecosystem. IT leaders need to adopt a portfolio-based approach and combine cloud and on-premises deployment models to sustain competitive advantage. Improving the scale and flexibility of data integration across both environments to deliver a hybrid offering is vital to providing the right data to the right people at the right time.
The evolution of hybrid integration approaches creates requirements and opportunities for converging application and data integration. The definition of hybrid integration will continue to evolve, but the ‘direction of travel’ is clearly to the cloud.
Gartner projects strong growth in public cloud spending. According to the research firm, the worldwide public cloud services market is projected to grow 16.5 percent in 2016 to total $204 billion, up from $175 billion in 2015. The highest growth will come from cloud system infrastructure services (infrastructure as a service [IaaS]), which is projected to grow 38.4 percent in 2016.
The increasing focus on the cloud means that customers will need an effective hybrid integration strategy. At Talend, we have identified five phases of cloud data integration, ranging from the oldest and most mature to the most bleeding-edge and disruptive. Here, we provide a brief overview of each phase and highlight how businesses can optimize their approach as they move from one phase to the next.
Phase 1: Replicating SaaS Apps to On-Premises Databases
The first stage in developing a hybrid integration platform is to replicate SaaS applications to on-premises databases. Companies in this phase typically either need to run analytics on business-critical information held in their SaaS apps, or need to send SaaS data to a staging database where other on-premises apps can pick it up.
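To make the replication step concrete, here is a minimal sketch in Python. It assumes the SaaS records have already been fetched from the vendor's API (the `SAMPLE_ACCOUNTS` list and the `stg_account` table are hypothetical), and it uses SQLite purely as a stand-in for an on-premises staging database. Records are upserted by their SaaS id, and a high-watermark query shows how the next run could request only records changed since the last load.

```python
import sqlite3
from typing import Iterable, Optional

# Hypothetical stand-in for records pulled from a SaaS app's API.
# A real job would page through the vendor's REST or bulk API instead.
SAMPLE_ACCOUNTS = [
    {"id": "A1", "name": "Acme Corp", "modified": "2016-03-01T10:00:00"},
    {"id": "A2", "name": "Globex", "modified": "2016-03-02T09:30:00"},
]


def replicate(conn: sqlite3.Connection, records: Iterable[dict]) -> int:
    """Upsert SaaS records into an on-premises staging table.

    Rows are keyed on the SaaS record id, so re-running the job after a
    record changes updates the existing row instead of duplicating it.
    Returns the number of records processed.
    """
    conn.execute(
        """CREATE TABLE IF NOT EXISTS stg_account (
               id TEXT PRIMARY KEY,
               name TEXT NOT NULL,
               modified TEXT NOT NULL
           )"""
    )
    count = 0
    for rec in records:
        conn.execute(
            """INSERT INTO stg_account (id, name, modified)
               VALUES (:id, :name, :modified)
               ON CONFLICT(id) DO UPDATE SET
                   name = excluded.name,
                   modified = excluded.modified""",
            rec,
        )
        count += 1
    conn.commit()
    return count


def high_watermark(conn: sqlite3.Connection) -> Optional[str]:
    """Latest 'modified' timestamp already staged (ISO strings sort correctly).

    The next incremental run can ask the SaaS API only for newer records.
    """
    return conn.execute("SELECT MAX(modified) FROM stg_account").fetchone()[0]
```

In use, `replicate(sqlite3.connect(":memory:"), SAMPLE_ACCOUNTS)` stages both accounts; keying on the source id keeps repeated loads idempotent, which matters when other on-premises apps read from the same staging table.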
To increase the scalability of existing infrastructure, it’s best to move to a cloud-based data warehouse service within AWS, Azure or Google Cloud. Because these cloud-based services scale on demand, businesses don’t need to spend cycles refining and tuning the databases. Additionally, they get all the benefits of utility-based pricing. However, with the broad range of SaaS apps today generating ever more data, they may also need to adopt a cloud analytics solution as part of their hybrid integration strategy.
Phase 2: Integrating SaaS Apps with On-Premises Apps
Each line of business has its preferred SaaS app: sales has Salesforce, marketing has Marketo, HR has Workday and finance has NetSuite. However, these SaaS apps still need to connect to an on-premises back-office ERP system.
Due to the complexity of back-office systems, there isn’t yet a widespread SaaS solution that can replace ERP systems such as SAP R/3 and Oracle EBS. Businesses should not try to integrate with every single object and table in these back-office systems, but should instead aim to handle a few use cases really well so that the business keeps running while benefiting from the agility of the cloud.
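One way to picture "a few use cases done really well" is a single, narrowly scoped mapping from a SaaS object to an ERP document. The sketch below assumes a CRM opportunity arrives as a dictionary and that the ERP accepts a simple sales-order payload; the `SalesOrder` fields and the input keys are hypothetical and do not reflect the actual Salesforce, SAP or Oracle schemas.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SalesOrder:
    """Hypothetical ERP sales-order payload (field names are illustrative)."""
    customer_id: str
    amount: float
    currency: str
    source_ref: str  # id of the originating CRM record, for traceability


def opportunity_to_order(opp: dict) -> Optional[SalesOrder]:
    """Map one CRM opportunity to an ERP sales order.

    Only one well-defined use case is handled: closed-won deals become
    orders. Every other record is deliberately ignored rather than
    partially pushed into the back-office system.
    """
    if opp.get("stage") != "Closed Won":
        return None
    return SalesOrder(
        customer_id=opp["account_id"],
        amount=round(float(opp["amount"]), 2),
        currency=opp.get("currency", "USD"),  # default currency is an assumption
        source_ref=opp["id"],
    )
```

Keeping the mapping this narrow makes it easy to test and reason about; further use cases can be added one at a time as separate, equally explicit mappings.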
Phase 3: Hybrid Data Warehousing in the Cloud
Cloud databases and data warehouses are geared toward data warehouse workloads, supporting both low-cost, rapid proofs of value and ongoing data warehouse deployments. As the volume and variety of data grows, enterprises need a strategy for moving their data from on-premises warehouses to newer, Big Data-friendly cloud resources.
While they assess which Big Data processing frameworks best serve their needs, they can start by creating a Data Lake in the cloud with an object storage service such as Amazon Web Services (AWS) S3 or Microsoft Azure Blob Storage. These lakes can relieve the cost pressures imposed by on-premises relational databases and act as "demo areas", giving businesses the opportunity to process information using their Big Data framework of choice and then transfer it into a cloud-based data warehouse. Once enterprise data is held in the warehouse, the business can enable self-service with Data Preparation tools capable of organising and cleansing the data prior to analysis in the cloud.
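The lake-then-warehouse flow can be sketched in two small steps: choosing where raw files land in object storage, and cleansing records before they are loaded. The example below assumes a Hive-style, date-partitioned key layout (`zone/source/dataset/year=/month=/day=`), which is a common convention rather than anything S3 or Azure requires, and the cleansing rules are illustrative stand-ins for what a Data Preparation tool would do. No cloud SDK is called; the function only builds the key string.

```python
from datetime import date
from typing import Iterable, List


def lake_key(zone: str, source: str, dataset: str, day: date, part: int) -> str:
    """Build an object key using Hive-style date partitions.

    The layout is an assumption: it keeps raw landings separate from
    cleansed data and lets query engines prune files by date.
    """
    return (
        f"{zone}/{source}/{dataset}/"
        f"year={day.year:04d}/month={day.month:02d}/day={day.day:02d}/"
        f"part-{part:05d}.json"
    )


def cleanse(rows: Iterable[dict]) -> List[dict]:
    """Minimal data-preparation pass before loading into a warehouse.

    Trims text fields, lower-cases e-mail addresses, drops rows without
    an id and keeps only the first occurrence of each id.
    """
    seen = set()
    out = []
    for row in rows:
        rid = (row.get("id") or "").strip()
        if not rid or rid in seen:
            continue
        seen.add(rid)
        clean = {k: v.strip() if isinstance(v, str) else v for k, v in row.items()}
        clean["id"] = rid
        if clean.get("email"):
            clean["email"] = clean["email"].lower()
        out.append(clean)
    return out
```

For example, `lake_key("raw", "crm", "accounts", date(2016, 3, 5), 1)` yields a key under `raw/crm/accounts/year=2016/month=03/day=05/`, and `cleanse` would run on the files read back from that prefix before the warehouse load.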
Phase 4: Real-time Analytics with Streaming Data
Businesses today need insight at their fingertips in real time. To benefit commercially from real-time analytics, they need infrastructure that can deliver insight at that speed. These infrastructure needs may vary by use case, whether it is weblogs, clickstream data, sensor data or database logs.
IT leaders should first assess all their data sources to judge which must remain on-premises and which should move to the cloud. For example, most IoT use cases involving sensors on industrial equipment are on-premises, so it’s best to keep your streaming analytics infrastructure on-premises too. However, where you’re collecting streaming data about systems already in the cloud, it’s probably best to keep your infrastructure there as well and use the existing services within those ecosystems to set up your streaming infrastructure. That way, you stay ahead of the game as you move everything to the cloud.
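The placement reasoning above amounts to a simple rule: honour any decision the source assessment has already made, and otherwise co-locate streaming analytics with where the events originate. The sketch below encodes that rule; the `StreamSource` type, its `target` override and the example source names are hypothetical, chosen only to illustrate the decision.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Optional


class Site(Enum):
    ON_PREMISES = "on-premises"
    CLOUD = "cloud"


@dataclass
class StreamSource:
    name: str
    origin: Site  # where the events are produced
    # Set when the assessment has decided the source must stay
    # on-premises or move to the cloud (hypothetical field).
    target: Optional[Site] = None


def place_streaming(sources: List[StreamSource]) -> Dict[str, Site]:
    """Decide where each stream's analytics should run.

    An explicit assessment decision wins; otherwise processing is kept
    next to the data, so on-premises sensors stay on-premises and
    streams about cloud systems are processed in the cloud.
    """
    placement = {}
    for src in sources:
        placement[src.name] = src.target if src.target is not None else src.origin
    return placement
```

Feeding it an industrial sensor feed, a cloud clickstream and an on-premises database log earmarked for migration would place the first on-premises and the other two in the cloud.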
Phase 5: Machine Learning Delivers Optimised App Experiences
We live in a ‘mobile first’ society, in which more and more experiences are delivered as apps on mobile devices. By discovering patterns buried within data, machine learning has the potential to make applications more powerful and responsive. Well-tuned algorithms allow value to be extracted from disparate data sources without the limits of human thinking and analysis. Businesses will need skilled developers who understand how machine learning can bring business-critical analytics to any application, accomplishing everything from enhancing customer experience to serving up hyper-personalised content.
Getting Results with iPaaS
To reach this level of ‘application nirvana’, companies will first need to have implemented each of the four previous phases of hybrid integration.
That’s where we see a key role for integration platform-as-a-service (iPaaS), which is defined by Gartner as ‘a suite of cloud services enabling development, execution and governance of integration flows connecting any combination of on premises and cloud-based processes, services, applications and data within individual or across multiple organisations.’
The right iPaaS solution can help businesses achieve the necessary integration, and even bring in native Spark processing capabilities to drive real-time analytics, allowing them to move through the phases outlined above and ultimately complete phase five.