April 19, 2016, updated 05 Sep 2016 11:08am

Spark, Kafka & machine learning: 10 big data start-ups taking analytics to the next level

List: What start-ups are worth watching as they grow in the big data market?

By James Nunns

The rise of both structured and unstructured data has created a booming market that is expected to be worth around $41.5 billion by 2018.

The rapid growth of the big data market has resulted in the creation of a large crop of vendors that are all looking to take a slice.

Amid the plethora of vendors competing for market position are a number of start-ups that are aiming to help organisations collect and analyse data. CBR identifies 10 companies that are worth watching.


1. Confluent

Founded in 2014, the company has over $30 million in capital raised so far from investors such as LinkedIn, Index Ventures, Benchmark Capital and The Data Collective.

The company was founded by the developers behind Apache Kafka, a real-time messaging and streaming big data engine. Kafka was created inside LinkedIn and later contributed to the Apache Software Foundation, and Confluent was then spun out as a separate company.

Confluent is essentially a commercial provider and supporter of the Apache Kafka software. At LinkedIn, Kafka was used to instrument everything that happens in the company and make it available as a real-time feed to data systems such as Hadoop, Search and Newsfeed.

The company is focused on building a stream data platform to help companies get access to enterprise data as real-time streams.

Confluent provides tools in languages that include Java, C, and C++, and its REST proxy allows messages to be produced or consumed with any network-connected tool.
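
As a rough illustration of the REST proxy idea, the Python sketch below builds (but does not send) an HTTP request that would publish one JSON message to a topic. The host, port, topic name and payload are assumptions made up for this example. The `/topics/<name>` path and the `application/vnd.kafka.json.v2+json` content type follow the v2 API of the Confluent REST Proxy and may differ in other versions.

```python
import json
import urllib.request

# Hypothetical REST proxy address and topic, chosen for illustration only.
PROXY_URL = "http://localhost:8082"
TOPIC = "page-views"


def build_produce_request(proxy_url, topic, values):
    """Build a POST request that would publish JSON values to a Kafka topic."""
    body = json.dumps({"records": [{"value": v} for v in values]}).encode("utf-8")
    return urllib.request.Request(
        url=f"{proxy_url}/topics/{topic}",
        data=body,
        headers={"Content-Type": "application/vnd.kafka.json.v2+json"},
        method="POST",
    )


request = build_produce_request(PROXY_URL, TOPIC, [{"user": "alice", "page": "/home"}])
# Sending is left out on purpose; with a proxy running it would be:
# urllib.request.urlopen(request)
```

Because the request is plain HTTP with a JSON body, any tool that can make a web request could in principle act as a producer, which is the point the REST proxy is meant to address.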



2. H2O.ai

The company, which used to go by the name 0xdata, was founded in 2011 and has raised $33.6m in capital from investors such as Nexus Venture Partners, Paxion Capital Partners, and Transamerica Ventures.

Founded by SriSatish Ambati, co-founder of Platfora, and Cliff Click, lead developer of the Java Virtual Machine, H2O started with the idea of making it easier for developers and data scientists to use machine learning algorithms in their applications.

The company offers an open source machine learning platform that is designed to work with Hadoop and Spark. It can be used through a Web UI, from programming environments such as R, Java, Python and Scala, or via JSON.

The platform also works with tools such as Microsoft Excel, R Studio, and Tableau.

H2O can help develop models to build machine learning capabilities so that data can be parsed, ingested and modelled. At its most basic, the technology helps to quickly create and deploy machine learning algorithms.


3. AtScale

AtScale was founded in 2013 and has so far raised $9m in capital from investors such as AME Cloud Ventures, Storm Ventures, and UMC Capital.

The company was created to solve the problem of using familiar business intelligence (BI) tools and interfaces, such as SQL and Tableau, with technologies such as Hadoop. In effect, it bridges the gap between business users, their visualisation tool and the underlying Hadoop platform.

The goal is for companies to be able to perform analysis with the data in place, removing the need to move it to a specialised analysis tool, which can be time consuming and costly.

AtScale was created by Hadoop and BI veterans who have developed the ability to turn Hadoop clusters into scale-out OLAP servers.

It supports BI tools that can issue SQL or MDX queries.


4. Interana

Labelling itself as delivering behavioural analytics for event data, the company helps firms to make data-driven decisions.

Co-founded in 2013 by CEO Ann Johnson and Bobby Johnson, CTO, the company has raised $28.2m, including $20m in a Series B round that was led by Index Ventures.

Interana’s focus is on providing interactive analytics that help businesses to answer questions about how their customers behave and how products are being used.

The company uses a proprietary database that allows it to deal with billions of events with speed.

Companies such as Tinder are using it for troubleshooting network connectivity, measuring the effectiveness of social media and monitoring the way that users are swiping. It is being used broadly throughout the company to improve its service and operations.


5. Tamr

Tamr combines machine learning software and data science in order to broaden visibility into procurement.

The company was founded in 2013 by database veterans Andy Palmer, Mike Stonebraker and Ihab Ilyas with George Beskales, Daniel Bruckner and Alex Pagan.

Tamr uses a scalable data-unification platform, which combines machine learning with human input, to help customers use data that is siloed in disparate databases, spreadsheets, logs, and partner resources.

Total equity funding raised by the company is $42.4m with its most recent Series B round raising $25.2m. Lead investors include Hewlett-Packard Ventures, Thomson Reuters and MassMutual Ventures.

At its simplest, Tamr is a data cleaning start-up that aims to clean data from multiple different sources so that it is easier to use.
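
The general idea of pairing automated matching with human review can be sketched in a few lines of Python. This is not Tamr's method: it swaps in a simple string-similarity score from the standard library in place of a trained model, and the function names, thresholds and supplier names are all assumptions for illustration.

```python
from difflib import SequenceMatcher


def normalise(name):
    """Lower-case and strip punctuation so trivial differences do not count."""
    kept = "".join(ch for ch in name.lower() if ch.isalnum() or ch.isspace())
    return " ".join(kept.split())


def classify_pair(a, b, auto_match=0.9, needs_review=0.6):
    """Return 'match', 'review' or 'distinct' for two supplier names."""
    score = SequenceMatcher(None, normalise(a), normalise(b)).ratio()
    if score >= auto_match:
        return "match"
    if score >= needs_review:
        return "review"  # uncertain pairs are routed to a human expert
    return "distinct"


# Hypothetical supplier names drawn from two different systems.
same = classify_pair("Acme Corp.", "ACME corp")
unsure = classify_pair("Acme Corporation", "Acme Corp")
different = classify_pair("Acme Corp", "Zenith Ltd")
```

Confident matches are merged automatically, clear mismatches are left alone, and only the uncertain middle band is sent to a person, which keeps the amount of manual checking small.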


6. Wavefront

Based in Palo Alto, California, Wavefront was founded in 2013 and has raised $20.5m with investors such as Sequoia Capital, Sutter Hill Ventures, and Webb Investment Network.

The company provides a real-time analytics platform that is built to pull data from all the systems in an IT organisation's stack in order to help predict and prevent downtime by identifying and diagnosing issues.

Wavefront uses a query language that allows time series data to be manipulated. It also allows users to craft queries by using dropdown menus, filters, and auto-complete forms. Its technology was initially developed internally at Google and Twitter and is currently in use by companies such as Box, First Data and Workday.
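
To give a sense of what manipulating time series data involves, here is a minimal Python sketch of one common operation, a moving average that smooths noisy samples. It does not use Wavefront's query language; the function name and the sample values are assumptions for illustration.

```python
from collections import deque


def moving_average(points, window):
    """Yield (timestamp, mean of the last `window` values) for each point."""
    recent = deque(maxlen=window)  # old values fall out automatically
    for timestamp, value in points:
        recent.append(value)
        yield timestamp, sum(recent) / len(recent)


# Hypothetical CPU-load samples: (seconds since start, load)
cpu_load = [(0, 1.0), (10, 3.0), (20, 5.0), (30, 7.0)]
smoothed = list(moving_average(cpu_load, window=2))
```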


7. BlueTalon

Another company that was founded in 2013 and is based in California, but this time in Redwood City, BlueTalon has raised $11.4 million from investors such as Data Collective, Signia Venture Partners, and Bloomberg Data.

The company offers data-centric security for big data, including Hadoop and SQL, through access control and dynamic masking capabilities on the Hadoop Distributed File System.

In addition to working on all distributions of Hadoop, it also works on Microsoft Azure and Amazon Web Services.

According to BlueTalon, it lets the user define sets of policies that support data provisioning, which means that business users and developers are given access only to the data that they need.

The company also offers auditing capabilities to give users visibility into who has accessed what data, when, and for what reason.
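
The combination of access policies, dynamic masking and auditing described above can be sketched roughly as follows. This is an illustrative Python example, not BlueTalon's product or API: the roles, column names, mask string and log format are all hypothetical.

```python
from datetime import datetime, timezone

# Hypothetical policies: for each role, the columns it may see in clear text.
POLICIES = {
    "analyst": {"customer_id", "country"},
    "support": {"customer_id", "country", "email"},
}

audit_log = []


def read_record(user, role, record, purpose):
    """Return the record with disallowed columns masked, and log the access."""
    allowed = POLICIES.get(role, set())  # unknown roles see nothing in clear
    masked = {k: (v if k in allowed else "****") for k, v in record.items()}
    audit_log.append({
        "user": user,
        "role": role,
        "columns": sorted(k for k in record if k in allowed),
        "purpose": purpose,
        "at": datetime.now(timezone.utc).isoformat(),
    })
    return masked


row = {"customer_id": 42, "country": "UK", "email": "jo@example.com"}
view = read_record("dana", "analyst", row, purpose="churn report")
```

The stored data is never changed. Masking is applied at read time according to the caller's role, and each read leaves an entry recording who accessed which columns, when, and for what stated reason.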


8. Cazena

Cazena was only founded in 2014 but has already raised $28m from investors such as North Bridge Venture Partners and Growth Equity, along with Andreessen Horowitz and Formation 8.

After emerging from stealth in July 2015, it launched a Big Data-as-a-Service offering that, the company says, can move big data processing into the cloud with only three clicks.

The focus of the company is on the processing of big data in an encrypted cloud via Big Data-as-a-Service. Its offerings are broken down into Data Lake, Data Mart and Sandbox editions.

For workload intelligence the company provisions and optimises cloud infrastructure and data technologies such as Hadoop, Spark, MPP SQL and Search. Cazena’s Data Mover works by gathering, encrypting, and moving data into the cloud, while its Gateway technology connects enterprise analytics tools to cloud data sources.

Cazena offers end-to-end encryption of data, at rest and in motion, with keys that are managed by the enterprise.


9. DataTorrent

DataTorrent is one of the older start-ups on the list having been founded in 2012. So far it has raised $23.8m including a $15m Series B round led by Singtel Innov8.

The company focuses on real-time big data analytics technology, supported by an open source-based stream processing engine that the company says can deal with billions of events per second in Hadoop clusters.

DataTorrent supports data ingestion from sources such as Kafka, AWS S3n, HDFS, NFS, JMS and more.

The DataTorrent RTS Core is an open source enterprise-grade unified stream and batch processing engine that provides a set of system services that can help developers to focus on business logic. The company also offers a management console that is a full Hadoop-integrated application that provides a graphic interface for "lights out management."
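
A typical piece of the business logic that runs on a stream processing engine is counting events per fixed time window. The Python sketch below shows that logic on a small in-memory list. It is not DataTorrent code: a real engine would apply the same idea incrementally to an unbounded stream across many machines, and the function name and click events here are assumptions for illustration.

```python
from collections import Counter, defaultdict


def tumbling_window_counts(events, window_seconds):
    """Count events per key within fixed, non-overlapping time windows."""
    windows = defaultdict(Counter)
    for timestamp, key in events:
        # Round the timestamp down to the start of its window.
        window_start = (timestamp // window_seconds) * window_seconds
        windows[window_start][key] += 1
    return {start: dict(counts) for start, counts in sorted(windows.items())}


# Hypothetical click events: (seconds since start, page)
clicks = [(1, "/home"), (4, "/home"), (7, "/cart"), (12, "/home"), (19, "/cart")]
counts = tumbling_window_counts(clicks, window_seconds=10)
```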


10. Databricks

Founded in the UC Berkeley AMPLab by the creators of Apache Spark, the technology that has become extremely popular across the data analytics landscape, Databricks is a commercial and support provider of Spark.

The company has aimed to tap into the momentum of Spark both within its community and among numerous vendors such as IBM. Databricks itself provides tools for interactively analysing, visualising, and curating data, as well as collaboration and integration tools.

Since its founding in 2013, it has set about helping clients with cloud-based big data processing using Spark. It has raised $47m in two rounds, including a $33m Series B in June 2014.

The company is closely tied to the development of Spark and it recently pointed out three feature changes: implementing the next phase of Project Tungsten, which aims to speed up Spark by working around Java’s memory-handling limitations; improving the real-time streaming system; and unifying the structured data APIs that it uses into a single API.
