Big data has undeniably become one of the most talked-about technology trends of the past few years. From vendors to analysts and from businesses to developers, the buzzword oversimplifies what is a complex issue in a complex market.
Any ‘new’ industry brings a host of companies positioning themselves as the standard, the go-to company for all things big data. In the past we have seen Oracle become the go-to vendor for databases and Microsoft become the number one choice for PC software.
CBR identifies five platform vendors that are battling to become the standard for big data.
Pivotal is an American software company that is a spin-out and joint venture of EMC and its subsidiary VMware.
Although Pivotal Labs was founded in the 1990s, Pivotal itself was formed just three years ago, in 2013. Since then the company has positioned itself as one of the leaders in the big data market.
Much of this is down to its Big Data Suite, which is compatible with ODPi-based Hadoop distributions and whose components are all built on open source projects.
ODPi is an industry initiative that aims to ensure applications work across multiple Apache Hadoop distributions.
More vendors are looking to exploit the open source approach because of growing customer demand for open, flexible architectures and development.
Connections to cloud offerings through Cloud Foundry give the Big Data Suite an extra boost, and the suite also offers tools for advanced analytics and scalable applications. The idea behind it is to close the gap between capturing and storing data and operationalising that data.
For data processing it offers Spring XD, Spark and Pivotal HD; for advanced analytics there are the Pivotal Greenplum Database and Pivotal HAWQ; and for operating apps at scale the company offers Pivotal GemFire, Redis, RabbitMQ and Pivotal Cloud Foundry.
One of the major companies in the Hadoop ecosystem, Cloudera describes its big data offering as a modern platform for data management and analytics and, like Pivotal, it aims to help businesses get value from their data.
The company’s pitch to businesses is threefold: its Enterprise platform can provide cyber security insights by using advanced analytics to identify and mitigate risks; it can give businesses a 360-degree view of their customers; and it is well suited to the Internet of Things because it delivers real-time analytics on IoT data.
The Cloudera Enterprise offering includes components designed to build a Data Hub, a database populated with data from one or more sources, as well as management capabilities via Cloudera Manager.
Given that the platform offers Hadoop, HDFS, YARN, MapReduce, Pig, Hive, HBase, Oozie, Impala, Spark, Sqoop and numerous other capabilities, Cloudera Manager is an important element that shouldn’t be overlooked.
The ability to manage all of those capabilities easily is one of the differentiators that have made Cloudera a very popular Hadoop platform.
Another of the leading companies in the Hadoop ecosystem, Hortonworks has aligned itself with ODPi.
The Hortonworks Data Platform (HDP) has seen many advancements in the past year as the company taps into the open source community to push its technology forward. At its core the platform offers YARN (Yet Another Resource Negotiator), a cluster resource management technology built for multi-workload data processing across batch, interactive and real-time methods.
Data access options include in-memory, stream, NoSQL, SQL, script and batch, and, like Cloudera, Hortonworks offers a number of data governance and integration tools such as Falcon, Oozie, Sqoop, Flume and Kafka.
One benefit of HDP is that it integrates with a business’s existing data analytics tools, as well as ones added at a later date. Applications in the Hortonworks ecosystem include the likes of SAS, SAP, Splunk, HPE, Datameer and Tableau, while data systems can include Oracle, SQL Server, Teradata and SAP HANA.
Like the other offerings, it can be deployed on-premises, in the cloud and on Windows, and it also provides automated backup to Microsoft Azure and Amazon S3.
Looker may be a less familiar name, but it essentially offers a business intelligence (BI) data platform.
The company says that it offers a unique data modelling layer that lets the user describe and transform raw data so that it can be understood and then used as a trusted source for the business.
Looker does this automatically when it is first connected to the business’s database; users can then build customised metrics with LookML, Looker’s modelling language.
Another benefit of Looker is that it operates where the data sits, so it does not need to extract subsets of data or pre-aggregate them; as a result, time to deployment is significantly reduced.
Like the other platforms on this list, Looker extends beyond its built-in analytics capabilities. It supports SQL query engines such as Spark and Presto in order to tap into Hadoop’s processing power, as well as other tools including Hive and Cloudera Impala.
Another feature of the platform is that it is designed to be extensible, meaning that developers can embed new elements.
Areas that can be enhanced include embedded analytics, a Ruby SDK for query results and charts, and third-party authentication via LDAP, SAML, Google Apps and multi-factor authentication.
The company follows a monthly release cycle, so users don’t have to wait long for improvements and additional features.
Teradata is a traditional database and data management company that has put its data expertise to good use by applying it to big data.
Founded in 1979, the company has a much longer history in the data world than its counterparts on this list, but that doesn’t mean it has been slow to act or that it offers an old-world solution to new-world problems.
The company provides end-to-end solutions and services in data warehousing, big data and analytics, and marketing applications, all with the goal of creating data-driven businesses.
The Teradata Integrated Big Data Platform is designed to expand the data space and workload capacity of the company’s integrated data warehouse. This allows users to ingest, prepare, and analyse data from a single interface.
In addition to letting users create data visualisations, the company offers QueryGrid, a tool for optimising data analytics across the enterprise.
QueryGrid works on the user’s chosen analytic engine and file system and drives analytic processing via a single SQL query.
Teradata’s current flagship product is the Aster Discovery Platform, a framework that combines the company’s Aster database and Discovery Portfolio, with built-in analytics functions, a graph processing engine, MapReduce and a version of R.
A key asset for Teradata is its history in the data market, which has helped it win some big clients, among them nine of the top 10 telecoms companies, including O2 and Siemens, as well as numerous financial services organisations such as PayPal.
This article is from the CBROnline archive: some formatting and images may not be present.