February 25, 2016 (updated 31 Aug 2016 12:43pm)

Apache Spark set for software update as Streaming Analytics is given an overhaul

News: Spark's creators have promised not to change the majority of its APIs.

By James Nunns

Apache Spark is set for a major revision, with version 2.0 of the software now revealed.

Speaking at Spark Summit East in New York, Matei Zaharia, the creator of Spark, said that the new version would arrive in April or May this year, and assured the audience that the majority of APIs would not change.

Three new features of the upcoming version were singled out: Tungsten Phase 2, which will bring Spark closer to bare-metal performance; Structured Streaming, a real-time engine built on SQL and DataFrames; and the unification of Datasets and DataFrames.

Key improvements are coming to Spark Streaming, an area of the open source technology that has grown in popularity as organisations analyse ever larger volumes of web and mobile data.

Data streaming already exists in the Hadoop world through Apache Storm, but Spark is the technology that has received the most attention, with companies such as IBM building around it.

Its popularity stems from the fact that it lets analysts and developers work with up-to-date data, which makes the results of their work more accurate and timely.

The streaming updates are significant for emerging distributed processing technologies based on the Lambda architecture, which runs offline batch-processing pipelines alongside real-time processing pipelines for data analytics.
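To make the Lambda idea concrete, here is a minimal sketch in plain Python, assuming a toy page-view counter. It is not tied to Spark or any product: the function and class names (batch_view, SpeedLayer, serve) are hypothetical. The batch layer recomputes a view from the full history, the speed layer counts only events that arrived since the last batch run, and queries merge the two.

```python
from collections import Counter
from typing import Iterable

# Hypothetical event type for illustration: each event is the name of a page that was viewed.
Event = str


def batch_view(master_dataset: Iterable[Event]) -> Counter:
    """Batch layer: recompute page-view counts from the full historical dataset.

    In a real deployment this would be a periodic offline job; here it is a plain recount.
    """
    return Counter(master_dataset)


class SpeedLayer:
    """Speed layer: incrementally count events that arrived after the last batch run."""

    def __init__(self) -> None:
        self.realtime_view: Counter = Counter()

    def on_event(self, event: Event) -> None:
        self.realtime_view[event] += 1

    def reset(self) -> None:
        # Called once a fresh batch view has absorbed these events.
        self.realtime_view.clear()


def serve(batch: Counter, speed: SpeedLayer, key: str) -> int:
    """Serving layer: answer a query by merging the batch and real-time views."""
    # Counter returns 0 for missing keys, so unseen pages count as zero.
    return batch[key] + speed.realtime_view[key]


if __name__ == "__main__":
    batch = batch_view(["home", "pricing", "home"])  # offline pipeline result
    speed = SpeedLayer()
    for e in ["home", "docs"]:                       # events arriving since the batch run
        speed.on_event(e)
    print(serve(batch, speed, "home"))  # 2 from batch + 1 real-time = 3
    print(serve(batch, speed, "docs"))  # 0 from batch + 1 real-time = 1
```

The cost of this design is that the same counting logic lives in two pipelines that must be kept consistent, which is part of why a single engine serving both batch and real-time work is attractive.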


In the end it all boils down to reducing the time to action.

Spark's popularity has made it a viable alternative to MapReduce, the original data processing engine for big data analytics. Although MapReduce is likely to remain in use for the time being, Spark has become the most popular technology in the Hadoop ecosystem.

The update will attach a high-level API to the Spark SQL engine, aiming to make it easier to develop applications that process data according to event time. This Structured Streaming engine will support both batch and real-time analytics, and the 2.0 release will focus particularly on applications that run ETL jobs.
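The core promise here is that one query can serve both batch and real-time analytics. The following conceptual sketch in plain Python illustrates that idea; it is not Spark's API, and the Event type, the 60-second window and the IncrementalQuery class are all hypothetical. A per-window count keyed by event time is written once, then run either over a complete dataset or incrementally over arriving micro-batches, and both give the same answer.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable

WINDOW = 60  # hypothetical 60-second tumbling windows


@dataclass(frozen=True)
class Event:
    event_time: int  # seconds when the event happened (not when it arrived)
    user: str


def window_start(t: int) -> int:
    """Map an event timestamp to the start of its tumbling window."""
    return t - t % WINDOW


def query(rows: Iterable[Event]) -> Counter:
    """The query, written once: count events per event-time window."""
    return Counter(window_start(r.event_time) for r in rows)


class IncrementalQuery:
    """Apply the same query to an unbounded stream, one micro-batch at a time.

    The result table is updated incrementally, so a late-arriving event is still
    counted in the window of its event time rather than its arrival time.
    """

    def __init__(self) -> None:
        self.result: Counter = Counter()

    def process(self, micro_batch: Iterable[Event]) -> Counter:
        self.result.update(query(micro_batch))  # add this batch's counts to the running totals
        return self.result


if __name__ == "__main__":
    events = [Event(5, "a"), Event(70, "b"), Event(10, "c")]  # Event(10, ...) arrives late
    batch_result = query(events)            # batch: whole dataset at once
    stream = IncrementalQuery()
    stream.process(events[:2])              # stream: first micro-batch
    stream_result = stream.process(events[2:])
    print(batch_result == stream_result)    # True: same per-window counts either way
```

This only works so simply because counting is additive across micro-batches; aggregations that are not additive need more state, which is the kind of bookkeeping an engine like the one described is meant to hide.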

According to the Spark developers, the new version will bring speed improvements of five to ten times.

Alongside the latest version of Spark, Databricks, which was founded by the creators of Apache Spark, said that it is releasing the beta version of its Community Edition, a free version of its cloud-based big data platform.

The service is designed to give users access to a micro-cluster, as well as cluster management and a notebook environment. The idea is that developers will be able to use it to learn Spark without having to set up and run their own cluster environment.

Databricks also revealed Dashboards, a visual reporting application for Spark clusters that can be used to produce reports and run queries.

Dashboards is an alternative view of a Databricks notebook, aimed at end users who want to see different views of their data. Importantly, it can be used without any knowledge of Spark or access to critical code.
