The in-memory analytic capabilities of Spark are being used to advance the Hortonworks Data Platform.
The latest version of Apache Spark, 1.5.2, will be available on HDP and will include support for Spark SQL and Spark Streaming. The aim is to give users a clear path to adopting Spark, Hortonworks said.
Including Spark in HDP means Spark-based applications can be deployed alongside Hadoop workloads, which should help Spark run at greater scale.
Hortonworks has identified three main areas of focus for Spark: data science acceleration, seamless data access and innovation at the core.
Data science acceleration aims to improve productivity by enhancing Apache Zeppelin and by contributing Spark algorithms and packages to ease the development of solutions.
Seamless data access is being addressed through improved integration between Spark and YARN, HDFS, Hive, HBase and ORC, with further optimisation through the Data Source API.
In addition to the Spark integration, Hortonworks has created the Hortonworks Community Connection, a destination where developers, DevOps, customers and partners can get answers to questions, collaborate and share code examples from GitHub.