Cloud data and artificial intelligence (AI) specialist Databricks has announced it will contribute all the enhancements it has made to its Delta Lake platform to the open source community. Databricks was born from open source software, and making more of its features openly available is seen as a sensible move as open standards become increasingly important to the storage and processing of cloud-based data sets.

Databricks is making its Delta Lake platform fully open source (Photo by Gabby Jones/Bloomberg via Getty Images)

Announced at the Data + AI Summit today, the move will see Delta Lake handed to the Linux Foundation, with all Delta Lake APIs also open-sourced as part of the Delta Lake 2.0 release. The company also announced a new release of its MLflow machine learning platform, which includes MLflow Pipelines, a new feature designed to simplify the building of machine learning models.

“From the beginning, Databricks has been committed to open standards and the open source community. We have created, contributed to, fostered the growth of, and donated some of the most impactful innovations in modern open source technology,” said Ali Ghodsi, co-founder and CEO of Databricks.

“Open data lakehouses are quickly becoming the standard for how the most innovative companies handle their data and AI. Delta Lake, MLflow and Spark are all core to this architectural transformation, and we’re proud to do our part in accelerating their innovation and adoption.” 

What is Databricks?

Founded in 2013 by Ghodsi and his co-founders to commercialise Apache Spark, an open source data analytics engine, Databricks pioneered the idea of the so-called ‘data lakehouse’, which combines the structured data storage of a data warehouse with the unstructured data storage of a data lake. This combined approach makes it easier to use the data to build and deploy machine learning models.
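
In practice, Delta Lake's main interface is Spark's DataFrame reader and writer. The following is a rough sketch of the lakehouse pattern described above, assuming the open source delta-spark Python package; the paths and dataset are illustrative rather than anything from the announcement.

```python
# Minimal lakehouse sketch using the open source delta-spark package.
# Paths, dataset names and schema are hypothetical examples.
from pyspark.sql import SparkSession
from delta import configure_spark_with_delta_pip

builder = (
    SparkSession.builder.appName("lakehouse-sketch")
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
)
spark = configure_spark_with_delta_pip(builder).getOrCreate()

# Raw, loosely structured events land in the lake as ordinary files...
events = spark.read.json("/data/raw/events/")

# ...but are written out as a Delta table, which layers warehouse-style
# features (ACID transactions, schema enforcement, time travel) on top of
# cheap object or file storage.
events.write.format("delta").mode("append").save("/data/lakehouse/events")

# The same table can then be read back directly as training data for ML.
training_df = spark.read.format("delta").load("/data/lakehouse/events")
```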

The concept has proved popular, and Databricks is used by more than 7,000 organisations worldwide, the company says, including 40% of the Fortune 500. It generates revenue by selling subscriptions to analytics tools that run on top of Delta Lake, and reported revenue of $425m for the year to September 2021.

Making Delta Lake 2.0 completely open source seems “the natural next step” for Databricks, says Beatriz Valle, senior technology analyst at GlobalData. She says the move represents a “highly coherent long-term strategy for its future identity and roadmap as a company”.

“Databricks wants to demonstrate that it is staying faithful to its open source origins,” she added.

The enterprise market for cloud AI and ML is hotly contested, Valle says. Today also saw HPE and Red Hat announce a collaboration that will make Red Hat’s open source-based tools, including its AI and ML options, available on HPE GreenLake, the ecosystem that helps clients manage their cloud and edge deployments and includes a data lake platform. With this in mind, it makes sense for Databricks to underscore its roots in open source projects.

“Open source platforms are becoming entrenched and part of the enterprise architecture as AI becomes part and parcel of enterprise strategies,” Valle adds.

MLflow Pipelines could help Databricks keep its edge

The announcement of the new MLflow Pipelines feature is also an interesting one, Valle says. It offers a series of pre-defined templates for different types of ML model, allowing non-technical users to set up their own models more easily.
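
As a rough illustration of the template-driven approach, and assuming the experimental Pipelines API MLflow shipped at the time (an mlflow.pipelines.Pipeline object driven by a template's pipeline.yaml and a named profile), running one of the pre-defined templates might look like the sketch below; the profile and step names are assumptions based on the regression template, not details from the announcement.

```python
# Hypothetical sketch of running a pre-defined MLflow Pipelines template.
# Assumes a project created from a template, with pipeline.yaml and a
# profiles/ directory already in place; "local" and the step names shown
# are illustrative.
from mlflow.pipelines import Pipeline

# Load the pipeline definition from pipeline.yaml using the "local" profile,
# which holds environment-specific settings such as data locations.
pipeline = Pipeline(profile="local")

# Run individual pre-defined steps of the template...
pipeline.run("ingest")
pipeline.run("train")

# ...or the whole pipeline end to end, then inspect the results.
pipeline.run()
pipeline.inspect()
```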

Databricks “is going headlong into full automation mode,” Valle explains. The trend towards ‘democratising AI’ through low-code systems, which seeks to put these technologies in the hands of regular business users who don’t have significant technical knowledge, is likely to be a driver for this.

She says the benefits such a system will bring could help Databricks keep an edge over its rivals. “A large number of companies are finding it challenging to move from data insights to actions, to render revenues,” Valle explains. “Too often insights aren’t easily consumed by those making day-to-day operational decisions, and line of business users must have access to insights when they need them, and in an easily consumable format, to make data-informed decisions.”

Valle adds: “With announcements like MLflow Pipelines, these users can leverage AI to extract value from the wealth of data at their disposal in an automated manner.”

