Machine learning unicorn Databricks has donated its Apache 2.0 licensed “Delta Lake” product to the Linux Foundation. Delta Lake is a “production-ready” open source tool designed to provide data lake reliability for both batch and streaming data.
It aims to tackle data reliability challenges in data lakes by making transactions ACID compliant, enabling concurrent reads and writes.
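As a rough illustration of how atomic commits can let readers and writers share one table, the toy Python sketch below uses an append-only log of numbered commit files with optimistic concurrency. It is not Delta Lake's code or API: the `ToyTransactionLog` class, its method names, and the file names are hypothetical, and the sketch ignores durability and partially written commit files.

```python
import json
import os
import tempfile


class ToyTransactionLog:
    """Hypothetical append-only log: one numbered JSON file per commit."""

    def __init__(self, log_dir):
        self.log_dir = log_dir
        os.makedirs(log_dir, exist_ok=True)

    def _path(self, version):
        return os.path.join(self.log_dir, f"{version:020d}.json")

    def latest_version(self):
        """Return the highest committed version, or -1 for an empty log."""
        versions = [
            int(name.split(".")[0])
            for name in os.listdir(self.log_dir)
            if name.endswith(".json")
        ]
        return max(versions, default=-1)

    def try_commit(self, read_version, actions):
        """Try to commit `actions` as version read_version + 1.

        Returns False if another writer already claimed that version,
        in which case the caller should re-read the log and retry.
        """
        target = self._path(read_version + 1)
        try:
            # O_EXCL makes file creation fail if the version already exists,
            # so at most one writer can claim each version number.
            fd = os.open(target, os.O_WRONLY | os.O_CREAT | os.O_EXCL)
        except FileExistsError:
            return False
        with os.fdopen(fd, "w") as handle:
            json.dump(actions, handle)
        return True

    def snapshot(self):
        """Replay all commits in order and return the set of live data files."""
        live = set()
        for version in range(self.latest_version() + 1):
            with open(self._path(version)) as handle:
                for action in json.load(handle):
                    if action["op"] == "add":
                        live.add(action["file"])
                    elif action["op"] == "remove":
                        live.discard(action["file"])
        return live


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        log = ToyTransactionLog(tmp)
        # Two writers both start from the empty log (version -1).
        first = log.try_commit(-1, [{"op": "add", "file": "part-0.parquet"}])
        second = log.try_commit(-1, [{"op": "add", "file": "part-1.parquet"}])
        print(first, second)
        # The losing writer re-reads the latest version and retries.
        retried = log.try_commit(
            log.latest_version(), [{"op": "add", "file": "part-1.parquet"}]
        )
        print(retried, sorted(log.snapshot()))
```

In this sketch a reader only ever sees files named in completed commits, and two writers racing for the same version cannot both succeed: the loser must re-read the log and try again.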
Since its launch in October 2017, it has been adopted by over 4,000 organisations, including Intel, and processes over two exabytes of data each month, Databricks said today, announcing the donation of the technology to the Foundation.
Databricks was founded by the creators of the widely used open source software Apache Spark, and the move reflects its ongoing commitment to the open source community, CEO and co-founder Ali Ghodsi said today. He added that he was confident Delta Lake “will quickly become the standard for data storage in data lakes”.
Ghodsi said: “Our team has continued to create and contribute to open source projects because we know it is the fastest, most comprehensive way to innovate.”
The Linux Foundation has over 1,000 members, including AT&T, Cisco, Fujitsu, Hitachi, Huawei, IBM, Intel, Microsoft, NEC, Oracle, Qualcomm, and Samsung. Among the projects it looks after, in addition to Linux itself, are Kubernetes, Linkerd and Hyperledger.
Delta Lake is now part of the Linux Foundation! EBs of data/month, in production 1000s of organizations. Can't wait to see how the community will shape its future and establish it as a standard for data lakes. https://t.co/NeZ2TVaeeA
— Reynold Xin (@rxin) October 16, 2019
In a release today the company, which recently expanded in the UK, touted support for the move from Alibaba, Booz Allen Hamilton, and Intel.
“We have been working with Databricks on a native Hive connector for Delta Lake on the open source front, and we are thrilled to see the project joining the Linux Foundation. We will continue to foster and contribute to the open source community,” said Yangqing Jia, VP of Big Data and AI at Alibaba.
Databricks Also Releases “Model Registry”
In other Databricks news, the company has released a new tool called Model Registry, which adds capabilities to its open source machine learning lifecycle tool MLflow (which sees some 800,000 downloads a month).
“Everyone who has tried to do machine learning development knows that it is complex. The ability to manage, version and share models is critical to minimizing confusion as the number of models in experimentation, testing and production phases at any given time can span into the thousands,” said Databricks CTO Matei Zaharia.
“The new additions in MLflow, developed collaboratively with hundreds of contributors, are enabling organizations worldwide to improve ML development and deployment. With hundreds of thousands of monthly downloads, we are encouraged that the community’s contributions are making a positive impact.”
The Model Registry component is available through GitHub.