June 5, 2018 (updated 06 Jun 2018 12:36pm)

Open Source Platform Aims to Democratise “Machine Learning Zoo”

Toolkit released today is built around REST APIs; designed to work across ML libraries, algos, deployment tools or languages

By CBR Staff Writer

Machine learning (ML) is hard, and it’s messy. It’s hard to move models into production because deployment environments are so diverse; it’s hard to track which parameters, code and data went into each experiment to produce a model; and in most businesses it’s something Talked About more than Done.

As a result, Big Tech™ has been building internal machine learning platforms to manage the ML lifecycle. Facebook, Google and Uber, for example, have built FBLearner Flow, TFX and Michelangelo respectively to manage data preparation, model training and deployment in contained environments.

Even these, as San Francisco-based Databricks’ Matei Zaharia puts it in a blog post today, are limited: “Typical ML platforms only support a small set of built-in algorithms, or a single ML library, and they are tied to each company’s infrastructure. Users cannot easily leverage new ML libraries, or share their work with a wider community.”

The Keys to the ML Castle?

Databricks, the company founded by the creators of Apache Spark, has a track record of trying to democratise difficult tech (see our recent interview with CEO Ali Ghodsi), and today it is releasing not just that pleasingly lucid blog post but the alpha version of a new open source, cloud-agnostic toolkit designed to simplify the ML workflow.

The toolkit, called MLflow, allows organisations to package their code for reproducible runs and to execute hundreds of parallel experiments on any hardware or software platform. It integrates closely with Apache Spark, scikit-learn, TensorFlow and other open source ML frameworks.
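The core idea behind tracked, reproducible runs can be illustrated with a short sketch. The Python below is a minimal, stdlib-only illustration under our own assumptions, not MLflow’s API: the hypothetical RunRecord, fingerprint and run_experiment names record each run’s parameters, a hash of its training data and its resulting metric, then sweep one parameter across several runs. (MLflow’s own Python client offers calls such as mlflow.log_param and mlflow.log_metric for this job.)

```python
import hashlib
import json
from concurrent.futures import ThreadPoolExecutor
from dataclasses import asdict, dataclass, field


@dataclass
class RunRecord:
    """Hypothetical record of one experiment: its inputs and outputs kept together."""
    params: dict
    data_sha256: str
    metrics: dict = field(default_factory=dict)


def fingerprint(rows):
    """Hash the training data so a run can later be tied to its exact inputs."""
    payload = json.dumps(rows, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def run_experiment(rows, slope):
    """Toy 'model': predict y = slope * x and record the mean squared error."""
    record = RunRecord(params={"slope": slope}, data_sha256=fingerprint(rows))
    errors = [(y - slope * x) ** 2 for x, y in rows]
    record.metrics["mse"] = sum(errors) / len(errors)
    return record


if __name__ == "__main__":
    data = [[1, 2.0], [2, 4.1], [3, 5.9]]
    # Sweep one parameter across several runs concurrently, keeping every record.
    with ThreadPoolExecutor() as pool:
        runs = list(pool.map(lambda s: run_experiment(data, s), [1.5, 2.0, 2.5]))
    best = min(runs, key=lambda r: r.metrics["mse"])
    print(json.dumps(asdict(best), indent=2))
```

Because every record carries its parameters and a data fingerprint alongside its metric, any "best" run can be traced back to exactly what produced it, which is the bookkeeping problem described earlier.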

Releasing the tool today at the Spark + AI Summit in San Francisco, CEO Ali Ghodsi said: “To derive value from AI, enterprises are dependent on their existing data and ability to iteratively do machine learning on massive datasets. Today’s data engineers and data scientists use numerous, disconnected tools to accomplish this, including a zoo of machine learning frameworks.”

He added: “Both organizational and technology silos create friction and slow down projects, becoming an impediment to the highly iterative nature of AI projects. Unified Analytics is the way to increase collaboration between data engineers and data scientists and unify data processing and AI technologies.”


Engineering giant Bechtel is an early customer, the company said. Bechtel’s principal big data architect, Justin Leto, said it “provides our data scientists with usable data, and keeps our engineers focused on AI solutions in production instead of troubleshooting ops issues.”

As Zaharia puts it: “MLflow is designed to work with any ML library, algorithm, deployment tool or language. It’s built around REST APIs and simple data formats (e.g., a model can be viewed as a lambda function) that can be used from a variety of tools, instead of only providing a small set of built-in functionality. This also makes it easy to add MLflow to your existing ML code so you can benefit from it immediately, and to share code using any ML library that others in your organization can run.”
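To show what treating a model as a lambda function buys you, here is a hedged Python sketch, assuming a model is nothing more than a function from a batch of inputs to a batch of predictions. ThresholdClassifier, as_function and score_json are hypothetical names invented for illustration, not part of MLflow (whose closest counterpart is its generic python_function model flavor); the point is that serving code which only sees a plain callable and JSON can handle a model from any library.

```python
import json
from typing import Callable, List

# In this sketch a "model" is just a function from a batch of inputs to predictions.
Model = Callable[[List[float]], List[float]]


class ThresholdClassifier:
    """Stand-in for a model object from some ML library with a predict() method."""

    def __init__(self, cutoff: float):
        self.cutoff = cutoff

    def predict(self, xs):
        return [1.0 if x >= self.cutoff else 0.0 for x in xs]


def as_function(model_object) -> Model:
    """Hypothetical adapter: hide a library-specific object behind a plain callable."""
    return lambda xs: list(model_object.predict(xs))


def score_json(model: Model, request_body: str) -> str:
    """Hypothetical serving step: JSON in, JSON out, whatever produced the model."""
    inputs = json.loads(request_body)["inputs"]
    return json.dumps({"predictions": model(inputs)})


if __name__ == "__main__":
    # Two very different "models" go through the same serving code path.
    doubling: Model = lambda xs: [2 * x for x in xs]
    classifier = as_function(ThresholdClassifier(cutoff=0.5))
    for m in (doubling, classifier):
        print(score_json(m, '{"inputs": [0.2, 0.7]}'))
```

Because the serving step depends only on the callable’s signature and a JSON payload, swapping in a model from another library means writing one small adapter rather than changing the serving code.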

Did somebody mention GitHub? Here’s the MLflow repo.
