TensorFlow, the machine learning (ML) platform developed by Google, is about to get a version 2.0 with a slew of new features. With the open source software library used by enterprises from eBay to Intel, SAP to Twitter, many data scientists will be watching closely – and hoping it’s not too buggy!
The open source platform underpins a wide range of applications: TensorFlow’s team cites applications to forecast earthquake aftershocks, identify diseased plants and help improve customer experience. It is not especially easy to use, however, and the new release will focus on simplicity, its team said.
This will be music to the ears of many. ML is hard: from moving models to production, to tracking which parameters, code, and data went into each experiment. Many enterprise users have built their own internal platforms (e.g. Facebook, Google and Uber with FBLearner Flow, TFX, and Michelangelo respectively) to manage data preparation, model training and deployment.
Luckily the TensorFlow team on Monday posted a fairly comprehensive overview of what will be new in the latest major release of the ML platform.
(A public preview is coming “early this year”, and after 2.0’s release the team will provide 12 months of security patches to the last 1.x release, they said.)
One immediate complaint from those watching, though: a strong proprietary Google Cloud Platform (GCP) focus. It’s a common gripe. As San Francisco-based Databricks’ Matei Zaharia put it in a blog post last year: “Typical ML platforms only support a small set of built-in algorithms, or a single ML library, and they are tied to each company’s infrastructure. Users cannot easily leverage new ML libraries, or share their work with a wider community.”
All examples are for Google Cloud. Feels unfair to be promoting as a universal solution when all or almost all examples are for Google Cloud. Where in the official documentation is the Distributed/Multi-Worker Kubernetes example for a bare metal GPU cluster?
— Highly Stateful Complex Device (@marcfawzi) January 15, 2019
TensorFlow 2.0: What’s Coming
The mood music from TensorFlow is that interoperability will improve.
Updates will include easy model building with Keras and cleaning up deprecated APIs, while reducing duplication, TensorFlow’s team said.
Keras is a high-level API for building and training deep learning models. It has implementations for TensorFlow, MXNet, TypeScript, JavaScript, CNTK, Theano, PlaidML, Scala, CoreML, and other libraries.
“With the rapid evolution of ML, the platform has grown enormously and now supports a diverse mix of users with a diverse mix of needs. With TensorFlow 2.0, we have an opportunity to clean up and modularize the platform.”
A major push by the team is on improving compatibility and parity across platforms and components by standardising exchange formats and aligning APIs, they added.
Deployment Three Ways
With TensorFlow 2.0, once you’ve trained and saved your model, you can execute it directly in your application or serve it via three deployment libraries.
These are TensorFlow Serving (a library that allows models to be served over HTTP/REST or gRPC/Protocol Buffers); TensorFlow Lite (which allows users to deploy models on Android, iOS and embedded systems like a Raspberry Pi and Edge TPUs); and TensorFlow.js (which lets users deploy models in JavaScript environments, such as in a web browser or server side through Node.js).
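For a flavour of the HTTP/REST route, here is a minimal sketch of how a client might prepare a prediction request for a model hosted by TensorFlow Serving. It only builds the request and does not send it. The host, the model name my_model, the input values and the helper build_predict_request are all illustrative assumptions; port 8501 is the REST port conventionally used in TensorFlow Serving's own examples, and your deployment may differ.

```python
import json
import urllib.request


def build_predict_request(host, model_name, instances, port=8501):
    """Build (but do not send) a REST predict request for TensorFlow Serving.

    host, model_name and port are placeholders for your own deployment.
    """
    # TensorFlow Serving exposes predictions at /v1/models/<name>:predict
    url = f"http://{host}:{port}/v1/models/{model_name}:predict"
    # The "row" request format wraps the inputs in an "instances" list
    body = json.dumps({"instances": instances}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical model name and input; one instance with three features
request = build_predict_request("localhost", "my_model", [[1.0, 2.0, 3.0]])
print(request.full_url)
print(request.data.decode("utf-8"))

# Against a running server, the request could then be sent with:
#   with urllib.request.urlopen(request) as response:
#       predictions = json.load(response)["predictions"]
```

The same model could instead be queried over gRPC, which generally suits high-throughput services better; REST is simply the easier one to try from any language with an HTTP client.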
TensorFlow 2.0 brings several other new additions that allow researchers and advanced users to experiment, using rich extensions like Ragged Tensors, TensorFlow Probability, Tensor2Tensor, and more to be announced, the team said.
To simplify the migration to TensorFlow 2.0 meanwhile, there will be a conversion tool which updates TensorFlow 1.x Python code to use TensorFlow 2.0 compatible APIs, or flags cases where code cannot be converted automatically.
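The "convert or flag" behaviour is easy to picture with a toy example. The sketch below is not the TensorFlow conversion tool and makes no claim about how that tool is implemented. It is a deliberately naive, regex-based illustration with a tiny hand-picked rename table, and the names RENAMES, UNCONVERTIBLE_PREFIXES and upgrade_line are invented for this example. It renames two 1.x symbols to their 2.0 locations and flags anything under tf.contrib, which has no automatic equivalent in 2.0.

```python
import re

# Illustrative rename table only; the real tool covers far more symbols
RENAMES = {
    "tf.log": "tf.math.log",
    "tf.random_uniform": "tf.random.uniform",
}

# Symbols under these prefixes are reported rather than rewritten
UNCONVERTIBLE_PREFIXES = ("tf.contrib.",)

# Matches dotted names that start with "tf.", e.g. tf.contrib.layers.foo
SYMBOL = re.compile(r"\btf(?:\.\w+)+")


def upgrade_line(line):
    """Return (converted_line, warnings) for one line of 1.x-style code."""
    warnings = []

    def replace(match):
        symbol = match.group(0)
        if symbol.startswith(UNCONVERTIBLE_PREFIXES):
            # Leave the code untouched and tell the user to fix it by hand
            warnings.append(f"cannot convert automatically: {symbol}")
            return symbol
        # Rename known symbols; leave everything else as it is
        return RENAMES.get(symbol, symbol)

    return SYMBOL.sub(replace, line), warnings


for source in ["y = tf.log(x)", "h = tf.contrib.layers.fully_connected(x, 8)"]:
    converted, notes = upgrade_line(source)
    print(converted, notes)
```

A text-level approach like this would trip over strings, comments and aliased imports, which is one reason a production-grade converter needs to understand the code's structure rather than its characters.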