Microsoft has quietly developed and open sourced a Remote Direct Memory Access (RDMA) network virtualisation toolkit that could help tackle growing latency and CPU usage issues for users running data-intensive applications (e.g. data analytics and deep learning frameworks) in containers in shared cloud environments.
The tool, dubbed “FreeFlow”, was jointly developed by Microsoft Research and Carnegie Mellon University. It aims to make fully virtualised RDMA-based networking viable for data-intensive tools such as Apache Spark, Hadoop and TensorFlow running in shared cloud environments, rather than solely on bare metal.
FreeFlow was slipped onto GitHub with little fanfare last summer. It reached a broader audience after being presented in an academic paper [pdf] at the USENIX Symposium on Networked Systems Design and Implementation last week.
Microsoft FreeFlow Release: Background
The software-based tool aims to tackle the growing issue of container networking bottlenecks, as containers are deployed for ever-more resource-intensive applications.
Containers are isolated execution environments on a Linux host that support their own file system, processes and network stack. They have become the de facto way of managing and deploying large cloud applications.
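That isolation comes from Linux kernel namespaces. The following minimal C sketch (illustrative only, not code from the FreeFlow project, and it needs root or CAP_SYS_ADMIN to run) shows a process detaching into its own network and mount namespaces, the same primitive that container runtimes build on.

```c
/* Minimal namespace sketch: give this process its own network and mount
 * namespaces, the kernel facility underpinning container isolation. */
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void)
{
    /* Detach from the host's network and mount namespaces; the process
     * now sees only its own (initially empty) network stack. */
    if (unshare(CLONE_NEWNET | CLONE_NEWNS) != 0) {
        perror("unshare");
        return EXIT_FAILURE;
    }

    /* From here a container runtime would mount a private root file
     * system and create a virtual network interface (e.g. a veth pair). */
    printf("now running inside private network and mount namespaces\n");
    execlp("ip", "ip", "link", "show", (char *)NULL); /* lists only lo */
    perror("execlp");
    return EXIT_FAILURE;
}
```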
FreeFlow also aims to overcome common issues with using RDMA for container networking. RDMA is a networking technology that lets one host read and write the memory of another host directly, without involving the remote host’s operating system or CPU. (TCP/IP communications, by contrast, typically require copy operations, which add latency and consume significant CPU and memory resources.)
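To illustrate the kernel-bypass model, here is a minimal libibverbs sketch. It is not taken from FreeFlow and assumes an RDMA-capable NIC plus the rdma-core userspace library: the application registers a buffer with the NIC once, after which the hardware can read and write that memory directly, with no per-message copies through the kernel’s TCP/IP stack. Connection setup and the actual data-transfer verbs are omitted for brevity.

```c
/* Minimal libibverbs sketch: register application memory with an RDMA NIC
 * so the hardware can access it directly (kernel bypass). Not FreeFlow code.
 * Build: gcc rdma_reg.c -libverbs */
#include <infiniband/verbs.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    int num = 0;
    struct ibv_device **devs = ibv_get_device_list(&num);
    if (!devs || num == 0) {
        fprintf(stderr, "no RDMA devices found\n");
        return EXIT_FAILURE;
    }

    struct ibv_context *ctx = ibv_open_device(devs[0]);
    struct ibv_pd *pd = ctx ? ibv_alloc_pd(ctx) : NULL;
    if (!pd) {
        fprintf(stderr, "could not open device or allocate protection domain\n");
        return EXIT_FAILURE;
    }

    /* Register 1 MB of application memory. The returned keys (lkey/rkey)
     * let the local and remote NICs read or write this buffer directly,
     * without the kernel copying data on each message. */
    size_t len = 1 << 20;
    void *buf = malloc(len);
    struct ibv_mr *mr = ibv_reg_mr(pd, buf, len,
                                   IBV_ACCESS_LOCAL_WRITE |
                                   IBV_ACCESS_REMOTE_READ |
                                   IBV_ACCESS_REMOTE_WRITE);
    if (!mr) {
        perror("ibv_reg_mr");
        return EXIT_FAILURE;
    }
    printf("registered %zu bytes, rkey=0x%x\n", len, mr->rkey);

    /* Queue pairs, connection setup and ibv_post_send()/ibv_post_recv()
     * would follow here in a real application. */
    ibv_dereg_mr(mr);
    ibv_dealloc_pd(pd);
    ibv_close_device(ctx);
    ibv_free_device_list(devs);
    free(buf);
    return EXIT_SUCCESS;
}
```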
Yet as the research paper notes: “It is hard to fully virtualize RDMA-based networking. RDMA achieves high networking performance by offloading network processing to hardware NICs, bypassing kernel software stacks.”
“It is difficult to modify the control plane states (e.g., routes) in hardware in shared cloud environments, while it is also hard to control the data path since traffic directly goes between RAM and NIC via PCIe bus. As a result… data-intensive applications that have adopted both these technologies use RDMA only when running in dedicated bare-metal clusters; when they run in shared clouds, they have to fundamentally eschew the performance benefits afforded by RDMA. Naturally, using dedicated clusters to run an application is, however, not cost efficient both for providers and for customers.”
FreeFlow, in short, uses what Victor Bahl, Distinguished Scientist and Director of Mobility and Networking Research at Microsoft, describes as “a variety of cool techniques”, such as shared memory and RDMA, to improve network performance: higher throughput, lower latency and less CPU overhead.
In “native” RDMA, applications use RDMA APIs to send commands directly to the hardware NICs for both control and data path functions.
“FreeFlow intercepts the communication between applications and physical NICs, and performs control plane and data plane policies inside the software FreeFlow router, which runs as another container on the host machine. In particular, for controlling the data path, FreeFlow router only allows the physical NIC to directly read and write from its own memory and takes charge of copying data from and to the applications’ memory”, the research team explained.
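That data-path idea can be sketched with ordinary POSIX shared memory. The example below is a hypothetical, single-process illustration (the segment name and the split into “router” and “library” functions are assumptions, not FreeFlow source): the router owns a memory segment that it would register with the physical NIC, and the per-container library copies application buffers into that segment rather than exposing the application’s memory to the hardware.

```c
/* Conceptual sketch (hypothetical, not FreeFlow source) of a router-owned
 * shared-memory data path. Build: gcc shm_path.c (add -lrt on older glibc). */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

#define SEG_NAME "/freeflow_demo_seg"   /* hypothetical segment name */
#define SEG_SIZE (1 << 20)

/* Router side: create the segment the NIC would read from and write to. */
static void *router_create_segment(void)
{
    int fd = shm_open(SEG_NAME, O_CREAT | O_RDWR, 0600);
    if (fd < 0 || ftruncate(fd, SEG_SIZE) != 0)
        return NULL;
    return mmap(NULL, SEG_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
}

/* Library side (inside the container): instead of handing the application's
 * buffer to the NIC, copy it into the router-owned segment. */
static int library_stage_send(void *segment, const void *app_buf, size_t len)
{
    if (len > SEG_SIZE)
        return -1;
    memcpy(segment, app_buf, len);
    /* ...the router would now post an RDMA send that reads from `segment`. */
    return 0;
}

int main(void)
{
    void *seg = router_create_segment();
    if (!seg || seg == MAP_FAILED) {
        perror("router_create_segment");
        return EXIT_FAILURE;
    }
    const char msg[] = "payload copied via the router's memory";
    library_stage_send(seg, msg, sizeof msg);
    printf("staged %zu bytes in %s\n", sizeof msg, SEG_NAME);

    munmap(seg, SEG_SIZE);
    shm_unlink(SEG_NAME);
    return EXIT_SUCCESS;
}
```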
Yibo Zhu, co-inventor of the technology, adds: “One of the nice features of FreeFlow is that it works on top of popular technologies including Flannel and Weave.”
“Containers have their individual virtual network interfaces and IP addresses. They do not need direct access to the hardware network interface. A lightweight FreeFlow library inside containers intercepts RDMA and TCP socket calls, and a FreeFlow router outside the containers helps accelerate the flow of data.”
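The interception itself can be done with standard Linux dynamic-linker tricks. The shim below is a generic LD_PRELOAD example, not FreeFlow’s actual library: it intercepts the TCP connect() call before it reaches libc, which is the point where a router-aware shim could redirect or hand off the flow; per the description above, FreeFlow applies the same interception idea to RDMA calls as well.

```c
/* Generic LD_PRELOAD interposition sketch (standard Linux technique, not
 * FreeFlow's code): a shared library loaded into the application overrides
 * connect() and then forwards to the real implementation.
 * Build: gcc -shared -fPIC -o libshim.so shim.c -ldl
 * Run:   LD_PRELOAD=./libshim.so ./your_app */
#define _GNU_SOURCE
#include <dlfcn.h>
#include <stdio.h>
#include <sys/socket.h>

int connect(int sockfd, const struct sockaddr *addr, socklen_t addrlen)
{
    /* Resolve the real connect() once so normal behaviour is preserved. */
    static int (*real_connect)(int, const struct sockaddr *, socklen_t);
    if (!real_connect)
        real_connect = (int (*)(int, const struct sockaddr *, socklen_t))
                           dlsym(RTLD_NEXT, "connect");

    /* A router-aware shim would rewrite the destination or hand the flow
     * to a companion process here; this sketch only logs the call. */
    fprintf(stderr, "[shim] intercepted connect() on fd %d\n", sockfd);
    return real_connect(sockfd, addr, addrlen);
}
```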
It has been made available on GitHub under an MIT licence.
Rani Osnat, VP of product marketing at container security specialist Aqua Security, told Computer Business Review in an emailed statement: “As container-based applications are moving into production and growing in scale, organizations will increase the density of their deployments, which may in turn create networking bottlenecks.”
He added: “Much like security that must be adapted to handle this more dynamic, ephemeral and dense environment, networking must as well, and this RDMA virtualization looks like a step in the right direction.”