eBay has open sourced software designed to solve the problem of getting multiple servers to agree on a shared state even in the face of failures.
NuRaft is a lightweight C++ Raft core, released this week under the Apache 2.0 open source license, with enhancements over the cornerstone Raft core that include SSL/TLS support, asynchronous replication, and other new features.
The company described it as the “first graduate” of its efforts to create open source, cloud-native database services for its core business.
It is the result of two years' worth of testing and closely watched internal deployment.
At the highest level, Raft elects one server in a cluster to act as leader. The leader accepts commands from clients, appends them to its log and replicates that log to the other servers. When a leader crash is detected, an election is triggered to choose a new leader, and only servers with up-to-date logs can win it. For more on how Raft works, see here.
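As a rough illustration of that flow, here is a toy Python sketch. It is not NuRaft's API and not a full Raft implementation: the names `Server`, `elect_leader` and `replicate` are hypothetical, and the election is simplified to picking the live server with the most up-to-date log once a majority is reachable, where real Raft uses randomized timeouts and voting.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    """Hypothetical stand-in for one cluster member (not a NuRaft type)."""
    name: str
    log: list = field(default_factory=list)  # entries are (term, command) pairs
    alive: bool = True

    def last_entry_key(self):
        # Logs are compared by the term of the last entry, then by length.
        last_term = self.log[-1][0] if self.log else 0
        return (last_term, len(self.log))


def elect_leader(servers):
    """Simplified election: needs a live majority, then picks the live
    server whose log is most up to date (ties go to the first one listed)."""
    live = [s for s in servers if s.alive]
    if len(live) <= len(servers) // 2:
        return None  # no majority reachable, so no leader can be elected
    return max(live, key=Server.last_entry_key)


def replicate(leader, servers, term, command):
    """Leader appends a client command to its log and copies the log to
    every live follower. Returns True if a majority now holds the entry."""
    leader.log.append((term, command))
    acks = 1  # the leader itself
    for s in servers:
        if s is not leader and s.alive:
            s.log = list(leader.log)
            acks += 1
    return acks > len(servers) // 2


# Demo: server "c" is down while "a" leads, then "a" crashes and "c" returns.
a, b, c = Server("a"), Server("b"), Server("c")
cluster = [a, b, c]
c.alive = False
leader = elect_leader(cluster)
committed = replicate(leader, cluster, term=1, command="set x=1")
a.alive, c.alive = False, True
new_leader = elect_leader(cluster)
print(leader.name, committed, new_leader.name)
```

In the demo, the returning server has an empty log, so the server that received the replicated entry is the one chosen as the new leader.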
eBay Open Source Initiatives
The release is part of eBay’s NuData initiative: the company’s bid to develop and operate new cloud-native database services for eBay’s core business.
“NuRaft is the first graduate from our overall effort”, the company’s Gene Zhang and Jung-Sang Ahn wrote this week, detailing its features.
This broader effort includes an LSM-tree-based storage engine (Jungle), a replicated high-performance log store, a multi-purpose log player, a distributed transaction protocol (GRIT), a transactional graph store (NuGraph), and machine-learning-based anomaly detection and prediction (GRANO: to be demonstrated at VLDB 2019).
The two said: “Over two years ago, when we started to look for a robust, efficient data replication component, we analyzed a few open source choices.
“Given our development environment, we needed a C or C++ implementation of a consensus protocol, such as Multi-Paxos, or Raft.
“After some hands-on prototyping and evaluation using some of the options, we settled on the cornerstone C++ Raft implementation. It’s lightweight, has the least dependencies (on ASIO only), and yet is functionally complete in terms of cluster management, recovery from peers, in addition to including basic replication features. It provides an interface for log store plugin, state machine, as well as configuration parameters.”
“The NuRaft protocol requires at least three servers or virtual machines to tolerate one failure, although you can run three processes on a single machine for testing and learning purposes,” the company said on Wednesday.
“University researchers as well as developers at companies who need an efficient C++ replication protocol for their data replication or distributed log stores to support distributed transactions, as we do at eBay, can benefit from it.”
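The three-server minimum follows from majority voting: a Raft cluster can make progress only while more than half of its members are reachable. A short Python sketch of that arithmetic is below; the function names are illustrative and are not part of NuRaft.

```python
def majority(cluster_size):
    """Smallest number of servers that forms a majority."""
    return cluster_size // 2 + 1


def tolerated_failures(cluster_size):
    """How many servers can fail while a majority still remains."""
    return cluster_size - majority(cluster_size)


def servers_needed(failures):
    """Smallest cluster that survives the given number of failures."""
    return 2 * failures + 1


for n in (1, 2, 3, 4, 5):
    print(n, "servers: majority", majority(n), "tolerates", tolerated_failures(n))
```

By this arithmetic a fourth server adds no extra fault tolerance over three, which is one reason odd cluster sizes are the usual choice.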