“We were just researchers who wanted to revolutionise the world by democratising machine learning” – The way he tells it, Databricks CEO Ali Ghodsi didn’t set out to help create one of Silicon Valley’s hottest companies, build a one-of-a-kind partnership with Microsoft or make a tonne of money.
He’d helped create some open source software he really liked and he was trying to give it away. In 2009, even that was a struggle: the open source community was uninterested, and when he tried to give it away to vendors, they said they wanted enterprise software, not “academic mumbo-jumbo”.
Less than a decade later, with Databricks having sought a valuation of $900 million in its last funding round, 500+ major corporate users and a unique relationship with Microsoft, Sweden-educated Ghodsi could be forgiven for a touch of hubris. Instead, sitting down for a coffee in London’s Holborn, he’s a self-effacing raconteur of his company’s story.
The (Apache) Spark of an Idea
The seed of Databricks was planted in 2009 with the creation of Apache Spark, the open source unified analytics engine, which Ghodsi helped develop. (eBay and Netflix are among those now deploying it at massive scale to process petabytes of user data.)
Nine years ago, however, commercialising Spark wasn’t the development team’s priority.
As Ghodsi puts it: “I was at UC Berkeley, a public school where funding had dried up from the government. Our department was very well funded by Silicon Valley though. We could see that Google and Facebook were doing something very different to the open source community: they were doing machine learning on massive amounts of data. But most of their projects were pretty secretive. Our mission was: ‘Let’s build and open source this stuff and give it to everyone in the world!’”
He adds wryly: “We built the software (Spark), put it on a few web pages and said: ‘Hey download this!’ No one downloaded it. So we went to existing Hadoop vendors and many tech companies and said: ‘Hey can you adopt this technology?’ They all told us ‘no: this is academic mumbo-jumbo. We need enterprise software.’ We literally wanted to give it away and no one would take it.”
Enter Ben Horowitz
By 2013, with still too few people paying attention, Ghodsi and team decided to try a different tack and start a company around the software.
“This being Silicon Valley, the venture capitalist Ben Horowitz happened to be close, so we asked him for $200,000 to go away, code for a year and build this thing. He offered $14 million, saying he wanted to be involved personally, and so we spent a year out there evangelising and trying to drum up attention.”
The result was Databricks, which raised $140 million in a funding round last August led by VC firm Andreessen Horowitz (the company had reportedly sought a valuation in the ballpark of $900 million at the time).
So what is Databricks’ secret sauce?
“The tech is pretty simple”, Ghodsi says self-effacingly. “It does traditional SQL, ETL [extract, transform and load] like Hadoop does; machine learning and real-time streaming. It’s a Swiss army knife. But we have the best monetisation strategy for open source that exists on the planet.”
He adds: “The Red Hat model for commercialising open source – training, services and support – is human-heavy; churn is high and margins aren’t great. We saw people buying RDS from Amazon [to host and manage MySQL databases] even though they could have just installed [MySQL on] EC2, which is open source and all RDS really is. People prefer convenience though, so we essentially provide Apache Spark in the cloud as a managed service.”
Enter Satya Nadella
A one-of-a-kind relationship with Microsoft Azure, facilitated by Ben Horowitz – who enjoys a close relationship with Microsoft CEO Satya Nadella – helped bring Databricks to the big time. As of this spring, its offering (a toolbox of machine learning solutions and more) is the only service from an outside vendor that Microsoft Azure sells as a first-party service.
“You can now buy Microsoft Azure Databricks. We have nothing to do with that. It sounds like an OEM deal – but we’re operating the service; running it and writing the source code. They’re selling it and because 77 percent of the Fortune 2,000 have enterprise agreements with Azure, it’s given us great reach.”
The relationship also allows Databricks to offer a deeper level of compatibility with Microsoft’s first-party offerings than other services offered through the Azure Marketplace. (Databricks is also available on AWS.)
Why is Machine Learning Still a Struggle for Companies?
With AI and ML now hotter than ever, what’s the plan? As Ghodsi sees it, many companies are still struggling to get ML off the ground.
He told Computer Business Review: “We’re looking at why many companies, unlike the Googles and the Facebooks, are not succeeding. A lot of the time it’s down to the divide between data science and data engineering. The data engineers need their data to be secure; they’re concerned about breaches and GDPR and so on. The data scientists doing the math and modelling need the data. Projects are getting stuck between departments.”
He adds: “So we simply changed the architecture so you can directly program against the data that exists, so you can start doing iterations over the data or build the models on that data, which you couldn’t do on data warehouses – that’s unified analytics. Not just SQL analysis. Not just ML or predictive. It’s a unification of data engineering and data science. We think that’s super exciting: giving both teams the same software so they can speak the same language.”
“In fact,” he adds, “while this is a bit controversial, I think you should merge those teams and have people do the engineering and the science. Actually that’s happening. You’ll see a lot more adverts around not for data scientists or data engineers, but machine learning engineers.”
With that, London’s Rosewood Hotel need their table back and Ghodsi – still trying to democratise machine learning, but now driving serious revenues from doing so – is off to continue preaching the power of open source “unified analytics”.