Open source is hot. Big data is hot. Proto-unicorn data management and learning company Cloudera is open source, big data and hot, hot, hot, recently announcing its plans to go public and filing an S-1 prospectus with the U.S. Securities and Exchange Commission on March 31st. If it can hold its estimated valuation, Cloudera would be the second open source unicorn IPO this year (the first was Mulesoft).
There are 195 mentions of “open source” in Cloudera’s S-1 filing, which is unsurprising since Cloudera is built on the open source Hadoop software library. Over a third of the “open source” references in the S-1 are in its “Risk Factors” section, including caveats around Hadoop’s Apache license. Several articles focused on Cloudera’s use of open source appeared immediately after the S-1 filing, including one arguing that the prospectus read like an argument against building a business on open source.
Is Open Source Use Risky Business?
Open source use is ubiquitous worldwide. Recent research by Forrester points out that open source comprises 80% to 90% of the code in a typical application. Open source use is pervasive across every industry vertical, including the space that Cloudera occupies – big data – where open source makes up nearly 40% of that vertical’s commercial applications. Open source dominates for good reason. Like Cloudera, organisations across every industry vertical build applications using open source as their foundation because open source decreases development costs while accelerating time to market.
However, open source risk does accompany its benefits, particularly when organisations do not sufficiently track and manage the open source they have in use. And therein lies the key. Most organisations don’t track or manage open source effectively, which exposes them to several of the risks that Cloudera notes in its S-1. Let’s take a look at three of them…
A Risk of IP Infringement
“We may be exposed to increased risk of being the subject of intellectual property infringement claims as a result of acquisitions and our incorporation of open source software into our platform, as, among other things, we have a lower level of visibility into the development process with respect to such technology or the care taken to safeguard against infringement risks.”
Indeed, all companies using open source face IP infringement risk. Given that open source is the foundation of modern software, you could make the case that every company with mission-critical applications faces the risk that Cloudera specifies. It’s why savvy companies – I assume Cloudera among them, since it acknowledges the fact – have processes in place to identify and manage open source to mitigate risk as new code enters the SDLC through acquisitions.
A Risk of Exposure of Proprietary Code
“By the terms of certain open source licenses, we could be required to release the source code of our proprietary software, and to make our proprietary software available under open source licenses, if we combine our proprietary software with open source software in a certain manner.”
While it is true that some open source license terms could require proprietary code to be released as open source itself, the fact that Cloudera acknowledges the issue suggests that it has processes in place to ensure a doomsday scenario doesn’t happen.
A much scarier scenario would be a company that doesn’t realise it must comply with the licenses of the open source it uses – or worse, doesn’t even realise that it has open source with licensing conditions in its proprietary code. Most open source components are governed by one of about 2,500 known open source licenses, and the license obligations can be tracked and managed only if the open source components themselves are identified.
Open source components with no identifiable license terms are also problematic. Software without a license generally means that no one has permission from the creator(s) of the software to use, modify, or share it. Creative work (which includes code) is under exclusive copyright by default. Unless a license specifies otherwise, nobody else can use, copy, distribute, or modify that work without being at risk of litigation. Lack of clear statements of rights and obligations leaves organisations at greater risk of violation of “hidden” terms.
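To make the tracking point concrete, here is a minimal sketch of how an organisation might sort an open source inventory into components that need license review. Everything in it is an assumption for illustration: the component names and versions are hypothetical, the two license sets are short, incomplete samples of SPDX identifiers, and the categories are not a statement of how Cloudera or any particular tool works.

```python
from dataclasses import dataclass
from typing import List, Optional

# Illustrative, incomplete samples of SPDX license identifiers.
# Assumption: a real policy would come from legal counsel, not a hard-coded set.
PERMISSIVE = {"Apache-2.0", "MIT", "BSD-3-Clause"}
COPYLEFT = {"GPL-2.0-only", "GPL-3.0-only", "AGPL-3.0-only"}


@dataclass(frozen=True)
class Component:
    name: str
    version: str
    license_id: Optional[str]  # None means no license could be identified


def classify(component: Component) -> str:
    """Place a component in one of four review categories."""
    if component.license_id is None:
        return "no-license"        # exclusive copyright applies by default
    if component.license_id in COPYLEFT:
        return "copyleft-review"   # may carry source-release obligations
    if component.license_id in PERMISSIVE:
        return "permissive"
    return "unrecognised"          # not in either sample set


def review_queue(inventory: List[Component]) -> List[Component]:
    """Return every component that is not clearly permissive."""
    return [c for c in inventory if classify(c) != "permissive"]


# Hypothetical inventory, for illustration only.
inventory = [
    Component("example-storage-lib", "2.7.3", "Apache-2.0"),
    Component("example-parser", "1.4.0", "GPL-3.0-only"),
    Component("example-snippet", "0.1.0", None),
]

for c in review_queue(inventory):
    print(f"{c.name} {c.version}: {classify(c)}")
```

The sketch only works because the inventory exists in the first place, which is the point made above: obligations can be managed only for components that have been identified.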
A Risk from Cyberattack
“Further, some open source projects have known vulnerabilities and architectural instabilities… While we have established processes intended to alleviate these risks, we cannot assure that these measures will reduce these risks.”
As well as license compliance, security vulnerabilities and code quality are of concern in open source – as they are in proprietary software. Over 3,600 new open source component vulnerabilities were reported in 2016 – almost 10 per day on average.
Known vulnerabilities in open source are particularly attractive to attackers. These vulnerabilities (and often their exploits) are publicly disclosed, and users are often completely unaware of their use of the components themselves, much less the vulnerabilities within.
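The same inventory idea applies to vulnerabilities. The sketch below shows, in the simplest possible form, how a list of components in use could be checked against a list of publicly disclosed issues. The advisory feed, the advisory IDs and the component names are all hypothetical placeholders, and matching on an exact name and version is a deliberate simplification of what real tooling would do.

```python
from typing import Dict, List, Tuple

ComponentKey = Tuple[str, str]  # (component name, version)

# Hypothetical advisory feed: the IDs below are placeholders, not real advisories.
KNOWN_VULNERABLE: Dict[ComponentKey, List[str]] = {
    ("example-web-framework", "2.3.1"): ["EXAMPLE-2016-0001"],
    ("example-crypto-lib", "1.0.1"): ["EXAMPLE-2016-0002", "EXAMPLE-2016-0003"],
}


def find_exposed(inventory: List[ComponentKey]) -> Dict[ComponentKey, List[str]]:
    """Return the inventory entries that exactly match a known-vulnerable version."""
    return {key: KNOWN_VULNERABLE[key] for key in inventory if key in KNOWN_VULNERABLE}


# Hypothetical inventory of components found in an application.
inventory = [
    ("example-web-framework", "2.3.1"),  # matches an advisory
    ("example-crypto-lib", "1.0.2"),     # a different version, so no match
    ("example-logging-lib", "4.0.0"),    # not in the feed at all
]

exposed = find_exposed(inventory)
for (name, version), advisories in exposed.items():
    print(f"{name} {version}: {', '.join(advisories)}")
```

An organisation that does not know a component is in its code has nothing to feed into a check like this, which is what makes publicly disclosed vulnerabilities so useful to attackers.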
The statement “[W]e have established processes intended to alleviate these risks” indicates that Cloudera recognises the importance of open source vulnerability management. Indeed, Cloudera apparently feels secure enough in its insight into the open source it uses to provide warranties to customers and partners protecting against those risks.
Not so Risky a Business
Cloudera obviously understands the risks as well as the benefits associated with open source, and seems prepared to handle those risks with open source identification and management. Rather than its use of open source, potential investors should be concerned that Cloudera’s historical losses have overwhelmed the company’s revenue to date, and that Cloudera “expects to continue to incur net losses for the foreseeable future.” Cloudera also faces a bevy of heavily armed competitors such as HP, IBM, Oracle, Amazon Web Services and Hortonworks. That is a risk far more difficult to manage than the potential risk of open source code.