A major new Open Source census has identified the Top 20 most commonly used free and open source software (FOSS) components in production applications.
The Linux Foundation/Laboratory for Innovation Science at Harvard (LISH) “Census II” report, published this week, represents what it describes as the “first steps toward addressing the structural issues that threaten the FOSS ecosystem.”
What “Structural Issues”?
The report aims to examine the risk of vulnerabilities in these projects due to widespread use of outdated versions, understaffed projects, and the existence of known security flaws. (As the list reveals, many are only sporadically updated.)
It comes amid growing concerns in some quarters about the “back-dooring” of open source software code bases, following several recent such attacks.
(Most famously, a malicious actor gained publishing rights to the event-stream package of a popular JavaScript library and then wrote a backdoor into the package. In July 2019, a Ruby developer’s repository was also taken over and its code back-doored.)
Jim Zemlin, executive director at the Linux Foundation, said: “The report begins to give us an inventory of the most important shared software and potential vulnerabilities and is the first step to understand more about these projects so that we can create tools and standards that results in trust and transparency in software.”
He added: “Open source is an undeniable and critical part of today’s economy, providing the underpinnings for most of our global commerce. Hundreds of thousands of open source software packages are in production applications throughout the supply chain, so understanding what we need to be assessing for vulnerabilities is the first step for ensuring long-term security and sustainability of open source software.”
Software Bill of Materials
It also comes as the US federal government looks to create a Software Bill of Materials that will require all industries to detail the composition of their software systems.
The census authors note: “There is far too little data on actual FOSS usage. Although public data on package downloads, code changes, and known security vulnerabilities abound, the view on where and how FOSS packages are being used remains opaque.
“Accurate project identification impacts not only academia, but the private sector as well. As cyberattacks and security breaches increase, all companies, not just Big Tech, will need to become more cognizant of which components comprise their websites and applications, as well as the origins of those components.”
Open Source Census: The Top 10 FOSS Components in Production Applications
Here are the Top 10 most-used FOSS packages*, listed in alphabetical order. (Titles are hyperlinked to repositories.) Because JavaScript-related packages dominate this list, the open source census also compiled a non-JavaScript list, shown further below.
1: async
A utility module which provides functions for working with asynchronous JavaScript.
2: inherits
A browser-friendly inheritance fully compatible with standard node.js inherits.
3: isarray
Array#isArray for older browsers and deprecated Node.js versions.
4: kind-of
Get the native JavaScript type of a value.
5: lodash
Another modern JavaScript utility library.
6: minimist
This module is the guts of optimist’s argument parser.
7: natives
Do stuff with Node.js’s native JavaScript modules.
8: qs
A querystring parsing and stringifying library with some added security.
9: readable-stream
Node.js core streams for userland.
10: string_decoder
Node-core string_decoder for userland.
How Were These Identified?
The research drew on public data sets and on private usage data from Software Composition Analysis (SCA) and application security companies, including Snyk and the Synopsys Cybersecurity Research Center (CyRC), working in partnership with the Linux Foundation’s CII to produce the list. The SCA partners provided data from automated scans of production systems within their customers’ environments.
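To make the idea of ranking packages from scan data concrete, here is a minimal sketch in Python. It is not the report's methodology, which is not detailed in this article; it simply assumes each scan yields a list of package names for one application and counts how many applications include each package. The function name `rank_packages` and the sample data are hypothetical.

```python
from collections import Counter


def rank_packages(scans, top_n=10):
    """Rank packages by the number of scanned applications that include them.

    `scans` is an iterable of iterables of package names, one per application.
    This is an illustrative assumption, not the Census II method.
    """
    counts = Counter()
    for packages in scans:
        # Use a set so a package listed twice in one scan counts only once.
        counts.update(set(packages))
    # Sort by descending count, then by name so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_n]


# Hypothetical scan results: one list of dependency names per application.
example_scans = [
    ["lodash", "async", "qs"],
    ["lodash", "minimist", "lodash"],
    ["async", "lodash"],
]

print(rank_packages(example_scans, top_n=2))
```

A real analysis would also need to normalise package names and versions across ecosystems, which is the "accurate project identification" problem the census authors describe.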
Below are the most-used non-JavaScript FOSS packages among those reported in the private usage data contributed by SCA partners.
The non-JavaScript FOSS packages Top 10
1: com.fasterxml.jackson.core:jackson-core
A core part of Jackson that defines Streaming API as well as basic shared abstractions.
2: com.fasterxml.jackson.core:jackson-databind
A general data-binding package for Jackson (2.x): works on streaming API (core) implementation(s).
3: com.google.guava:guava
Google core libraries for Java.
4: commons-codec
Apache Commons Codec (TM) software that provides implementations of common encoders and decoders such as Base64, Hex, Phonetic and URLs.
5: commons-io
Commons IO is a library of utilities to assist with developing IO functionality.
6: httpcomponents-client
The Apache HttpComponents project is responsible for creating and maintaining a toolset of low level Java components focused on HTTP and associated protocols.
8: logback-core
A generic logging framework for Java.
9: org.apache.commons:commons-lang3
A package of Java utility classes for the classes that are in java.lang’s hierarchy, or are considered to be so standard as to justify existence in java.lang.
10: slf4j:slf4j
A simple logging facade for Java.
“FOSS was long seen as the domain of hobbyists and tinkerers. However, it has now become an integral component of the modern economy and is a fundamental building block of everyday technologies like smart phones, cars, the Internet of Things, and numerous pieces of critical infrastructure,” said Frank Nagle, a professor at Harvard Business School and co-director of the Census II project. “Understanding which components are most widely used and most vulnerable will allow us to help ensure the continued health of the ecosystem and the digital economy.”
The full Linux Foundation report can be read here [pdf].
* A unit of software that can be installed and managed by a package manager — in turn, defined as “software that automates the process of installing/managing packages.”