February 19, 2019, updated 15 Jul 2022 5:37am

Software Code’s “Wayback Machine” Gets a Boost

"The lack of Software Intelligence around open source versioning and licensing puts many companies in danger of losing valuable IP"

By CBR Staff Writer

Call it the Wayback Machine of code: a searchable open archive of software source code across iterations, from buggy beta versions to sophisticated contemporary releases.

Software Heritage is a non-profit initiative developed and hosted by Inria, the French Institute for Research in Computer Science and Automation.

Officially created in 2015, the project has been growing over the years. It now spans 5.6 billion source files from more than 88 million projects.

Software Heritage is itself built on open-source code. It gathers source files by trawling through the repositories that developers use to create and share code, such as GitHub, GitLab, Google Code, Debian, GNU and the Python Package Index. Users can trace the detailed revision history of every codebase version that it stores.

Software Heritage Adds “Provenance Index”

Now the non-profit – which is backed by industry leaders including Intel, Google and Microsoft – has announced a new partnership with Paris-based software company CAST that will see the two create a “provenance index” of code.

The provenance index enables users on the Software Heritage platform to search for the original occurrences of any given source file.
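To make the idea concrete, here is a minimal sketch of what such a lookup might involve. It is an illustration only, not how Software Heritage or CAST actually build their index: the `Occurrence` record, the `build_provenance_index` function and the sample data are all hypothetical, and the sketch assumes each file is identified by a hash of its contents.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Occurrence:
    """One sighting of a file's content in a project (hypothetical record)."""
    content_sha1: str   # hash identifying the file's contents
    project: str        # where the file was seen
    first_seen: date    # earliest date it was observed in that project


def build_provenance_index(occurrences):
    """Map each content hash to its earliest known occurrence."""
    index = {}
    for occ in occurrences:
        current = index.get(occ.content_sha1)
        # Keep only the oldest sighting of each distinct file content.
        if current is None or occ.first_seen < current.first_seen:
            index[occ.content_sha1] = occ
    return index


# Made-up sample data: the same file content appears in two projects.
sightings = [
    Occurrence("aaa111", "project-fork", date(2017, 3, 1)),
    Occurrence("aaa111", "project-original", date(2012, 6, 15)),
    Occurrence("bbb222", "project-fork", date(2018, 1, 9)),
]
index = build_provenance_index(sightings)
original = index["aaa111"]
```

In this toy data, looking up `aaa111` points back to `project-original`, the older of the two sightings.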

For the curious, that will allow “unprecedented insight” into software evolution, the two believe. CAST, however, is also keen to emphasise the opportunity afforded by plugging Software Heritage into CAST Highlight, the company’s paid-for tool for application portfolio analysis and software composition analysis.


Software Composition Analysis is Getting Important

CAST added software composition analysis to this tool in 2018, when it bought Antelink, a “knowledge base” of open source components founded by Inria, the public science and technology institution dedicated to computer science learning.

Plugging the two tools together will provide rapid identification of third-party source code across more than five billion known source code files, enabling better detection of external code, license risks and vulnerabilities, CAST said.

With Software Heritage containing information about known application security vulnerabilities in addition to copyrights for all known software in use, CAST describes its searchability as crucial where a Bill of Materials is required.

Such occasions might include an enterprise outsourcing software development, buying software assets, or going through a merger or acquisition.

(A software bill of materials is a list of the components in a piece of software; vendors often create products that bring together open source and commercial software components.)

“The lack of Software Intelligence around open source versioning and licensing puts many companies in danger of losing valuable IP, as most executives are unaware of their risk exposure,” said Vincent Delaroche, Founder and CEO at CAST.

“Business leaders should be aware when open source and other external components in code expose their organization to non-compliance, legal action and possible loss of proprietary IP,” he added.

Software Heritage’s project developers, meanwhile, have given themselves just one clearly defined mission: “We are committed to collect, index, preserve and make easily accessible the source code of the software that lies at the heart of our culture,” they state on their website. Those curious can drag and drop source code files (.c, .java, .py, …) or enter a file’s SHA1 hash into the search bar on the Software Heritage website.
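For readers who would rather compute a file's SHA1 themselves before searching, the following is a minimal sketch using Python's standard `hashlib` module. The helper names `sha1_of_bytes` and `sha1_of_file` are made up for illustration, and the sketch assumes the search bar expects the plain SHA1 of the file's raw contents.

```python
import hashlib


def sha1_of_bytes(data: bytes) -> str:
    """Return the hexadecimal SHA1 digest of an in-memory byte string."""
    return hashlib.sha1(data).hexdigest()


def sha1_of_file(path: str, chunk_size: int = 65536) -> str:
    """Return the hexadecimal SHA1 digest of a file, read in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large files need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Calling `sha1_of_file("example.c")` on a local file (the file name here is a placeholder) returns a 40-character hexadecimal string that can be pasted into a search field.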
