10 million exposed credentials detected in open source repositories

Ten million credentials were exposed in source code in open source repositories. (Photo by Michael Vi/Shutterstock)

Ten million lines of code on GitHub exposed private credentials in 2022, posing a growing risk to software supply chain security. The market must be regulated to protect companies from targeted attacks using these credentials, researchers have told Tech Monitor, although doing so could be complicated and damaging to the source code repository ecosystem. The credentials are disturbingly easy for cybercriminals to find, with much of the information publicly available.

Cybersecurity start-up GitGuardian scanned 1.027 billion GitHub commits made in 2022 and found that one in every ten code authors had exposed “hard-coded secrets,” the industry term for credentials left in source code.

These figures have been growing over the last three years, a phenomenon dubbed “secrets sprawl” by GitGuardian’s ‘The State of Secrets Sprawl 2023’ report. Such an abundance of easily accessible credentials poses a real risk to the global software supply chain.

The presence of hard-coded secrets on GitHub increased by 67% in 2022 compared with the previous year. Not only does this make valuable credentials easy to access, but because source code is reused many times during software development, credentials can spread very quickly and must then be laboriously removed line by line.

“Hard-coded secrets have never been a more significant threat to the security of people, enterprises and even countries worldwide,” reads the report. “IT systems, open source and entire software supply chains are vulnerable to exploiting keys left by mistake in source code.”

The credentials are scattered throughout open source GitHub repositories because developers put them there to ease development and then forget them once the software has been completed. Sprawl occurs when other developers copy and paste this source code into their own projects, explains Edvinas Urbasius, a cybersecurity consultant at security company LCG K/S:

“[Secrets sprawl] will continue to grow as a problem since a lot of developers forget that IT security aspect. They just copy and paste stuff, then leave it in the code and forget about it. That is how attacks happen; somebody slipped, making a mistake or misconfiguration,” he said in the report.
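To make the pattern Urbasius describes concrete, the sketch below contrasts a credential committed alongside the code with one common alternative, reading it from the environment at runtime. It is an illustration only and is not taken from the report; the names `load_secret`, `MissingSecretError` and `DB_PASSWORD` are hypothetical.

```python
import os


class MissingSecretError(RuntimeError):
    """Raised when an expected secret is not present in the environment."""


# The anti-pattern: a credential committed together with the code.
# (Hypothetical placeholder value, shown commented out.)
# DB_PASSWORD = "example-password-do-not-use"


def load_secret(name, environ=None):
    """Read a secret from the environment instead of from source code.

    `environ` defaults to os.environ; passing a plain dict makes the
    function easy to exercise without touching the real environment.
    """
    env = os.environ if environ is None else environ
    value = env.get(name)
    if not value:
        # Fail loudly rather than falling back to a hard-coded default.
        raise MissingSecretError(f"environment variable {name} is not set")
    return value
```

Code written this way can be copied between projects without carrying the credential with it, although the secret still has to be stored and rotated somewhere else.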

How can hard-coded secrets be abused?

Such readily available credentials pose a high risk to any company exposing itself in this way, so much so that hackers are streamlining the process of finding them. “With the scanning tools that you have on the market now, you can automate that process,” explains Bharat Mistry, technical director UK and Ireland at security company Trend Micro.

“As long as you find a repository dump, you can then get a secondary tool to start scanning through, looking for certain pattern sizes and stuff. You can do a lot of it in one go. It’s no longer someone manually having to sift through code and work out what’s going on,” Mistry says.
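The kind of automated pattern matching Mistry describes can be sketched in a few lines. This is a minimal illustration, not the method of any particular scanning product: the rule names and the `scan_text` helper are hypothetical, and the two regular expressions are simplified examples of the much larger rule sets real tools use.

```python
import re

# Illustrative rules only; real scanners ship far larger, tuned rule sets.
RULES = {
    # AWS access key IDs are commonly matched as "AKIA" followed by
    # 16 uppercase letters or digits.
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    # Hypothetical generic rule: a quoted value of 8+ characters assigned
    # to a name ending in password, secret, api_key or token.
    "generic_assignment": re.compile(
        r"(?i)(?:password|passwd|secret|api_key|token)\s*[:=]\s*['\"][^'\"]{8,}['\"]"
    ),
}


def scan_text(text):
    """Return (line_number, rule_name) for every line that matches a rule."""
    findings = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for rule_name, pattern in RULES.items():
            if pattern.search(line):
                findings.append((line_number, rule_name))
    return findings
```

Run over every file in a repository dump, a loop like this flags candidate lines in one pass, which is why nobody needs to read the code by hand. The same approach works defensively, for example as a check that runs before code is committed.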

Once in the wrong hands, the credentials can be used to access any company with the faulty code in its software. “This is something we call lateral movement,” explains Michael White, technical director and principal architect at the Synopsys Software Integrity Group. “This means that once a code repo is compromised, perhaps by simply breaking into a single developer’s computer, it may yield access to hosting servers, databases, and other systems.”

Last year’s high-profile attack on global ride-share app Uber was carried out using hard-coded secrets. An attacker breached Uber and used hard-coded admin credentials to log into Thycotic, the firm’s privileged access management platform, states the report. The attacker then took over accounts on several internal tools and applications, gaining access to Uber’s AWS instance as well as the controls of its security platform HackerOne.

December’s cyberattack on DevOps company CircleCI saw an attacker leverage malware deployed to a CircleCI engineer’s laptop to steal a valid, two-factor authentication-backed single sign-on session. They could then exfiltrate customer data, including customer environment variables, tokens, and keys.

How can this be fixed?

The issue of hard-coded secrets is so widespread that cleansing open-source repositories of private credentials will be at best time-consuming, at worst impossible. The problem is that the market is not regulated, continues Mistry: “Where you’ve got the software supply chain, that regulation isn’t there, especially if it’s open source,” he says. Implementing such regulation may cause damage to the ecosystem, however. “I don’t know how you could enforce something like that or what you would lose if you tried. Would you compromise agility? Would you compromise innovation?” he asks.

There must be a new system set in place, however, as currently companies are too insecure, Mistry concludes. “It’s a really difficult one to call, but it does require some kind of vetting process. It’s like a wildfire. Once [the credentials] are out there, they’re really difficult to change or get back,” he says.
