In modern software development we rely on hundreds, sometimes thousands, of different building blocks. The glue that connects these building blocks is collectively known as secrets: typically API keys, credentials, security certificates and URIs. They are the modern-day master keys, providing access to cloud infrastructure, payment systems, internal messaging and user information, to name a few. Once an attacker has a secret, they can move laterally between systems to uncover additional information and secrets. Because they are authenticated, they look like valid users, which makes the intrusion extremely difficult to detect.
But even having established how sensitive these secrets are and why they should be tightly wrapped, this next statement may surprise you:
These secrets are sprawled all over the internet, sitting in code repositories in public view.
For the proprietor of the code, these secrets are difficult to identify, but malevolent actors out to find them have developed simple and effective tools to uncover secrets deeply buried and long forgotten in git history.
There are plenty of articles, whitepapers and blog posts on the importance of protecting secrets; HashiCorp and GitGuardian, for example, have great resources on this topic. Instead, I want to focus on the different tools available for detecting secrets, as well as their pros and cons. Of course, it is up to you, the reader, to decide which tools will best protect your secrets.
Three options for secrets detection
When it comes to secrets detection, you can choose between three different approaches:
- Building a custom solution in house
- Using open-source projects
- Using commercial products
Let’s run through a few examples.
Building in house detection
For some of us, secret sprawl is a perfect problem to unpack. I would be lying if I said I hadn't played around with building some fun regular expression (regex) scripts to detect sensitive strings inside code. But building a comprehensive, reliable secrets detection script is a huge task.
First, you need to decide how to detect secrets. There are two main options: using regex to detect credentials with a fixed format (like Stripe API keys, which begin with the same characters), or implementing high-entropy detection, which casts a wide net but brings back a huge volume of results.
When using regular expressions, you can detect only a very limited range of secrets, which leaves gaps in your coverage. Using the high-entropy method, you cast a wider net but also need to sort through more false positives. In an ideal world you would use both, but then you need to build post-validators that sift through the results and exclude likely false positives.
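To make the two options concrete, here is a minimal Python sketch of both. Treat it as an illustration rather than a production detector: the key pattern (a fixed `sk_live_` prefix plus at least 24 alphanumeric characters), the token pattern and the 3.5 bits-per-character threshold are all assumptions I picked for the example, and the function names are my own.

```python
import math
import re
from collections import Counter

# Assumed pattern for illustration: a fixed "sk_live_" prefix followed by
# at least 24 alphanumeric characters. Check the provider's documentation
# for the real key format before relying on it.
STRIPE_LIKE_KEY_RE = re.compile(r"\bsk_live_[0-9a-zA-Z]{24,}\b")

# Candidate tokens for the entropy check: long runs of key-like characters.
TOKEN_RE = re.compile(r"[A-Za-z0-9+/_\-]{20,}")


def shannon_entropy(text: str) -> float:
    """Return the Shannon entropy of text in bits per character."""
    if not text:
        return 0.0
    total = len(text)
    counts = Counter(text)
    return sum((n / total) * math.log2(total / n) for n in counts.values())


def find_by_regex(line: str) -> list[str]:
    """Narrow net: only strings matching the one known key format."""
    return STRIPE_LIKE_KEY_RE.findall(line)


def find_by_entropy(line: str, threshold: float = 3.5) -> list[str]:
    """Wide net: any long token whose characters look random enough.

    The threshold is an arbitrary starting point and needs tuning.
    """
    return [tok for tok in TOKEN_RE.findall(line)
            if shannon_entropy(tok) >= threshold]
```

The regex function only ever finds the one format it was written for, while the entropy function will flag any sufficiently random-looking token, whether or not it is actually a secret. That is the trade-off between the two approaches in miniature.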
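As a rough idea of what a post-validator could look like, the sketch below runs an entropy check and then discards candidates that match two hypothetical heuristics: obvious placeholder words, and strings shaped like SHA-1 or SHA-256 hex digests (such as git commit hashes). Both heuristics, the threshold and all names are assumptions for illustration. Real secrets can be hex strings too, so a rule like this trades missed secrets for fewer false positives.

```python
import math
import re
from collections import Counter

TOKEN_RE = re.compile(r"[A-Za-z0-9+/_\-]{20,}")

# Hypothetical heuristics; tune or replace them for your own codebase.
PLACEHOLDER_RE = re.compile(r"example|placeholder|changeme|dummy|x{6,}",
                            re.IGNORECASE)
HEX_DIGEST_RE = re.compile(r"(?:[0-9a-f]{40}|[0-9a-f]{64})")


def shannon_entropy(text: str) -> float:
    """Return the Shannon entropy of text in bits per character."""
    if not text:
        return 0.0
    total = len(text)
    counts = Counter(text)
    return sum((n / total) * math.log2(total / n) for n in counts.values())


def looks_like_placeholder(candidate: str) -> bool:
    """True for tokens containing an obvious dummy-value marker."""
    return PLACEHOLDER_RE.search(candidate) is not None


def looks_like_digest(candidate: str) -> bool:
    """True for tokens that are exactly 40 or 64 lowercase hex characters."""
    return HEX_DIGEST_RE.fullmatch(candidate) is not None


# Each post-validator returns True when a candidate should be discarded.
REJECTORS = [looks_like_placeholder, looks_like_digest]


def detect(line: str, threshold: float = 3.5) -> list[str]:
    """Entropy detection followed by post-validation of each candidate."""
    candidates = [tok for tok in TOKEN_RE.findall(line)
                  if shannon_entropy(tok) >= threshold]
    return [c for c in candidates
            if not any(reject(c) for reject in REJECTORS)]
```

Keeping the rejectors in a plain list makes it easy to add or remove heuristics as you learn which false positives show up most often in your own repositories.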
If you are building this as an experiment for your personal projects, this can be a fun and exciting challenge. But when you bring in the challenges of detection at scale, you have to consider resources, alerting and mitigation. The challenge can quickly spiral into a huge project.
It is always best to learn from a real-life example first. I would encourage anyone going down the path of building a secrets detection solution to first read about how SAP built its internal secrets detection solution.
If you are set on building your own solution, I would advocate beginning with one of the many open-source projects available and building upon it. I know this can be less exciting than a personal challenge, but once you begin to unpack the scope of the problem, it will save you a ton of work.
Using open-source tools
Open-source tools are not just a good starting point for building your own custom detection patterns; there are also great projects available that provide immediate value with minimal setup.
Popular open-source tools
There is a huge list of open source detection tools available on GitHub. Below are a few that are both popular and well-maintained.
Pros and cons of open-source tools
While the detection reliability and efficiency vary between solutions, the detection systems all lack enterprise features such as alerting, audit trails and in-depth investigation.
Open-source solutions, in my opinion, are best used for bug bounty and one-off pen testing exercises, where high volumes of positive results can be sorted through and evaluated. When these systems are put into regular production use, particularly within an organization, the results can be overwhelming and extremely restrictive to the workflow. That being said, they still have some clear advantages over commercial systems in some situations.
Using commercial tools
Following many high-profile cases of secrets being discovered inside git repositories, including Uber's, many vendors have come to the party with solutions to combat the problem.
In the many conversations around secrets detection, the biggest concern raised is vendor trust. You are essentially allowing a third party to find the most sensitive information that you or your organization own.
Many vendors, including big players like GitGuardian, do offer an on-premises version of their products. But this usually comes with an enterprise license, which is costly for individual developers and smaller companies.
The idea of allowing a third party to scan for secrets inside source code can be concerning, and there are definitely some considerations to take into account. The first is that secrets inside git repositories, private and public, should already be considered compromised. Git provides the perfect platform for secret sprawl, because code is a leaky asset and git provides no audit log of who has access to it or where it has been cloned. So if secrets exist in code repositories, using a third-party application to scan for them does not really increase the risk.
Commercial vendors also have larger teams and more time dedicated to detecting secrets, which makes their tools more reliable at large scale. They also offer additional enterprise features, such as alerting and dashboards for investigation and remediation, as well as much easier setup. All this means that the tool will fit into your workflow much better.
The best example of a comparison between the two most prominent open-source and commercial tools can be found here.
Commercial secrets detection solutions
While there are additional vendors in the market, below are the four core competitors in the space that are the current market leaders.
Pros and cons of commercial secrets detection
Wrap up
Implementing secrets detection should always be part of the threat mitigation strategy of all developers and organisations. There are many solutions available, both open-source and commercial, each with its own considerations. Commercial vendors offer more sophisticated detection, but unless you pay for a costly on-premises license, they require you to give a third party access to your source code. Open-source solutions are cost-effective, but they can produce so many false positives that they become prohibitive to your workflow. Or you can build your own, but beware of the big task ahead of you. In the end, it comes down to what works best for you and your organisation.
Additional Resources
To continue on this topic, check out the links below.
Secrets Detection Learning Center (GitGuardian)
How to securely use secrets (HashiCorp)
Academic research paper on secrets inside git repositories
Understanding Secret Sprawl (GitGuardian)