New research: Graphing misconfigurations and vulnerabilities to visualize blast radius

We recently collaborated with our friends at Unit 42 to dive deeper into the world of software supply chain attacks and the role misconfigurations and vulnerabilities play in them. The research, collated in the Unit 42 Cloud Threat Report, 2H 2021, combines findings from a red team exercise and data gleaned from Bridgecrew’s open-source research tools.

Watch the session on the findings below or keep reading for some highlights from that data.

Terraform security highlights

Using Checkov, we analyzed over 4k Terraform templates and 38k Terraform files in popular open-source Terraform repositories for misconfigurations. We found:

  • 63% of the scanned templates contained one or more insecure configurations.
  • 49% of the templates contained at least one critical or highly insecure configuration.
  • 64% of the downloaded templates by volume contained at least one high or critical insecure configuration.

Read the full report to see the most commonly found misconfigurations across each cloud provider and more.

Kubernetes Helm

Using Helm Scanner (and in the same vein as our previously published Helm security research series), we analyzed over 3k Helm charts and 9k YAML files on Artifact Hub for misconfigurations. We found:

  • 99.9% of all Helm charts contained misconfigurations.
  • For charts with dependencies, 62% of the misconfigurations come from the dependent charts.
  • 92% of the misconfigurations were in the dependent charts.

Read the full report to see the most common and concerning misconfigurations.

Container images

In addition to leveraging our homegrown tools, we extended our Helm Scanner to also scan Helm charts’ containers for CVEs using twistcli’s container scanning capabilities via Checkov. This allowed us to analyze data for CVEs in addition to misconfigurations within a given Kubernetes Helm chart. We found:

  • 96% of the images contained known vulnerabilities.
  • 91% of the images contained at least one critical or high vulnerability.

All of these datasets were used in the report to provide insights into the state of infrastructure and container security, which you can find in the Unit 42 Cloud Threat Report, 2H 2021: Secure the Software Supply Chain to Secure the Cloud.

But our research didn’t end there.

While analyzing the data, we realized that we could take these individual datasets even further.

CVE, meet misconfig

While infrastructure misconfigurations can be extremely dangerous on their own (think exposed S3 bucket or IAM policy allowing account access), most successful attacks build on multiple weaknesses. Whether an attack starts with a misconfiguration or a vulnerability, each bug, CVE, and misconfig allows another move or pivot toward the attacker’s target.

For example, consider the following misconfiguration-first attack:

An infrastructure misconfiguration gives the attacker the first puzzle piece: an open security group on a load balancer that accidentally exposes an internal monitoring endpoint to the world. The endpoint, which was never supposed to be external, hasn’t been given as much attention as public-facing application components, so a CVE has gone unpatched, allowing the attacker container/shell access.

At this point, another misconfiguration (not stripping default Kubernetes service accounts from a deployment) provides the attacker a token to speak with the Kubernetes API powering the infrastructure. This, in turn, allows exploration of, and lateral movement into, other resources, or attacks on the Kubernetes APIs themselves.
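To make the service account step concrete, here is a minimal Python sketch of the kind of check involved. It is not how Checkov implements its policy. It assumes the manifest has already been parsed into a dict, and it only looks at the real Kubernetes pod spec field automountServiceAccountToken. The function name and the sample manifests are hypothetical.

```python
from typing import Any, Dict


def automounts_service_account_token(manifest: Dict[str, Any]) -> bool:
    """Return True if pods from this manifest would get a token mounted.

    Simplification: a ServiceAccount object can also disable automounting;
    this sketch only inspects the pod spec.
    """
    spec = manifest.get("spec", {})
    # Deployment-style objects nest the pod spec under spec.template.spec;
    # a bare Pod keeps it directly under spec.
    pod_spec = spec.get("template", {}).get("spec", spec)
    # Kubernetes mounts the token unless the field is explicitly false,
    # so a missing field counts as "mounted".
    return pod_spec.get("automountServiceAccountToken", True) is not False


# Hypothetical manifests, already parsed from YAML into dicts.
deployment = {
    "kind": "Deployment",
    "spec": {"template": {"spec": {"containers": [{"name": "exporter"}]}}},
}
hardened = {
    "kind": "Deployment",
    "spec": {
        "template": {
            "spec": {
                "automountServiceAccountToken": False,
                "containers": [{"name": "exporter"}],
            }
        }
    },
}

print(automounts_service_account_token(deployment))  # True
print(automounts_service_account_token(hardened))    # False
```

A workload that never talks to the Kubernetes API has no use for the token, so opting out removes one link from the chain described above.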

Now consider a CVE-first attack:

You’ll see a similar sequence of events, but the initial step happens to be a zero-day exploit of the application code. The exploit gets the attacker a shell within the container running the vulnerable code, which could be the end of the attacker’s access. In this case, however, the infrastructure misconfigurations allow the attacker to make a lateral move.

First, just as in our previous example, default service account tokens allow the attacker to enumerate the Kubernetes APIs. This reveals what level of access that default service account has and potentially lets the attacker reach other pods, deployments, or secrets in the cluster.

At the same time, a misconfigured security policy (or no micro-segmentation security policies at all) allows good old-fashioned network enumeration of other—not necessarily publicly exposed—services with weaker security configurations such as our monitoring endpoints.
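As a rough illustration of the "no micro-segmentation policies at all" case, the sketch below lists namespaces that contain workloads but no NetworkPolicy object. The function name and the sample manifests are hypothetical, and the check is deliberately shallow: the presence of a policy says nothing about whether it is restrictive enough.

```python
from typing import Any, Dict, Iterable, Set

# Kinds treated as workloads for this illustration (not an exhaustive list).
WORKLOAD_KINDS = {"Pod", "Deployment", "DaemonSet", "StatefulSet"}


def namespaces_without_network_policy(
    manifests: Iterable[Dict[str, Any]],
) -> Set[str]:
    """Return namespaces that run workloads but define no NetworkPolicy."""
    workload_namespaces: Set[str] = set()
    policy_namespaces: Set[str] = set()
    for manifest in manifests:
        # Namespaced objects without metadata.namespace land in "default".
        namespace = manifest.get("metadata", {}).get("namespace", "default")
        kind = manifest.get("kind")
        if kind == "NetworkPolicy":
            policy_namespaces.add(namespace)
        elif kind in WORKLOAD_KINDS:
            workload_namespaces.add(namespace)
    return workload_namespaces - policy_namespaces


# Hypothetical rendered chart output.
manifests = [
    {"kind": "Deployment", "metadata": {"name": "web", "namespace": "web"}},
    {"kind": "NetworkPolicy", "metadata": {"name": "web-only", "namespace": "web"}},
    {"kind": "Deployment", "metadata": {"name": "metrics", "namespace": "monitoring"}},
]

print(namespaces_without_network_policy(manifests))  # {'monitoring'}
```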

Both scenarios show why misconfigurations and CVEs need to be considered together: attackers can often combine them, regardless of the initial path of compromise.

The complexities of understanding signal vs. noise in security datasets

Analyzing the same Helm chart dataset used in the Cloud Threat Report, we found that chained misconfigurations and vulnerabilities were pervasive. Here are some highlights from our findings:

  • Every single Helm chart that contained a CVE within our dataset also contained an infrastructure as code misconfiguration.
  • Out of the charts containing misconfigurations and CVEs, the vast majority were Low and Medium severity, but still, 13% were High or Critical.
  • 52% of CVEs were reported during the current or previous year. That said, some very common CVEs (included in multiple charts via image re-use) date back as far as 2016.
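A statistic like the CVE age figure above can be approximated from the CVE identifiers alone. The sketch below is one hedged way to do it, not the method used in the research: it assumes the year embedded in the ID is a good enough proxy for when the CVE was reported, which is not always true because IDs can be reserved before disclosure. The function name is hypothetical, and the sample IDs other than CVE-2021-33574 are placeholders for illustration, not findings from the dataset.

```python
import re
from typing import Iterable

# CVE IDs look like CVE-<year>-<sequence>, with a sequence of 4+ digits.
CVE_ID = re.compile(r"^CVE-(\d{4})-\d{4,}$")


def share_recent(cve_ids: Iterable[str], current_year: int) -> float:
    """Fraction of well-formed CVE IDs from the current or previous year."""
    years = []
    for cve_id in cve_ids:
        match = CVE_ID.match(cve_id)
        if match:  # silently skip anything that is not a CVE ID
            years.append(int(match.group(1)))
    if not years:
        return 0.0
    recent = sum(1 for year in years if year >= current_year - 1)
    return recent / len(years)


# Placeholder IDs for illustration only.
sample = ["CVE-2021-33574", "CVE-2020-0001", "CVE-2018-0003", "CVE-2016-0002"]
print(share_recent(sample, current_year=2021))  # 0.5
```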

While this data highlights just how important it is to get visibility into both misconfigurations and vulnerabilities, the challenge is knowing how to answer “What actually matters?”

On paper, they all matter. You should have no CVEs or misconfigurations in anything—ever. But in reality, the answer is “it depends.”

Take this Load Balancer as an example:

Is it misconfigured? Yes!

Is it posing a risk to any of your infrastructure? No, not really.

Because there are no backend services, applications, or systems connected to the Load Balancer, the lax encryption protocol is not actually exposing anything to the world. So apart from the unnecessary cost of an extra Load Balancer running in your cloud environment, the impact or “blast radius” of this misconfiguration is minimal.

Compare this to our example misconfiguration-based exploit path from earlier, however, which starts with a similar security group issue. If the Load Balancer were to expose an internal monitoring endpoint containing an unpatched CVE allowing shell access, you could say the blast radius is extremely large.
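One simple way to put a number on that difference is to treat blast radius as reachability: starting from the misconfigured resource, count everything an attacker could reach by following one weakness to the next. The sketch below uses a breadth-first search over a plain adjacency dict. The function name, the node names, and the edges are all hypothetical and loosely modeled on the attack path described earlier. This is our illustration of the idea, not a description of how any tool scores risk.

```python
from collections import deque
from typing import Dict, List, Set


def blast_radius(graph: Dict[str, List[str]], start: str) -> Set[str]:
    """Return every node reachable from start, excluding start itself."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    seen.discard(start)
    return seen


# A misconfigured Load Balancer with nothing behind it.
idle = {"load-balancer": []}

# The same starting point, now fronting an unpatched monitoring endpoint.
exposed = {
    "load-balancer": ["monitoring-endpoint"],
    "monitoring-endpoint": ["container-shell"],
    "container-shell": ["service-account-token"],
    "service-account-token": ["kubernetes-api"],
    "kubernetes-api": ["secrets", "other-pods"],
}

print(len(blast_radius(idle, "load-balancer")))     # 0
print(len(blast_radius(exposed, "load-balancer")))  # 6
```

The same finding on the Load Balancer yields a radius of zero in one environment and six in the other, which is the "it depends" from above expressed as a graph query.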

In our pursuit of minimizing noise and highlighting real-world risk, we wanted to come up with a way to visualize an attack’s blast radius.

Visualizing Blast Radius with Helm Scanner 2.0

When we consider the interactions between discovered CVEs and relevant misconfigurations, we can cut through the noise of tens or hundreds of reported issues and find the ones that could be used to string together a successful compromise. But in order to do that, we have to look at the composition of your whole deployment—from infrastructure through to code vulnerabilities.

Enter Helm Scanner 2.0.

To start to build up a prioritized picture, we needed the data in a better format than CSVs. We needed to see connections between resources and how they were linked. We needed a graph!

Taking inspiration from the graph-based policies in Checkov 2.0, we rewrote Helm Scanner to build a graph of each Helm chart instead of outputting the data to a CSV for further analysis. This allowed us to build queries and better visualize the interactions between misconfigurations and CVEs.
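To give a feel for what "a graph of each Helm chart" can look like, here is a minimal sketch. It is not the Helm Scanner implementation. The ChartGraph class, the build_chart_graph function, and the shape of the input records are assumptions made for illustration. The resource, check, image, and CVE names are taken from the example discussed later in this post.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class ChartGraph:
    nodes: Dict[str, str] = field(default_factory=dict)  # node id -> kind
    edges: Set[Tuple[str, str]] = field(default_factory=set)

    def add_edge(self, src: str, src_kind: str, dst: str, dst_kind: str) -> None:
        self.nodes[src] = src_kind
        self.nodes[dst] = dst_kind
        self.edges.add((src, dst))

    def neighbours(self, node: str, kind: str) -> List[str]:
        """Sorted direct neighbours of node that have the given kind."""
        return sorted(
            dst for src, dst in self.edges
            if src == node and self.nodes[dst] == kind
        )


def build_chart_graph(resources: List[dict]) -> ChartGraph:
    """Link each resource to its failed checks and images, and images to CVEs."""
    graph = ChartGraph()
    for resource in resources:
        name = resource["name"]
        graph.nodes[name] = "resource"  # keep resources with no findings too
        for check_id in resource.get("failed_checks", []):
            graph.add_edge(name, "resource", check_id, "check")
        for image in resource.get("images", []):
            graph.add_edge(name, "resource", image["name"], "image")
            for cve_id in image.get("cves", []):
                graph.add_edge(image["name"], "image", cve_id, "cve")
    return graph


# Hypothetical scan record; the field names are assumptions.
scan = [{
    "name": "Deployment.default",
    "failed_checks": ["CKV_K8S_40", "CKV_K8S_31"],
    "images": [{
        "name": "a10networks/acos-prometheus-exporter",
        "cves": ["CVE-2021-33574"],
    }],
}]

graph = build_chart_graph(scan)
print(graph.neighbours("Deployment.default", "check"))
print(graph.neighbours("a10networks/acos-prometheus-exporter", "cve"))
```

Unlike rows in a CSV, the edges keep the relationship between a resource, its failed checks, its images, and their CVEs, so a single query can walk from one to the other.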

Now we can, at a glance, visualize potential attack paths through CVEs to misconfigurations or vice versa, enabling us to prioritize security fixes that will provide the biggest reduction in blast radius!

Let’s walk through an example

In the graph below, we can see with a quick glance that the Helm chart in question has a potentially large blast radius:

In contrast, a smaller, less cluttered graph represents a more secure Helm chart.

As we zoom in, we can see a number of Kubernetes misconfigurations highlighted by the nodes labeled “CKV_K8S_”:

As you can see, the chart has failed a number of Checkov Kubernetes policies, for example CKV_K8S_40 (“Use high UIDs for users within the container to prevent host overlap”) and CKV_K8S_31 (“Ensure that the seccomp profile is set to docker/default or runtime/default”).

We can also see they are all originating from a single deployment Kubernetes object within the chart called Deployment.default. Misconfigured pod, service, or DaemonSet objects would also show up here.

We then see, linked to the deployment object, any container images used within the deployment (in this case, a10networks/acos-prometheus-exporter) and a node for each CVE discovered within that image when scanned:

Having this visual representation between resources, Checkov policies, images, and CVEs allows us to quickly assess potential attack chains and focus our efforts on the remediation of high severity issues.

For example, CVE-2021-33574 is a 9.8 critical CVE within the version of glibc found in the container. Coupled with CKV_K8S_40, which suggests the container may be running with UIDs overlapping with the underlying host, it gives us a much better idea of the risk associated with the whole deployment than considering CVEs or misconfigurations in isolation.
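The pairing described above can be sketched as a small query: for a given resource, return every combination of a high-scoring CVE and a failed check that could make that CVE worse. The function name, the record layout, the 9.0 cutoff (the CVSS v3 "critical" threshold), and the choice of which checks count as amplifying are all assumptions for illustration. "CVE-PLACEHOLDER-LOW" is a made-up identifier standing in for a lower-severity finding.

```python
from typing import Dict, FrozenSet, List, Tuple


def risky_pairs(
    resource: Dict,
    min_cvss: float = 9.0,
    amplifying_checks: FrozenSet[str] = frozenset({"CKV_K8S_40"}),
) -> List[Tuple[str, str]]:
    """Pair each CVE at or above min_cvss with each amplifying failed check."""
    pairs = []
    for image in resource.get("images", []):
        for cve_id, score in image.get("cves", {}).items():
            if score < min_cvss:
                continue  # below the severity cutoff
            for check_id in resource.get("failed_checks", []):
                if check_id in amplifying_checks:
                    pairs.append((cve_id, check_id))
    return sorted(pairs)


# Hypothetical record for the deployment discussed above.
deployment = {
    "name": "Deployment.default",
    "failed_checks": ["CKV_K8S_31", "CKV_K8S_40"],
    "images": [{
        "name": "a10networks/acos-prometheus-exporter",
        "cves": {"CVE-2021-33574": 9.8, "CVE-PLACEHOLDER-LOW": 3.1},
    }],
}

print(risky_pairs(deployment))  # [('CVE-2021-33574', 'CKV_K8S_40')]
```

Out of four findings on this resource, the query surfaces the one combination worth fixing first, which is the kind of noise reduction the graph is meant to provide.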

Open Research

In continuing the research done with Unit 42, we have created graph visualizations for each scanned Helm chart and have published them in this public repository. You can apply this research to understand your own organization’s risk by cross-referencing your Helm dependencies with those scanned in our dataset.

If there’s anything we’ve learned since we published our first research on the State of Open Source Terraform Security, it’s that this space is continuously evolving. That’s why we will continue running the scanner against new Artifact Hub Helm charts and making them available publicly for analysis. Be sure to bookmark the page and join us on Slack with recommendations on how we can improve this data.

To learn more about how this research impacts real-world applications and how to prevent that risk, check out these resources: