Securing cloud infrastructure in run-time vs. build-time

Securing cloud infrastructure

In today’s cloud-native world, where infrastructure as code adoption is soaring and building cloud environments at scale requires reproducibility and resilience, the ability to change and grow infrastructure quickly is prioritized from the start. It’s intriguing, then, that for so many of us “cloud security” means only addressing misconfigurations and compliance violations after they’ve occurred in run-time.

Identifying infrastructure issues without focusing on the process and code in build-time is at complete odds with how we design and build modern cloud infrastructure. If we build immutable infrastructure, we need to start thinking about how to secure immutable infrastructure, and run-time security in isolation isn’t enough. On the flip side, addressing cloud security risks in build-time alone lacks the full context of production infrastructure, leaving gaps in your environment.

In this post, we’ll focus on security issue detection via scanning in both build-time and run-time, outlining their values and pitfalls to illustrate the importance of leveraging them both.

Run-time cloud security posture management

As cloud environments grow more complex, cloud providers supply rich metadata and telemetry about the management of cloud resources. Building a sustainable cloud security program requires the consistent and extensible collection and analysis of that data.

Community-led projects such as Prowler for AWS and Forseti for Google Cloud have emerged to help serve that purpose. Both projects pioneered the use of exposed provider APIs to gather configuration data and inspect it for misconfigurations after deployment.

Most cloud providers also now include this type of functionality in their control plane management services. Using native tools like AWS Config, Azure Policy, and Google Cloud Asset Inventory, it is easier than ever to gain that basic visibility for your cloud.
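As a rough illustration of what this kind of post-deployment scanning does, the sketch below evaluates configuration records that have already been collected against two simple rules. The record shape (`id`, `type`, `config`), the rule identifiers, and the attribute names are all hypothetical; this is not the data model of Prowler, Forseti, or any provider API.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Rule:
    rule_id: str                          # hypothetical identifier
    resource_type: str                    # resource type the rule applies to
    description: str
    is_compliant: Callable[[dict], bool]  # receives the resource's config


# Hypothetical rules; attribute names are assumptions for illustration.
RULES = [
    Rule("EX_BUCKET_PRIVATE", "storage_bucket",
         "Bucket must not be publicly readable",
         lambda cfg: not cfg.get("public_read", False)),
    Rule("EX_DB_ENCRYPTED", "db_instance",
         "Database storage must be encrypted",
         lambda cfg: cfg.get("storage_encrypted", False)),
]


def scan(resources: list[dict]) -> list[dict]:
    """Return one finding per (resource, rule) pair that fails."""
    findings = []
    for res in resources:
        for rule in RULES:
            if rule.resource_type != res["type"]:
                continue
            if not rule.is_compliant(res["config"]):
                findings.append({
                    "resource_id": res["id"],
                    "rule_id": rule.rule_id,
                    "description": rule.description,
                })
    return findings


# Stand-in for configuration data gathered from a provider API.
LIVE_RESOURCES = [
    {"id": "bucket-logs", "type": "storage_bucket", "config": {"public_read": True}},
    {"id": "db-orders", "type": "db_instance", "config": {"storage_encrypted": True}},
]

if __name__ == "__main__":
    for finding in scan(LIVE_RESOURCES):
        print(finding)
```

In a real scanner, the collection step and the rule catalog are the hard parts. The evaluation loop itself stays about this simple.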

Run-time cloud security is certainly a best practice, but it comes with its own set of benefits and caveats:

✅  Change tracking

Scanning in run-time follows the actual state of configurations. When configuration is managed through multiple methods, run-time scanning remains the primary viable technique for identifying and evaluating configuration changes over time.

✅  Compliance-friendly

Most regulated industries now require continuous change-control auditing and tracing. To satisfy those requirements, most scanners map their findings to standard industry benchmarks. Once controls are mapped into benchmarks and sections, you can use the scan reports as baseline evidence to satisfy most industry-specific requirements and audits.

✅  Near real-time results

Depending on the scan frequency, run-time scanning can quickly identify and classify ongoing issues. Connecting scanners to ticketing or monitoring tools can help ensure speedier response and mitigation.

❌  Low signal-to-noise ratio

Most scanners still rely heavily on deterministic detection logic that lacks context, resulting in a tide of irrelevant findings, especially in dynamic environments with short-lived resources. For example, in environments that use auto-scaling groups, run-time scanning can return inconsistent results between scans and produce output that does not represent the latest resource states. Additionally, scanning multi-faceted IAM permissions or a full networking topology can raise false alarms about a configuration change.
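One possible way to dampen this noise, sketched below, is to report a finding only if it also appeared in the previous scan and its resource still exists. The finding shape (`resource_id`, `rule_id`) is a hypothetical one chosen for illustration, and the approach is an assumption rather than how any particular scanner works.

```python
from __future__ import annotations


def stable_findings(previous: list[dict],
                    current: list[dict],
                    live_resource_ids: set[str]) -> list[dict]:
    """Keep findings seen in two consecutive scans on resources that still exist."""
    seen_before = {(f["resource_id"], f["rule_id"]) for f in previous}
    return [
        f for f in current
        if (f["resource_id"], f["rule_id"]) in seen_before
        and f["resource_id"] in live_resource_ids
    ]
```

The trade-off is latency: a genuine issue is reported one scan interval later, so this kind of filter fits lower-severity rules better than critical ones.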

❌  Impracticable findings

After flagging a misconfiguration, the immediate question is usually “what can we do to fix it?” If fixing a single cloud misconfiguration requires ten manual steps, or a configuration cannot be reverted, then escalating it ends up wasting valuable developer time.

❌  Recurring misconfigurations

For teams using infrastructure as code frameworks to orchestrate cloud resources, fixing a misconfiguration solely in run-time leaves the risk of it recurring the next time the code is applied. To ensure that a cloud misconfiguration won’t recur, remediations have to happen at the source.
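Remediating at the source presumes a run-time finding can be traced back to the code that created the resource. The sketch below assumes each resource carries a tag recording its defining file; the tag key `iac_source` and the finding shape are hypothetical, not a convention of any specific tool.

```python
from __future__ import annotations


def route_finding(finding: dict, resource_tags: dict[str, dict]) -> dict:
    """Decide where a run-time finding should be fixed.

    resource_tags maps a resource id to its tags. The "iac_source" tag
    key is a hypothetical convention used only for this example.
    """
    tags = resource_tags.get(finding["resource_id"], {})
    source = tags.get("iac_source")
    if source:
        # Resource is managed as code: fix the definition so the
        # misconfiguration does not come back on the next apply.
        return {"action": "fix_in_code", "target": source}
    # No known source: the resource was probably created by hand.
    return {"action": "fix_in_runtime", "target": finding["resource_id"]}
```

For example, a finding on a resource tagged with `iac_source` set to `modules/storage/main.tf` would be routed to that file, while an untagged resource would be routed to a run-time fix.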

Build-time infrastructure scanning

Scanning configuration in build-time is not new. Identifying coding errors has been around for a while, especially in AppSec. Over the past couple of years, however, the application of this approach has expanded drastically with the rise of infrastructure as code to provision cloud resources at scale.

Scanning configurations managed as code uses the same high-level policies as run-time scanners and searches for the same resources and their configuration states. IaC scanners such as our open-source tool, Checkov, treat configuration files as standalone manifests that describe how resources will be provisioned and which attributes will be set.
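To make the idea concrete, here is a minimal sketch of a build-time check over a configuration manifest. The JSON layout, resource types, and attribute names are hypothetical simplifications; this is not Checkov's implementation or the file format of any real IaC framework.

```python
from __future__ import annotations

import json

# Hypothetical manifest describing resources before they are provisioned.
MANIFEST = """
{
  "resources": [
    {"address": "db_instance.orders", "type": "db_instance",
     "attributes": {"storage_encrypted": false}},
    {"address": "storage_bucket.logs", "type": "storage_bucket",
     "attributes": {"public_read": false}}
  ]
}
"""

# Hypothetical policy: required attribute values per resource type.
REQUIRED = {
    "db_instance": {"storage_encrypted": True},
    "storage_bucket": {"public_read": False},
}


def scan_manifest(text: str) -> list[dict]:
    """Return the exact resource and attribute behind each violation."""
    findings = []
    for res in json.loads(text)["resources"]:
        for attr, expected in REQUIRED.get(res["type"], {}).items():
            # A missing attribute is flagged too, because this sketch
            # does not model provider defaults.
            actual = res["attributes"].get(attr)
            if actual != expected:
                findings.append({
                    "address": res["address"],
                    "attribute": attr,
                    "expected": expected,
                    "actual": actual,
                })
    return findings


if __name__ == "__main__":
    for finding in scan_manifest(MANIFEST):
        print(finding)
```

Because each finding names a resource address and an attribute, the fix is a one-line change in the file that declares it.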

By applying many of the lessons learned with addressing cloud security in run-time, we can identify additional areas of value and drawbacks with build-time scanning:

✅  Actionable findings

With configurations listed and managed in code, it becomes much easier to pinpoint the exact attributes and arguments causing a misconfiguration.

✅  Collaborative resolution

With detection and response all in code, every developer can help resolve ongoing issues. By unifying detection and remediation in the same tools, it’s easier to build cloud security from the start and into day-to-day workflows.

✅  Automated response

By identifying and fixing issues in machine-readable languages, it is easier to develop automations that find and fix configurations with little to no human intervention. Automation is key to building and maintaining secure cloud infrastructure at scale.

❌  Disassociated findings

Configuration issues detected only in build-time could represent only part of a more complete configuration posture. For example, imagine an organization that manages networking components in run-time and compute resources in build-time. A finding about an exposed internet-facing EC2 instance could easily be suppressed, knowing that a hardened VPC or security group will ensure it is not accessible to the world.

❌  Missing context

Relying solely on build-time findings without attributing them to actual configuration states in run-time could result in configuration clashes. For example, an attempt to encrypt a previously unencrypted DB instance could fail to apply, as most managed DB services do not permit enabling encryption after the fact.
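One way to surface such clashes early is to compare a planned change against the live state before applying it. The sketch below assumes a hypothetical table of attributes that cannot be changed after creation, and hypothetical shapes for the planned and live data; real services and frameworks each define their own rules.

```python
from __future__ import annotations

# Hypothetical: attributes that cannot be changed on an existing resource.
IMMUTABLE_AFTER_CREATE = {
    "db_instance": {"storage_encrypted"},
}


def find_clashes(planned: dict[str, dict],
                 live: dict[str, dict]) -> list[tuple[str, str]]:
    """Return (address, attribute) pairs where a planned change would
    modify an attribute that is immutable on an existing resource.

    planned: address -> {"type": ..., "attributes": {...}}
    live:    address -> attributes currently deployed
    """
    clashes = []
    for address, plan in planned.items():
        current = live.get(address)
        if current is None:
            continue  # new resource, nothing to clash with
        for attr in sorted(IMMUTABLE_AFTER_CREATE.get(plan["type"], set())):
            wanted = plan["attributes"]
            if attr in wanted and wanted[attr] != current.get(attr):
                clashes.append((address, attr))
    return clashes
```

A clash flagged this way points to a migration task, such as creating an encrypted replacement and moving the data, rather than a simple attribute fix.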

❌  Partial coverage

Although their coverage is growing, infrastructure as code frameworks still lag in support for all publicly available cloud services. Limited support for building a service as code also limits how far a misconfiguration detection strategy can extend around it.

Best of both worlds

With more cloud services and configuration frameworks than ever, the challenge to secure them demands a unified approach to managing cloud security throughout the operational and the development lifecycles.

That’s why we at Bridgecrew believe scanning in build-time and run-time are not competing strategies, but rather complementary ones.

Run-time scanning provides an accurate and near-real-time depiction of the current configuration state, but it’s only with the addition of build-time scanning that teams can respond and fix errors where they occur.

This post originally appeared on DZone.