Infrastructure as code (IaC) tools allow us to collaborate over configuration changes to our cloud environment before they are applied. One of the basic principles that distinguishes IaC from a scripting language is the ability to persist, in state, a description of what is deployed and map it back to the configuration manifests. In other words, you can store, locally or with a managed provider, a map of what your cloud resource configurations should be, based on the templates you have applied.
This state improves deployment performance by caching the existing cloud configuration, so that a pending change can be compared against it before it is applied. The state can be refreshed with the current configuration of the production environment, but only when a refresh is explicitly triggered.
Drift occurs when this state is out of date. When changes are made directly to a resource in a cloud provider, the stored state no longer represents what is running in the cloud. Detecting and fixing that drift is important to maintain the benefits of IaC tools.
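At its core, detecting drift means comparing the properties recorded in state (or in a template) with the properties the cloud provider reports for the live resource. The following is a minimal sketch, assuming both sides are already available as plain dictionaries; the function name and the "added"/"removed"/"changed" labels are hypothetical and not taken from any specific tool.

```python
from __future__ import annotations

from typing import Any


def diff_properties(expected: Any, actual: Any, path: str = "") -> list[tuple[str, str, Any, Any]]:
    """Recursively compare the expected (stored) value with the actual (live) value.

    Returns (path, kind, expected, actual) tuples, where kind is
    "added", "removed" or "changed". Lists are compared as whole values.
    """
    diffs: list[tuple[str, str, Any, Any]] = []
    if isinstance(expected, dict) and isinstance(actual, dict):
        for key in sorted(set(expected) | set(actual)):
            child = f"{path}/{key}"
            if key not in actual:
                # Present in state, missing in the cloud.
                diffs.append((child, "removed", expected[key], None))
            elif key not in expected:
                # Present in the cloud, unknown to state.
                diffs.append((child, "added", None, actual[key]))
            else:
                diffs.extend(diff_properties(expected[key], actual[key], child))
    elif expected != actual:
        diffs.append((path or "/", "changed", expected, actual))
    return diffs
```

For example, a policy whose live action list lost an entry shows up as a "changed" path, while a tag added in the console shows up as "added".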
There are multiple tools available for drift detection. Let’s compare three different tools with three different techniques for detecting and remediating drift. We’ll use the following criteria to compare them:
- Level of access required. What permissions, and on which data sources, does a user need to detect and fix drift?
- Multi-cloud. Does the drift detection work with one or multiple cloud providers?
- Multi-IaC. Which IaC frameworks are supported for drift detection?
- Fix complexity. There are two ways to fix drift: update the code to match the cloud, or vice versa. Assuming a GitOps philosophy, we ideally want to fix the code. How easy is it to get to the specific code block that should be changed to fix the drift?
Drift detection in CloudFormation
CloudFormation defines AWS resources and their dependencies and groups them into “stacks.” Because CloudFormation tracks the state of each stack, you can modify the provisioned resources without a complete rebuild. After you create a stack from a template, you can detect changes made outside CloudFormation (i.e., drift) from the console, the CLI, or your own code. You can detect drift on an entire stack or on a particular resource.
Many organizations are still working to implement IaC. Some infrastructure that was previously provisioned manually is categorized as “legacy” or “tech debt” while the organization adjusts its processes to a GitOps workflow. In such a mixed environment, it’s difficult to tell what was provisioned with CloudFormation and what was provisioned manually in the AWS console. Because resources depend on one another, and because teams are used to the old habit of managing infrastructure manually or with scripts rather than automatically, these out-of-band changes keep happening.
During this transition period, an SRE might be required to do manual modifications that do not have a corresponding configuration in code. This could be direct changes to the AWS resources (and their properties) without updating the CloudFormation template. An example of this would be setting up IAM permissions to grant access to a service or user that did not have access before.
Unmanaged configuration changes are not peer-reviewed and can break integrations between different parts of an application. For example, replicating a production environment in a testing environment is far more challenging if the production environment has had manual changes, because the cloud resources no longer match the CloudFormation template that provisioned them. This leads to confusion and, in some cases, requires deleting a stack and reprovisioning it, which might impact the uptime of the application.
To detect drift using AWS’s own drift detection tool, your user needs the AWSCloudFormationReadOnlyAccess managed policy. This level of access is not overly permissive, assuming the templates are not sensitive and no secrets are stored in them. Open the CloudFormation service in the AWS Console, select the stack, and choose Detect drift under Stack actions.
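The same detection can be started from code. The sketch below assumes a client object that behaves like boto3.client("cloudformation"), whose detect_stack_drift and describe_stack_drift_detection_status calls start a detection run and report its progress. The polling interval, the timeout, and the choice to raise on DETECTION_FAILED are illustrative assumptions, not AWS defaults.

```python
import time
from typing import Any


def detect_stack_drift(cfn: Any, stack_name: str, poll_seconds: float = 5.0, timeout: float = 300.0) -> str:
    """Start drift detection on a stack and wait until it finishes.

    `cfn` is assumed to behave like boto3.client("cloudformation").
    Returns the stack's StackDriftStatus, e.g. "DRIFTED" or "IN_SYNC".
    """
    # Detection runs asynchronously; AWS returns an ID to poll.
    detection_id = cfn.detect_stack_drift(StackName=stack_name)["StackDriftDetectionId"]
    deadline = time.monotonic() + timeout
    while True:
        status = cfn.describe_stack_drift_detection_status(StackDriftDetectionId=detection_id)
        if status["DetectionStatus"] == "DETECTION_COMPLETE":
            return status["StackDriftStatus"]
        if status["DetectionStatus"] == "DETECTION_FAILED":
            # Design choice for this sketch: treat a failed run as an error.
            raise RuntimeError(status.get("DetectionStatusReason", "drift detection failed"))
        if time.monotonic() > deadline:
            raise TimeoutError(f"drift detection for {stack_name} did not finish in {timeout}s")
        time.sleep(poll_seconds)
```

With boto3 installed and credentials configured, a call might look like detect_stack_drift(boto3.client("cloudformation"), "my-stack"), where "my-stack" is a placeholder stack name.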
Then view the drift results. For example, an IAM policy might show that a property was removed to prevent access.
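The same results are available from the CLI through aws cloudformation describe-stack-resource-drifts, which returns a StackResourceDrifts list with fields such as LogicalResourceId, ResourceType, StackResourceDriftStatus and PropertyDifferences. The sketch below turns that response into one line per difference. The sample response is hand-written to mirror the IAM example above; its logical IDs and property path are hypothetical.

```python
from __future__ import annotations

import json
from typing import Any

# Hand-written sample in the shape of a describe-stack-resource-drifts response.
SAMPLE = json.loads("""
{
  "StackResourceDrifts": [
    {
      "LogicalResourceId": "AppPolicy",
      "ResourceType": "AWS::IAM::ManagedPolicy",
      "StackResourceDriftStatus": "MODIFIED",
      "PropertyDifferences": [
        {"PropertyPath": "/PolicyDocument/Statement/0/Action/1",
         "ExpectedValue": "iam:PassRole",
         "DifferenceType": "REMOVE"}
      ]
    },
    {
      "LogicalResourceId": "AppBucket",
      "ResourceType": "AWS::S3::Bucket",
      "StackResourceDriftStatus": "IN_SYNC",
      "PropertyDifferences": []
    }
  ]
}
""")


def summarize_resource_drifts(response: dict[str, Any]) -> list[str]:
    """Report MODIFIED and DELETED resources; IN_SYNC and NOT_CHECKED are skipped."""
    lines = []
    for drift in response.get("StackResourceDrifts", []):
        status = drift["StackResourceDriftStatus"]
        if status not in ("MODIFIED", "DELETED"):
            continue
        resource = f'{drift["LogicalResourceId"]} ({drift["ResourceType"]})'
        if status == "DELETED":
            lines.append(f"{resource}: deleted outside CloudFormation")
        for diff in drift.get("PropertyDifferences", []):
            # DifferenceType is ADD, REMOVE or NOT_EQUAL.
            lines.append(f'{resource}: {diff["DifferenceType"]} {diff["PropertyPath"]}')
    return lines
```

Run against the sample, this reports only the policy's removed action and ignores the bucket that is still in sync.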
While this change might legitimately be a fix (perhaps that user didn’t need the PassRole permission), the CloudFormation template is now out of sync. We could create a change set to update the CloudFormation stack directly, but that change won’t persist in the CloudFormation template stored in git.
If you have multiple repositories and multiple AWS accounts, it can be hard to track down the template file that provisioned a stack and align it with the changes made directly in the AWS Console.
- Well-defined level of access with low security risk.
- Not multi-cloud, not multi-IaC. This is a dedicated solution for AWS.
- Finding the place to fix in code is manual and can take a long time in environments with a high number of CloudFormation templates and a high number of git repositories.
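Because that search is manual, in practice it often becomes a text search across checked-out repositories. The sketch below shows such a search, assuming all repositories are cloned under one directory and templates use .yaml, .yml, .json or .template extensions; the function name and extension list are hypothetical. It is a line-based heuristic, not a CloudFormation parser, so it can miss resources or match a key that is not a resource.

```python
from __future__ import annotations

import re
from pathlib import Path

# Hypothetical set of extensions used for CloudFormation templates.
TEMPLATE_SUFFIXES = {".yaml", ".yml", ".json", ".template"}


def find_templates_defining(root: str | Path, logical_id: str) -> list[tuple[Path, int]]:
    """Return (file, line number) pairs where `logical_id` appears as a key.

    Matches a YAML key such as '  AppPolicy:' or a JSON key such as
    '"AppPolicy":' at the start of a line.
    """
    escaped = re.escape(logical_id)
    pattern = re.compile(rf'^\s*(?:"{escaped}"|{escaped})\s*:')
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in TEMPLATE_SUFFIXES:
            continue
        text = path.read_text(encoding="utf-8", errors="ignore")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if pattern.match(line):
                hits.append((path, lineno))
    return hits
```

Even in this simple form, the search only helps if you already know the logical ID and have every relevant repository checked out locally.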
Drift detection in Terraform
Terraform stores information about your infrastructure locally in a file named terraform.tfstate (by default). This file is responsible for mapping a resource defined in configuration to its real-world resource.
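To show what that mapping looks like, the sketch below reads a version 4 terraform.tfstate and maps each managed resource address to the cloud ID stored in its attributes. The on-disk format is an internal detail that can change between Terraform versions (terraform show -json is the supported machine-readable interface), and the sample in the test is hand-written. It also illustrates why state is sensitive: attribute values such as a database password are stored in the file.

```python
from __future__ import annotations

import json


def load_state(path: str = "terraform.tfstate") -> dict:
    """Read a local state file (assumes the default local backend)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def state_index(tfstate: dict) -> dict[str, str]:
    """Map each managed resource address (e.g. aws_s3_bucket.logs) to its cloud ID."""
    index = {}
    for res in tfstate.get("resources", []):
        if res.get("mode") != "managed":
            continue  # skip data sources, which Terraform reads but does not manage
        prefix = f'{res["module"]}.' if "module" in res else ""
        base = f'{prefix}{res["type"]}.{res["name"]}'
        for inst in res.get("instances", []):
            key = inst.get("index_key")  # set for count / for_each instances
            if key is None:
                address = base
            elif isinstance(key, int):
                address = f"{base}[{key}]"
            else:
                address = f'{base}["{key}"]'
            index[address] = inst["attributes"].get("id")
    return index
```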
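This mapping can help to detect drift by running a command such as terraform refresh or terraform plan. (Since Terraform v0.15.4, terraform refresh is deprecated in favor of terraform plan -refresh-only and terraform apply -refresh-only.)
The plan output is very detailed and tells you which changes will be applied or overwritten on the next terraform apply.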
The output is easy to understand, and since terraform plan usually runs in a continuous delivery (CD) context, it’s possible to trace a resource back to the repository it belongs to after some digging.
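In a CD pipeline, the plan is usually checked through its exit code. terraform plan -detailed-exitcode returns 0 when there is no diff, 1 on error and 2 when there are changes. The sketch below wraps that behavior; the function names are hypothetical, and it assumes terraform is on the PATH and the working directory is already initialized. Note that exit code 2 means any diff, so it also fires for code changes that simply haven't been applied yet, not only for drift.

```python
import subprocess

# Exit codes of `terraform plan -detailed-exitcode`.
PLAN_EXIT_CODES = {0: "no changes", 1: "error", 2: "changes present"}


def interpret_plan_exit_code(code: int) -> str:
    """Translate a plan exit code into a readable status."""
    try:
        return PLAN_EXIT_CODES[code]
    except KeyError:
        raise ValueError(f"unexpected terraform plan exit code: {code}") from None


def run_plan_check(workdir: str) -> str:
    """Run a non-interactive plan in `workdir` and report whether a diff exists."""
    result = subprocess.run(
        ["terraform", "plan", "-detailed-exitcode", "-no-color", "-input=false"],
        cwd=workdir,
        capture_output=True,
        text=True,
    )
    return interpret_plan_exit_code(result.returncode)
```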
Another advantage of Terraform is that, when there is drift, it tells you whether the current resource will be destroyed and a new one provisioned in its place (i.e., the change forces replacement).
However, this system is not without its faults. Imagine a configuration change that forces a database to be replaced, purging all of its records, with no backup to restore from. That would be painful.
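One way to guard against that scenario is to inspect the machine-readable plan before applying it. After terraform plan -out=tfplan and terraform show -json tfplan > plan.json, each entry in resource_changes carries change.actions, and a replacement appears as the pair ["delete", "create"] (or ["create", "delete"] with create_before_destroy). The sketch below lists those addresses; the function names and the sample plan in the test are hypothetical.

```python
from __future__ import annotations

import json


def load_plan(path: str = "plan.json") -> dict:
    """Read the output of `terraform show -json <planfile>`."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def find_replacements(plan_json: dict) -> list[str]:
    """Return addresses that the plan will destroy and re-create."""
    replaced = []
    for rc in plan_json.get("resource_changes", []):
        actions = rc["change"]["actions"]
        # Both orderings of delete + create mean the resource is replaced.
        if sorted(actions) == ["create", "delete"]:
            replaced.append(rc["address"])
    return replaced
```

A pipeline could fail the deployment whenever this list contains a stateful resource such as a database, forcing a human review.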
Additionally, this requires an admin-like level of access to run those commands because the state file and plan files might contain sensitive data, such as your DB connection details.
- A consistent way to determine drift across cloud providers.
- Medium fix complexity, since drift can be caught right before deploying code, but it does not always point to the specific file or lines of code to modify when modules are involved.
- Only one IaC framework is supported, while development environments might use more than one way to provision resources.
- Requires admin-like permissions to access the code, the cloud, and the state file (which contains secrets and sensitive data). This makes it hard to democratize access and let every engineer in the organization view drift from anywhere.
Drift detection in Bridgecrew
At Bridgecrew, we set out to overcome the limitations of current drift detection offerings by supporting drift detection for all three major clouds. We do this without requiring access to state files, which can contain sensitive data such as passwords, and with traceability back to the code that defines each resource.
We do that using Yor’s tracing capability, which injects tags into resources defined in Terraform, CloudFormation, and Serverless Framework.
Once Yor runs in the continuous integration (CI) process, Bridgecrew can connect code to cloud and compare the blueprint in the code with its live instance in the cloud.
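The core idea of tag-based tracing can be sketched in a few lines: because the same yor_trace tag value exists both in the code and on the deployed resource, the two can be joined without reading any state file. The sketch below is not Bridgecrew’s implementation. The CodeBlock and Drift records and the live-resource dictionaries are hypothetical shapes, assuming code has already been parsed into resource blocks with file and line information and the cloud inventory has already been fetched.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class CodeBlock:
    """Hypothetical: one resource block parsed from IaC, keyed elsewhere by yor_trace."""
    file: str
    line: int
    config: dict


@dataclass
class Drift:
    file: str
    line: int
    attribute: str
    in_code: object
    in_cloud: object


def find_drift(code: dict[str, CodeBlock], live: list[dict]) -> list[Drift]:
    """Join live resources to code by their yor_trace tag and report differences."""
    drifts = []
    for resource in live:
        trace = resource.get("tags", {}).get("yor_trace")
        block = code.get(trace)
        if block is None:
            continue  # untagged or unknown resource: cannot be traced to code
        # Only attributes declared in code are compared, so computed
        # cloud-side attributes (ARNs, timestamps) are not reported as drift.
        for attr in sorted(block.config):
            in_code = block.config[attr]
            in_cloud = resource.get("config", {}).get(attr)
            if in_code != in_cloud:
                drifts.append(Drift(block.file, block.line, attr, in_code, in_cloud))
    return drifts
```

Because each Drift carries a file and line, the result points directly at the code block to change, which is what makes an automated pull request possible.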
Bridgecrew also offers “Fix Drift,” which creates a pull request to align your code with the cloud state, keeping drift at bay and preserving collaboration without requiring access to sensitive data stored in state.
- A consistent way to determine drift across cloud providers and IaC tools.
- Simple to fix a drift since it guides you to the exact lines of code in a file.
- Level of access required is “read only” for code and cloud configuration.
- Requires running Yor as part of your CI and persisting its tags in code for the solution to work.