Infrastructure as code (IaC) allows us to design, build, deploy, and manage cloud infrastructure in code, rather than clicking through a cloud provider’s UI or running a long list of CLI commands. IaC, especially declarative IaC, simplifies resource configuration: you define the end state you want, and frameworks such as Terraform, CloudFormation, and Azure Resource Manager (ARM) run the steps needed to provision those resources properly. This method of building out cloud infrastructure is more scalable than running ad hoc commands because all of the configurations are stored centrally and version controlled.
Want to add a new region for disaster recovery? Just run the same templates in a new region. Working with a large team on the same templates? Version control systems (VCS) let teammates collaborate through comments on the code itself. Want to roll back a breaking change you made? Undo the changes in the code and rerun the “apply” command or, if you are following GitOps, just run git revert.
IaC also presents a new opportunity for security. Because the actual commands are abstracted by the tools, many platforms can include more secure defaults than the cloud providers’. It’s also an opportunity to catch misconfigurations before they are ever a public-facing problem. For example, you can prevent provisioning a storage bucket without encryption rather than waiting for a runtime alert when that bucket is already exposed.
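To make the bucket example concrete, here is a minimal sketch of a pre-provisioning check. It assumes the IaC template has already been parsed into plain Python dictionaries. The resource shape (`type`, `name`, `encrypted`) and the function name `find_unencrypted_buckets` are hypothetical, chosen for illustration, and do not match any particular tool’s format.

```python
from typing import Dict, List

# Hypothetical, simplified view of resources parsed from an IaC template.
Resource = Dict[str, object]


def find_unencrypted_buckets(resources: List[Resource]) -> List[str]:
    """Return the names of storage buckets that do not enable encryption."""
    violations = []
    for resource in resources:
        if resource.get("type") != "storage_bucket":
            continue
        # A missing "encrypted" key counts as unencrypted (fail closed).
        if resource.get("encrypted") is not True:
            violations.append(str(resource.get("name", "<unnamed>")))
    return violations


planned = [
    {"type": "storage_bucket", "name": "logs", "encrypted": True},
    {"type": "storage_bucket", "name": "uploads"},
    {"type": "virtual_machine", "name": "web-1"},
]

print(find_unencrypted_buckets(planned))  # prints ['uploads']
```

Because the check runs against the planned resources rather than the live environment, the `uploads` bucket is flagged before it is ever created.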
That all sounds good, but moving from traditional UI or CLI configuration to IaC will not happen overnight. So if you’re new to IaC, here are some tips to keep in mind.
1. Start with a greenfield application
At Bridgecrew, we’ve helped a lot of our customers go through the transition from manual configurations to securely using IaC. The easiest way to get started is to use a clean slate and do it right from the beginning. Begin with an application you can build cloud-native from the ground up: a new application under development, one currently being refactored, or one that is constantly updated. Follow best practices and adhere as closely as possible to security and compliance requirements.
Then, you can take those learnings and begin to refactor your legacy applications to use IaC for any updates to the infrastructure.
2. Learn best practices using your own, real code
Whenever you learn a new skill, such as IaC security, nothing beats learning by doing. IaC static analysis tools like Checkov have over 800 policies for various IaC templates. Learning all of those policies by reading about them won’t make them stick in your brain.
The best way to learn IaC best practices and grow as a platform engineer is to write code, have a tool and teammates that provide feedback on that code, and learn from their findings. Catching misconfigurations helps you understand what the policies are, why the misconfigurations they flag are risky, and how to avoid them. This “learn by doing” method will make the policies second nature.
3. Use existing code as a starting point
As with any other code, open source code is a way to jumpstart any new development. There is no reason to reinvent the wheel when you want to provision infrastructure. Instead, you can take advantage of public and private modules, registries, and repositories.
However, this does not excuse you from security hygiene. We recently analyzed 2,600 Terraform Registry modules and thousands of Helm charts in Artifact Hub for security and compliance best practices. We found that 44% of all Terraform modules on the Terraform Registry contained a misconfiguration and 71% of Helm charts on Artifact Hub contained a misconfiguration.
As with the cloud’s shared responsibility model, part of the security work falls to you: modules in these registries and repositories are built specifically for functionality and ease of use, not for compliance. It’s your responsibility to understand security best practices and what industry standards you need to be compliant with.
So, fork that repository from the person who built the infrastructure for their web application in Terraform as a quick start, but leverage security controls to fix security and compliance violations before you apply the templates.
4. Provide actionable feedback at all stages
The most effective time to fix a misconfiguration is any time before a bad actor exploits that issue. However, the most efficient, lowest-effort time to catch and fix a misconfiguration is in the requirements and design phases.
Having frequent, consistent feedback in every stage of the software development lifecycle is the best way to find and fix issues.
- Add a security expert to requirements scoping and design meetings so that security concerns are included in the requirements for any addition or update.
- Include automated security testing locally in IDEs to provide feedback to engineers at the source and in context.
- Identify misconfigurations during the CI/CD process, and block unacceptable issues from being merged into a repository or deployed.
- Use your VCS as a place to discuss best practice violations directly in code comments on pull requests/merge requests.
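The CI/CD bullet above can be sketched as a small gate that fails the pipeline only for issues you consider unacceptable. This is an illustration under stated assumptions, not any scanner’s real interface: the finding fields (`resource`, `message`, `severity`), the `BLOCKING_SEVERITIES` set, and the function names are all hypothetical.

```python
from typing import Dict, List

# Hypothetical severity levels that should fail the build.
BLOCKING_SEVERITIES = {"HIGH", "CRITICAL"}

Finding = Dict[str, str]


def blocking_findings(findings: List[Finding]) -> List[Finding]:
    """Return only the findings severe enough to block a merge or deploy."""
    return [
        f for f in findings
        if f.get("severity", "").upper() in BLOCKING_SEVERITIES
    ]


def gate(findings: List[Finding]) -> int:
    """Print blocking findings and return an exit code (0 means pass)."""
    blockers = blocking_findings(findings)
    for f in blockers:
        print(f"BLOCKED: {f['resource']}: {f['message']} ({f['severity']})")
    return 1 if blockers else 0


# Example findings; in practice these would come from a scanner's report.
example = [
    {"resource": "bucket.uploads", "message": "encryption disabled", "severity": "HIGH"},
    {"resource": "vm.web-1", "message": "missing tags", "severity": "LOW"},
]

# In a pipeline step, pass this value to sys.exit() so a nonzero code fails the job.
exit_code = gate(example)
print("exit code:", exit_code)
```

Separating "report everything" from "block on some things" keeps low-severity findings visible as feedback without stopping every merge.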
Don’t stop there. Continue to search for misconfigurations using tooling and reviews in your cloud environment at runtime. By checking at every stage, you catch things that slipped through previous reviews and things that only appear misconfigured once fully deployed.
Regardless of where you’re surfacing security feedback, always celebrate finding bugs, and don’t shame the engineer who created them. It’s better to share and learn than to shame and deny. This peer-to-peer learning has well-documented benefits and accelerates development.
5. Force yourself to make fixes in code
The process of doing everything in code and checking it in through a VCS and deploying using CI/CD automation is called GitOps. GitOps has clear operations benefits like increased uptime and faster mean time to resolution (MTTR).
Creating and modifying all configurations in code feels uncomfortable at first. You might be tempted to jump into a cloud UI and make a small tweak, like adding a new IP address that can SSH into a VM or adding encryption to a storage bucket. Don’t do it! Making changes directly in your cloud UI will negate all of the IaC benefits listed in the introduction.
Manually editing configurations creates drift, where what you have in code does not match what you have running in the cloud. With drift, your IaC templates become out of date and not nearly as useful, and it becomes more difficult to collaborate with colleagues about the changes to your cloud infrastructure. As stated in the previous section, drift is not a name-and-shame opportunity. Dig into the drift events and learn from them. Was the drift that occurred a one-off event, or is it systemic? You need to address the core problem and move back to making all changes in code.
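Conceptually, detecting drift means comparing what the code declares with what is actually running. The sketch below assumes both sides are already available as flat Python dictionaries. The setting names and the `detect_drift` helper are hypothetical, and real tools compare far richer state.

```python
from typing import Any, Dict

Config = Dict[str, Any]


def detect_drift(declared: Config, actual: Config) -> Dict[str, Dict[str, Any]]:
    """Return settings whose declared value differs from the running value."""
    drift = {}
    # Check keys from both sides so additions and removals both show up.
    # A key absent on one side is reported with the value None.
    for key in sorted(set(declared) | set(actual)):
        want = declared.get(key)
        have = actual.get(key)
        if want != have:
            drift[key] = {"declared": want, "actual": have}
    return drift


# What the IaC template says versus what is running (example data).
declared = {"encrypted": True, "ssh_allowed_ips": ["203.0.113.10/32"]}
actual = {
    "encrypted": True,
    "ssh_allowed_ips": ["203.0.113.10/32", "198.51.100.7/32"],
}

for setting, values in detect_drift(declared, actual).items():
    print(f"{setting}: declared={values['declared']} actual={values['actual']}")
```

Here the extra SSH address added by hand shows up as drift, which is the cue to ask whether it was a one-off or a sign of a recurring workaround.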
After enough repetitions of fixing runtime configurations in code, it will become second nature. Need to add a new IP address to SSH into a VM? Once you’ve done it enough, you know exactly where to find that resource block in the IaC file, add the new IP address, and run the “apply” command to get it done fast.
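As an illustration of making that SSH change in code with a guardrail attached, the sketch below treats the allowlist as plain data. The `add_ssh_source` helper and the idea of storing the allowlist as a Python list are assumptions for this example; in a real template the same change would be an edit to the relevant resource block.

```python
import ipaddress
from typing import List


def add_ssh_source(allowed: List[str], new_cidr: str) -> List[str]:
    """Return a new allowlist that includes new_cidr.

    Raises ValueError for malformed input or for a range that would
    open SSH to every address (prefix length 0).
    """
    # strict=True rejects values with host bits set, such as 10.0.0.5/24.
    network = ipaddress.ip_network(new_cidr, strict=True)
    if network.prefixlen == 0:
        raise ValueError(f"{new_cidr} would open SSH to every address")
    normalized = str(network)  # a bare address becomes a /32 (or /128)
    if normalized in allowed:
        return list(allowed)
    return sorted(allowed + [normalized])


current = ["203.0.113.10/32"]
updated = add_ssh_source(current, "198.51.100.7")
print(updated)  # prints ['198.51.100.7/32', '203.0.113.10/32']
```

Because the change goes through code, a rule such as "never allow 0.0.0.0/0" is enforced every time, which a quick edit in the cloud UI would bypass.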
Using IaC to manage all cloud configurations has clear speed and operational benefits. If done properly, it can also create a secure cloud environment before anything is provisioned. These tips will get you started building more secure applications and reducing the burden on engineering and security teams.
One last thing to remember—every mistake is an opportunity to learn as a team. Force all configurations to be made in code so proper security checks can happen before deployment, and so teams can openly collaborate on the changes in a VCS. Encourage learning by celebrating both the wins and the opportunities for growth. If done securely, adopting IaC is an exciting journey that will drastically improve your operations and security. Slow and steady wins the race.
This post was originally published on ITOps Times on August 9th, 2021.