Why "we have IaC" isn't enough

Almost all teams know they should be using Infrastructure as Code (IaC), but few do it well. Oftentimes, IaC ends up as a half-baked repository of copy-pasted modules, forgotten variables, and unreviewed changes that drift out of sync with reality. If you’re going to spend engineering hours on “infrastructure,” you want that work to be safe, repeatable, and auditable—not another source of firefighting. Just having IaC isn’t enough.

Oftentimes, the problems we see most with practitioners who have just rolled out IaC are a result of cutting corners or skipping out on enforcing best practices globally. For example, we’ve seen teams where each engineer declares resources differently, which led to a patchwork of divergent modules that created excessive sprawl and drifted away from prod. Here’s where we’ve seen the vast majority of organizations we’ve worked with miss:

  1. Sprawl & Configuration Drift. We’ve seen teams of dozens of engineers where a lack of clear module boundaries led to engineers declaring resources differently, or make temporary “one-time” tweaks that are implemented in the console and not in the code, accumulating significant undocumented cruft. 

  2. Insufficient Validation & Testing. Staging is always worth doing! Missed semantic errors and broken dependencies have revealed themselves in live systems at the most inopportune time. Without automated linters or policy-as-code (e.g., Sentinel, OPA), it’s easy to open a public S3 bucket or grant over-privileged IAM roles, which are mistakes that audits will flag only months later.

  3. Hidden Security and Compliance Gaps. Bypassing your own IaC defeats the entire purpose of IaC - you lose the ability to trace who made what change, when, and why—essential for forensics and compliance.

  4. Fragile State Management. Too often, we see teams that rely on a single remote state file without proper locking (or using insecure backends) that will subject you to race conditions, manual file edits, and state drift that can take hours to reconcile, or who have huge single-state deployments that cause small updates to be delayed by a dozen resources also affected by a single terraform apply.

At a well-known company, we witnessed a complete failure of IaC. The company had decided to skip out on unit and integration tests, pushing changes that took them hours for on-call engineers to diagnose and fix. We worked with them to create automatable unit and integration tests, as well as modernizing collaboration practices and defining their architecture into well-defined components. Alongside other best practices, the company was soon quickly off their feet. 

We’ve worked with teams of all sizes as well. A single-engineer shop bounced between manual console fixes, and used Forge’s Adaptive Engagement to implement best practices, restore their infrastructure to a clean state, and set up fully automated builds, cutting their incident response from hours to minutes. A large company was consistently failing compliance scans due to undocumented changes. We layered in policy-as-code, gave them clear audit trails, and eliminated console tinkering altogether.

IaC only delivers on its promise when it’s treated as code: reviewed, tested, and governed. Combine that with GitOps, policy-as-code, and Forge’s agentic runbooks, and you’ll turn brittle infrastructure into a self-healing, auditable platform that scales on demand.

Ready to see how rigorous IaC practices can transform your operations? Get in touch today.

Get in touch with our team -
we'll respond in just a few hours.

Get in touch with our team - we'll respond in just a few hours.

Get in touch with our team -
we'll respond in just a few hours.