Terraform on AWS: Managing Infrastructure as Code Without the Headaches

Infrastructure as Code with Terraform has become the standard approach for managing cloud resources at any meaningful scale. Unlike CloudFormation, Terraform’s provider model works across AWS, Azure, GCP, and dozens of other platforms. Once you understand the mental model, you’ll never want to click through the AWS console to provision infrastructure again.

The Terraform Mental Model

Terraform operates on a plan-then-apply lifecycle. You write declarative HCL (HashiCorp Configuration Language) code describing the desired state of your infrastructure. When you run terraform plan, Terraform compares your code against the current state (tracked in a state file) and produces a diff showing exactly what will be created, modified, or destroyed. Only when you run terraform apply does Terraform actually make API calls to provision or change resources.

This separation of plan and apply is what makes Terraform safe for production use. You can review every change before it happens, integrate plan output into pull request reviews, and require approvals for changes to production environments. Terraform's idempotent nature — applying the same code repeatedly converges on the same infrastructure — is what separates IaC from the chaos of manually configured environments where nobody is quite sure what's actually deployed or why.

Structuring Your First AWS VPC

A well-structured VPC is the foundation of secure AWS architecture. The standard pattern is a VPC with public subnets (for load balancers and NAT gateways), private subnets (for application tier), and isolated subnets (for databases and other sensitive resources). Each subnet tier spans multiple availability zones for redundancy. Internet access for private subnets routes through NAT Gateways in the public subnets, so outbound traffic is possible without exposing private resources directly.

Terraform makes this repeatable. Define your CIDR blocks as variables, use count or for_each to create subnets across AZs dynamically, and use aws_route_table resources to wire up routing. A well-organized Terraform module for VPC creation can be instantiated repeatedly for dev, staging, and production environments with just a few variable changes.

# vpc/main.tf — Core VPC and subnets
variable "vpc_cidr"         { default = "10.0.0.0/16" }
variable "public_subnets"   { default = ["10.0.1.0/24", "10.0.2.0/24"] }
variable "private_subnets"  { default = ["10.0.11.0/24", "10.0.12.0/24"] }
variable "azs"              { default = ["us-east-1a", "us-east-1b"] }
variable "env"              { default = "prod" }

resource "aws_vpc" "main" {
  cidr_block           = var.vpc_cidr
  enable_dns_hostnames = true
  enable_dns_support   = true
  tags = { Name = "${var.env}-vpc" }
}

resource "aws_subnet" "public" {
  count                   = length(var.public_subnets)
  vpc_id                  = aws_vpc.main.id
  cidr_block              = var.public_subnets[count.index]
  availability_zone       = var.azs[count.index]
  map_public_ip_on_launch = true
  tags = { Name = "${var.env}-public-${count.index + 1}" }
}

resource "aws_subnet" "private" {
  count             = length(var.private_subnets)
  vpc_id            = aws_vpc.main.id
  cidr_block        = var.private_subnets[count.index]
  availability_zone = var.azs[count.index]
  tags = { Name = "${var.env}-private-${count.index + 1}" }
}

resource "aws_internet_gateway" "main" {
  vpc_id = aws_vpc.main.id
  tags   = { Name = "${var.env}-igw" }
}
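
The routing described above can be sketched as a continuation of this example. The following is one common arrangement, not the only one: a public route table pointing at the internet gateway, one NAT Gateway per AZ, and per-AZ private route tables so private-subnet traffic exits through the NAT Gateway in its own AZ. Resource names refer to the VPC example above.

```hcl
# vpc/routing.tf — Routing sketch (continues the VPC example above)

# Public route table: default route to the internet gateway
resource "aws_route_table" "public" {
  vpc_id = aws_vpc.main.id
  route {
    cidr_block = "0.0.0.0/0"
    gateway_id = aws_internet_gateway.main.id
  }
  tags = { Name = "${var.env}-public-rt" }
}

resource "aws_route_table_association" "public" {
  count          = length(var.public_subnets)
  subnet_id      = aws_subnet.public[count.index].id
  route_table_id = aws_route_table.public.id
}

# One NAT Gateway per AZ, each with its own Elastic IP
resource "aws_eip" "nat" {
  count  = length(var.public_subnets)
  domain = "vpc"
}

resource "aws_nat_gateway" "main" {
  count         = length(var.public_subnets)
  allocation_id = aws_eip.nat[count.index].id
  subnet_id     = aws_subnet.public[count.index].id
  tags          = { Name = "${var.env}-nat-${count.index + 1}" }
}

# Private route tables: outbound traffic via the NAT Gateway in the same AZ
resource "aws_route_table" "private" {
  count  = length(var.private_subnets)
  vpc_id = aws_vpc.main.id
  route {
    cidr_block     = "0.0.0.0/0"
    nat_gateway_id = aws_nat_gateway.main[count.index].id
  }
  tags = { Name = "${var.env}-private-rt-${count.index + 1}" }
}

resource "aws_route_table_association" "private" {
  count          = length(var.private_subnets)
  subnet_id      = aws_subnet.private[count.index].id
  route_table_id = aws_route_table.private[count.index].id
}
```

Per-AZ NAT Gateways cost more than a single shared one, but they keep an AZ outage from taking down outbound connectivity for healthy AZs.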

State Files: The Most Important File You Never Edit

Terraform’s state file (terraform.tfstate) is the source of truth that maps your HCL code to real cloud resources. Never edit it manually, never delete it, and never keep it on a local filesystem for anything beyond personal experiments. State files can also contain sensitive values, such as database passwords, in plain text, so never commit them to version control. For team environments and production use, store state in a remote backend: S3 with DynamoDB locking for AWS, Azure Blob Storage for Azure, or Terraform Cloud. Remote backends enable collaboration, state locking (preventing concurrent runs from corrupting state), and encryption at rest.

State locking is particularly important in CI/CD pipelines where multiple engineers or automated runs might trigger Terraform simultaneously. Without locking, two concurrent terraform apply runs can corrupt the state file, leaving your infrastructure in an unknown state. DynamoDB-backed state locking on AWS is straightforward to configure and adds minimal overhead. Make configuring remote state the very first thing you do in any new Terraform project — retrofitting it later is painful.
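
A minimal S3 backend with DynamoDB locking looks like this; the bucket name, key path, and table name below are placeholders you would replace with your own (the DynamoDB table needs a string partition key named LockID):

```hcl
# backend.tf — Remote state with S3 storage and DynamoDB locking
# (bucket, key, and table names are illustrative placeholders)
terraform {
  backend "s3" {
    bucket         = "example-org-terraform-state"  # hypothetical bucket
    key            = "prod/vpc/terraform.tfstate"   # state path per environment
    region         = "us-east-1"
    encrypt        = true                           # server-side encryption at rest
    dynamodb_table = "terraform-state-lock"         # hypothetical lock table
  }
}
```

Backend blocks cannot use variables, so these values are typically hard-coded per state root or injected with `terraform init -backend-config`.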

Modules for Reusable Infrastructure

Terraform modules are the building blocks of reusable infrastructure. A module is simply a directory of .tf files with defined inputs (variables) and outputs. The VPC example above is a perfect module candidate — you define it once and call it with different parameters for each environment. The Terraform Registry hosts thousands of community and official modules for common patterns like VPCs, EKS clusters, RDS instances, and more, saving substantial boilerplate.

Module versioning is critical for production stability. Pin your module sources to specific version tags rather than using “latest” or a branch. When you’re ready to upgrade a module, test it in a non-production environment first. Structure your Terraform codebase into separate state roots (workspaces or separate directories) for each environment, with a shared module library. This prevents a staging change from accidentally affecting production state and gives you clear promotion paths through environments.
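
Putting these practices together, an environment's root configuration might call a shared VPC module pinned to a version tag. The repository URL and tag below are illustrative, assuming the module exposes the variables from the earlier example:

```hcl
# envs/staging/main.tf — Instantiating a pinned VPC module
# (repository URL and ref tag are hypothetical)
module "vpc" {
  source = "git::https://github.com/example-org/terraform-modules.git//vpc?ref=v1.4.2"

  env             = "staging"
  vpc_cidr        = "10.1.0.0/16"
  public_subnets  = ["10.1.1.0/24", "10.1.2.0/24"]
  private_subnets = ["10.1.11.0/24", "10.1.12.0/24"]
  azs             = ["us-east-1a", "us-east-1b"]
}
```

The production root would call the same module with its own CIDRs and `env = "prod"`, and upgrading means bumping `ref` in staging first, applying, and promoting the same tag to production once verified.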