Introduction
Infrastructure as Code (IaC) revolutionizes how we manage infrastructure by treating it the same way as application code. Instead of manually provisioning servers through web consoles or clicking through wizards, you define your infrastructure in declarative configuration files that can be version-controlled, tested, reviewed, and automated.
This comprehensive guide covers IaC principles, practical implementation with Terraform and Ansible, state management strategies, module design patterns, and building reliable infrastructure pipelines.
Why Infrastructure as Code Matters
Traditional vs IaC Approach
| Aspect | Traditional | IaC |
|---|---|---|
| Provisioning | Manual, click-based | Declarative, automated |
| Reproducibility | Difficult, error-prone | Easy, consistent |
| Versioning | None | Full Git history |
| Review Process | None | Pull request reviews |
| Rollback | Manual, risky | Revert the commit and re-apply |
| Documentation | Often outdated | Self-documenting |
Benefits of IaC
- Consistency - Same configuration every time
- Auditability - Full history of changes
- Speed - Rapid provisioning and teardown
- Collaboration - Code review for infrastructure
- Disaster Recovery - Recreate infrastructure quickly
Terraform Deep Dive
Project Structure
A well-organized Terraform project improves maintainability:
terraform/
├── environments/
│   ├── dev/
│   │   ├── main.tf
│   │   ├── variables.tf
│   │   ├── outputs.tf
│   │   └── terraform.tfvars
│   ├── staging/
│   └── prod/
├── modules/
│   ├── vpc/
│   ├── ec2/
│   ├── rds/
│   └── ecs/
├── global/
│   └── s3/
├── backend.tf
├── provider.tf
└── versions.tf
Provider Configuration
# versions.tf
terraform {
  required_version = ">= 1.6.0"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
}

# backend.tf - Remote state with locking
terraform {
  backend "s3" {
    bucket         = "my-terraform-state"
    key            = "prod/terraform.tfstate"
    region         = "us-east-1"
    encrypt        = true
    dynamodb_table = "terraform-locks"
  }
}

# provider.tf
provider "aws" {
  region = var.aws_region

  default_tags {
    tags = {
      Environment = var.environment
      ManagedBy   = "Terraform"
      Project     = "MyProject"
    }
  }
}
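The `~> 5.0` constraint in versions.tf is Terraform's pessimistic operator. Only the rightmost specified component may increase, so `~> 5.0` accepts any 5.x release but not 6.0, while `~> 5.0.1` would accept 5.0.x from 5.0.1 upward. The Go sketch below models that rule for plain numeric versions. It is an illustration, not Terraform's own version-constraint code: pre-release tags are ignored, and `pessimisticMatch` is a name chosen for this example.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseVersion splits "5.31.0" into numeric components.
// Pre-release suffixes are deliberately not handled in this sketch.
func parseVersion(s string) ([]int, error) {
	parts := strings.Split(s, ".")
	nums := make([]int, len(parts))
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil {
			return nil, fmt.Errorf("invalid version %q: %w", s, err)
		}
		nums[i] = n
	}
	return nums, nil
}

// compare returns -1, 0 or 1, treating missing components as zero.
func compare(a, b []int) int {
	for i := 0; i < len(a) || i < len(b); i++ {
		var x, y int
		if i < len(a) {
			x = a[i]
		}
		if i < len(b) {
			y = b[i]
		}
		if x < y {
			return -1
		}
		if x > y {
			return 1
		}
	}
	return 0
}

// pessimisticMatch reports whether version satisfies "~> constraint".
// The version must be >= constraint and below the bound obtained by dropping
// the last component and incrementing the new last one ("~> 5.0" means < 6).
func pessimisticMatch(constraint, version string) (bool, error) {
	c, err := parseVersion(constraint)
	if err != nil {
		return false, err
	}
	if len(c) < 2 {
		return false, fmt.Errorf("sketch requires at least two components, got %q", constraint)
	}
	v, err := parseVersion(version)
	if err != nil {
		return false, err
	}
	upper := make([]int, len(c)-1)
	copy(upper, c[:len(c)-1])
	upper[len(upper)-1]++
	return compare(v, c) >= 0 && compare(v, upper) < 0, nil
}

func main() {
	for _, v := range []string{"4.67.0", "5.0.0", "5.31.0", "6.0.0"} {
		ok, _ := pessimisticMatch("5.0", v)
		fmt.Printf("~> 5.0 allows %s: %v\n", v, ok)
	}
}
```

The practical effect is that `terraform init -upgrade` can pick up new 5.x provider releases automatically but will never jump to a new major version, which is where breaking changes are expected.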
Variables and Outputs
# variables.tf
variable "environment" {
  description = "Environment name"
  type        = string

  validation {
    condition     = contains(["dev", "staging", "prod"], var.environment)
    error_message = "Must be dev, staging, or prod."
  }
}

variable "aws_region" {
  description = "AWS region"
  type        = string
  default     = "us-east-1"
}

variable "vpc_cidr" {
  description = "VPC CIDR block"
  type        = string
  default     = "10.0.0.0/16"
}

variable "instance_type" {
  description = "EC2 instance type"
  type        = string
  default     = "t3.micro"

  validation {
    condition     = can(regex("^t3\\.", var.instance_type))
    error_message = "Must be a t3 instance type."
  }
}

variable "tags" {
  description = "Tags to apply to resources"
  type        = map(string)
  default     = {}
}

# outputs.tf
output "vpc_id" {
  description = "ID of the VPC"
  value       = module.vpc.vpc_id
}

output "web_server_ip" {
  description = "Public IP of web server"
  value       = aws_instance.web.public_ip
  sensitive   = true
}

output "database_connection_string" {
  description = "Database connection string"
  value       = module.rds.connection_string
  sensitive   = true
}
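Both validation blocks reduce to simple predicates. The Go sketch below mirrors them so you can see which values pass. Terraform documents its `regex` syntax as RE2, which is the same engine Go's `regexp` package uses, so the pattern carries over unchanged once the HCL string escape (`\\.`) becomes a plain `\.`. One consequence is easy to miss: `^t3\.` rejects `t3a` types. The function names are illustrative only.

```go
package main

import (
	"fmt"
	"regexp"
)

// allowedEnvironments mirrors contains(["dev", "staging", "prod"], var.environment).
var allowedEnvironments = []string{"dev", "staging", "prod"}

// t3Pattern mirrors can(regex("^t3\\.", var.instance_type)): regex() errors
// when there is no match, so can(...) is true exactly when the pattern matches.
var t3Pattern = regexp.MustCompile(`^t3\.`)

func validEnvironment(env string) bool {
	for _, allowed := range allowedEnvironments {
		if env == allowed {
			return true
		}
	}
	return false
}

func validInstanceType(instanceType string) bool {
	return t3Pattern.MatchString(instanceType)
}

func main() {
	for _, env := range []string{"prod", "qa"} {
		fmt.Printf("environment %q valid: %v\n", env, validEnvironment(env))
	}
	for _, it := range []string{"t3.micro", "t3a.micro", "m5.large"} {
		fmt.Printf("instance_type %q valid: %v\n", it, validInstanceType(it))
	}
}
```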
Using Modules
# environments/prod/main.tf
module "vpc" {
  source = "../../modules/vpc"

  environment        = "prod"
  cidr_block         = "10.1.0.0/16"
  availability_zones = ["us-east-1a", "us-east-1b", "us-east-1c"]

  tags = {
    Environment = "prod"
  }
}

module "ecs_cluster" {
  source = "../../modules/ecs"

  cluster_name     = "prod-app"
  vpc_id           = module.vpc.vpc_id
  subnet_ids       = module.vpc.private_subnets
  desired_capacity = 3
  max_size         = 10
  min_size         = 2

  tags = {
    Environment = "prod"
  }
}

module "rds" {
  source = "../../modules/rds"

  identifier              = "prod-postgres"
  engine_version          = "15.4"
  vpc_id                  = module.vpc.vpc_id
  subnet_ids              = module.vpc.private_subnets # the VPC module exports public and private subnets only
  instance_class          = "db.t3.medium"
  allocated_storage       = 100
  backup_retention_period = 30
  skip_final_snapshot     = false

  tags = {
    Environment = "prod"
  }
}
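Terraform works out creation order from references. Because `ecs_cluster` and `rds` both read `module.vpc.vpc_id`, the VPC has to exist first. The sketch below models that idea with Kahn's topological sort over module-level edges taken from the file above. It is a simplification: Terraform's real graph is built per resource and walked concurrently, so `ecs_cluster` and `rds` may be created in parallel once the VPC is ready. The sequential order printed here is just one valid order, and `applyOrder` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"sort"
)

// applyOrder returns an order in which nodes can be created so that every
// node comes after the nodes it references (Kahn's algorithm). deps maps a
// node to the nodes it depends on. Ties are broken alphabetically so the
// result is deterministic.
func applyOrder(deps map[string][]string) ([]string, error) {
	inDegree := map[string]int{}
	dependents := map[string][]string{}
	for node, ds := range deps {
		if _, ok := inDegree[node]; !ok {
			inDegree[node] = 0
		}
		for _, d := range ds {
			if _, ok := inDegree[d]; !ok {
				inDegree[d] = 0
			}
			inDegree[node]++
			dependents[d] = append(dependents[d], node)
		}
	}

	var ready []string
	for node, n := range inDegree {
		if n == 0 {
			ready = append(ready, node)
		}
	}

	var order []string
	for len(ready) > 0 {
		sort.Strings(ready)
		node := ready[0]
		ready = ready[1:]
		order = append(order, node)
		for _, dep := range dependents[node] {
			inDegree[dep]--
			if inDegree[dep] == 0 {
				ready = append(ready, dep)
			}
		}
	}
	// Any node never reaching in-degree zero is part of a cycle.
	if len(order) != len(inDegree) {
		return nil, fmt.Errorf("dependency cycle detected")
	}
	return order, nil
}

func main() {
	// Edges taken from the references in environments/prod/main.tf.
	deps := map[string][]string{
		"module.vpc":         {},
		"module.ecs_cluster": {"module.vpc"},
		"module.rds":         {"module.vpc"},
	}
	order, err := applyOrder(deps)
	if err != nil {
		panic(err)
	}
	fmt.Println(order)
}
```

The cycle check matters in practice too: Terraform refuses to plan a configuration whose references form a loop.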
VPC Module Example
# modules/vpc/main.tf
resource "aws_vpc" "main" {
  cidr_block           = var.cidr_block
  enable_dns_hostnames = true
  enable_dns_support   = true

  tags = merge(var.tags, {
    Name = "${var.environment}-vpc"
  })
}

resource "aws_internet_gateway" "main" {
  vpc_id = aws_vpc.main.id

  tags = merge(var.tags, {
    Name = "${var.environment}-igw"
  })
}

resource "aws_subnet" "public" {
  count                   = length(var.availability_zones)
  vpc_id                  = aws_vpc.main.id
  cidr_block              = cidrsubnet(var.cidr_block, 8, count.index)
  availability_zone       = var.availability_zones[count.index]
  map_public_ip_on_launch = true

  tags = merge(var.tags, {
    Name = "${var.environment}-public-${count.index + 1}"
  })
}

resource "aws_subnet" "private" {
  count             = length(var.availability_zones)
  vpc_id            = aws_vpc.main.id
  cidr_block        = cidrsubnet(var.cidr_block, 8, count.index + length(var.availability_zones))
  availability_zone = var.availability_zones[count.index]

  tags = merge(var.tags, {
    Name = "${var.environment}-private-${count.index + 1}"
  })
}

# modules/vpc/outputs.tf
output "vpc_id" {
  value = aws_vpc.main.id
}

output "public_subnets" {
  value = aws_subnet.public[*].id
}

output "private_subnets" {
  value = aws_subnet.private[*].id
}
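The `cidrsubnet` calls carve equal-sized blocks out of the VPC range. Adding 8 bits to a /16 yields /24 subnets, and offsetting the private `netnum` by `length(var.availability_zones)` keeps the two tiers from overlapping. The function below is an IPv4-only Go re-implementation written purely for illustration. The name `cidrSubnet` is ours, not part of any library. It shows the layout the prod VPC (`10.1.0.0/16`, three AZs) would get.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"net/netip"
)

// cidrSubnet is an IPv4-only sketch of Terraform's cidrsubnet(prefix,
// newbits, netnum): it extends the prefix length by newbits and places
// netnum in the newly added bits.
func cidrSubnet(prefix string, newbits, netnum int) (netip.Prefix, error) {
	p, err := netip.ParsePrefix(prefix)
	if err != nil {
		return netip.Prefix{}, err
	}
	if !p.Addr().Is4() {
		return netip.Prefix{}, fmt.Errorf("sketch supports IPv4 only")
	}
	newLen := p.Bits() + newbits
	if newbits < 0 || newLen > 32 {
		return netip.Prefix{}, fmt.Errorf("cannot extend /%d by %d bits", p.Bits(), newbits)
	}
	if netnum < 0 || netnum >= 1<<newbits {
		return netip.Prefix{}, fmt.Errorf("netnum %d does not fit in %d bits", netnum, newbits)
	}
	// Zero the host bits, then OR netnum into the bits just below the old prefix.
	base := p.Masked().Addr().As4()
	n := binary.BigEndian.Uint32(base[:])
	n |= uint32(netnum) << (32 - newLen)
	var out [4]byte
	binary.BigEndian.PutUint32(out[:], n)
	return netip.PrefixFrom(netip.AddrFrom4(out), newLen), nil
}

func main() {
	azs := []string{"us-east-1a", "us-east-1b", "us-east-1c"}
	for i, az := range azs {
		public, _ := cidrSubnet("10.1.0.0/16", 8, i)
		private, _ := cidrSubnet("10.1.0.0/16", 8, i+len(azs))
		fmt.Printf("%s public=%s private=%s\n", az, public, private)
	}
}
```

Running it prints public subnets 10.1.0.0/24 through 10.1.2.0/24 and private subnets 10.1.3.0/24 through 10.1.5.0/24. With 8 new bits there is room for 256 subnets, so adding availability zones later will not exhaust the range.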
Ansible for Configuration Management
Playbook Structure
# site.yml - Main entry point
---
- import_playbook: base.yml
- import_playbook: webserver.yml
- import_playbook: database.yml
- import_playbook: monitoring.yml
Web Server Playbook
# webserver.yml
---
- name: Configure web servers
  hosts: webservers
  become: yes
  vars:
    nginx_version: stable
    app_user: webapp

  tasks:
    - name: Update apt cache
      apt:
        update_cache: yes
        cache_valid_time: 3600
      when: ansible_os_family == "Debian"

    - name: Install nginx
      apt:
        name: nginx
        state: present

    - name: Install Python for web framework
      apt:
        name:
          - python3
          - python3-pip
        state: present

    - name: Create app user
      user:
        name: "{{ app_user }}"
        system: yes
        shell: /bin/bash
        create_home: yes

    # The template task below fails if the destination directory is missing
    - name: Create application directory
      file:
        path: /opt/webapp
        state: directory
        owner: "{{ app_user }}"
        group: "{{ app_user }}"
        mode: '0750'

    - name: Copy nginx configuration
      template:
        src: nginx.conf.j2
        dest: /etc/nginx/nginx.conf
        mode: '0644'
      notify: Restart nginx

    - name: Copy application configuration
      template:
        src: app_config.yml.j2
        dest: /opt/webapp/config.yml
        owner: "{{ app_user }}"
        group: "{{ app_user }}"
        mode: '0600'
      notify: Reload application

    - name: Ensure nginx is running
      service:
        name: nginx
        state: started
        enabled: yes

    # Add the allow rules before enabling the default-deny firewall,
    # so the SSH connection Ansible is using stays permitted
    - name: Allow SSH
      ufw:
        rule: allow
        port: '22'
        proto: tcp

    - name: Allow HTTP/HTTPS
      ufw:
        rule: allow
        port: '{{ item }}'
        proto: tcp
      loop:
        - 80
        - 443

    - name: Enable firewall with default deny
      ufw:
        state: enabled
        policy: deny

  handlers:
    - name: Restart nginx
      service:
        name: nginx
        state: restarted

    - name: Reload application
      systemd:
        name: webapp
        state: reloaded
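Two behaviors of this playbook are worth spelling out. First, a handler fires only when a notifying task reports `changed`, so a second run against an already-configured host restarts nothing. Second, each notified handler runs once, after the tasks section, in the order the handlers are defined rather than the order in which they were notified. The Go model below is a simplified sketch of those two rules. It ignores `meta: flush_handlers`, `listen`, and failure handling, and the extra TLS task in it is hypothetical.

```go
package main

import "fmt"

// task is a simplified stand-in for an Ansible task result: whether it
// reported "changed" and which handlers it notifies.
type task struct {
	name    string
	changed bool
	notify  []string
}

// runPlay returns the handlers that would run after the tasks: only tasks
// reporting "changed" notify, and each notified handler runs once, in
// handler-definition order.
func runPlay(tasks []task, handlerOrder []string) []string {
	notified := map[string]bool{}
	for _, t := range tasks {
		if !t.changed {
			continue // unchanged tasks do not notify
		}
		for _, h := range t.notify {
			notified[h] = true
		}
	}
	var ran []string
	for _, h := range handlerOrder {
		if notified[h] {
			ran = append(ran, h)
		}
	}
	return ran
}

func main() {
	handlers := []string{"Restart nginx", "Reload application"}

	// First run: both templates change. A hypothetical extra task also
	// notifies "Restart nginx", but the handler still runs only once.
	firstRun := []task{
		{name: "Copy nginx configuration", changed: true, notify: []string{"Restart nginx"}},
		{name: "Copy application configuration", changed: true, notify: []string{"Reload application"}},
		{name: "Copy TLS certificate (hypothetical)", changed: true, notify: []string{"Restart nginx"}},
	}
	fmt.Println(runPlay(firstRun, handlers))

	// Second run: the templates already match, so nothing is notified.
	secondRun := []task{
		{name: "Copy nginx configuration", changed: false, notify: []string{"Restart nginx"}},
		{name: "Copy application configuration", changed: false, notify: []string{"Reload application"}},
	}
	fmt.Println(runPlay(secondRun, handlers))
}
```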
Nginx Configuration Template
# templates/nginx.conf.j2
user {{ nginx_user }};
worker_processes {{ nginx_worker_processes }};
error_log {{ nginx_error_log }};
pid {{ nginx_pid_file }};

events {
    worker_connections {{ nginx_worker_connections }};
}

http {
    include /etc/nginx/mime.types;
    default_type application/octet-stream;

    log_format main '$remote_addr - $remote_user [$time_local] "$request" '
                    '$status $body_bytes_sent "$http_referer" '
                    '"$http_user_agent" "$http_x_forwarded_for"';

    access_log {{ nginx_access_log }};

    sendfile on;
    tcp_nopush on;
    tcp_nodelay on;
    keepalive_timeout {{ nginx_keepalive_timeout }};
    types_hash_max_size 2048;

    gzip on;
    gzip_vary on;
    gzip_proxied any;
    gzip_comp_level 6;
    gzip_types text/plain text/css text/xml text/javascript application/json application/javascript application/xml+rss;

    server {
        listen {{ nginx_listen_port }};
        server_name {{ nginx_server_name }};

        location / {
            proxy_pass http://127.0.0.1:{{ app_port }};
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_set_header X-Forwarded-Proto $scheme;
        }

        location /health {
            access_log off;
            # default_type sets the response type; add_header would send a second Content-Type header
            default_type text/plain;
            return 200 "healthy\n";
        }
    }
}
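None of the `nginx_*` variables or `app_port` are defined in the playbook's `vars`, so they must come from somewhere else, such as group_vars, host_vars, or role defaults. By default Ansible fails a task when a template references an undefined variable. The Go sketch below models only plain `{{ name }}` substitution to show that behavior, while leaving nginx's own `$variables` untouched. It is not Jinja2, and the variable values are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// placeholder matches simple {{ name }} expressions. Filters, tests and
// control blocks ({% ... %}) are out of scope for this sketch.
var placeholder = regexp.MustCompile(`\{\{\s*([A-Za-z_][A-Za-z0-9_]*)\s*\}\}`)

// render substitutes every {{ name }} with vars[name] and reports all
// undefined names instead of silently emitting an empty value.
func render(tmpl string, vars map[string]string) (string, error) {
	var missing []string
	out := placeholder.ReplaceAllStringFunc(tmpl, func(m string) string {
		name := placeholder.FindStringSubmatch(m)[1]
		v, ok := vars[name]
		if !ok {
			missing = append(missing, name)
			return m
		}
		return v
	})
	if len(missing) > 0 {
		return "", fmt.Errorf("undefined variables: %s", strings.Join(missing, ", "))
	}
	return out, nil
}

func main() {
	tmpl := "listen {{ nginx_listen_port }};\nproxy_pass http://127.0.0.1:{{ app_port }};\nproxy_set_header Host $host;"
	vars := map[string]string{"nginx_listen_port": "80", "app_port": "8000"} // hypothetical values
	out, err := render(tmpl, vars)
	if err != nil {
		panic(err)
	}
	fmt.Println(out)

	// Removing a variable makes rendering fail, as an Ansible template task would.
	delete(vars, "app_port")
	if _, err := render(tmpl, vars); err != nil {
		fmt.Println("error:", err)
	}
}
```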
Roles for Reusability
roles/
├── common/
│   ├── tasks/
│   │   └── main.yml
│   ├── handlers/
│   │   └── main.yml
│   └── templates/
│       └── ntp.conf.j2
├── nginx/
│   ├── tasks/
│   │   └── main.yml
│   ├── handlers/
│   │   └── main.yml
│   ├── templates/
│   │   └── nginx.conf.j2
│   └── defaults/
│       └── main.yml
└── postgres/
    ├── tasks/
    │   └── main.yml
    ├── handlers/
    │   └── main.yml
    └── defaults/
        └── main.yml
# Using roles
- name: Configure database server
  hosts: database
  become: yes
  roles:
    - role: common
    - role: postgres
      vars:
        postgres_version: 15
        postgres_max_connections: 200
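Values in a role's `defaults/main.yml` have the lowest precedence in Ansible, so anything passed with the role, like `postgres_max_connections: 200` above, overrides them. The sketch below models that as a shallow, last-writer-wins merge, which matches Ansible's default `hash_behaviour` of `replace`. The default values are hypothetical because the role's files are not shown, and `resolve` is an illustrative helper rather than an Ansible API.

```go
package main

import "fmt"

// resolve merges variable layers from lowest to highest precedence; a key
// in a later layer replaces the same key from an earlier one (no deep merge).
func resolve(layers ...map[string]any) map[string]any {
	merged := map[string]any{}
	for _, layer := range layers {
		for k, v := range layer {
			merged[k] = v
		}
	}
	return merged
}

func main() {
	// Hypothetical contents of roles/postgres/defaults/main.yml.
	roleDefaults := map[string]any{
		"postgres_version":         14,
		"postgres_max_connections": 100,
		"postgres_port":            5432,
	}
	// Values passed with the role in the play above.
	roleVars := map[string]any{
		"postgres_version":         15,
		"postgres_max_connections": 200,
	}
	vars := resolve(roleDefaults, roleVars)
	fmt.Println(vars["postgres_version"], vars["postgres_max_connections"], vars["postgres_port"])
}
```

This is why defaults/main.yml is the right place for sensible fallbacks: every other variable source can override them without editing the role.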
State Management
Remote State with Locking
# backend.tf
terraform {
  backend "s3" {
    bucket         = "company-terraform-state"
    key            = "environments/prod/terraform.tfstate"
    region         = "us-east-1"
    encrypt        = true
    dynamodb_table = "terraform-state-locks"
  }
}
State Isolation
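# Separate state files for each environment (partial backend configuration).
# Run each init in its own working directory, or add -reconfigure when
# switching keys in the same directory.
terraform init -backend-config="key=dev/terraform.tfstate"
terraform init -backend-config="key=prod/terraform.tfstate"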
Importing Existing Resources
# Import an existing AWS resource into Terraform.
# A matching resource "aws_instance" "web" block must already exist in the configuration.
terraform import aws_instance.web i-1234567890abcdef0
Testing Infrastructure
Terraform Validation
# Format check
terraform fmt -check
# Validate syntax
terraform validate
# Lint, including unused variable declarations (TFLint's terraform_unused_declarations rule)
tflint
# Plan for review
terraform plan -out=tfplan
terraform show tfplan
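A saved plan can also be inspected by CI. `terraform show -json tfplan` prints a JSON document whose `resource_changes` entries each carry an `actions` list, and a replacement appears there as a delete plus a create. The Go sketch below counts actions so a pipeline could, for example, flag replacements before anyone approves the plan. The sample JSON is hand-written and abbreviated, the resource addresses in it are hypothetical, and `summarize` is an illustrative helper.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// planJSON models only the fields this sketch needs from the output of
// `terraform show -json tfplan`.
type planJSON struct {
	ResourceChanges []struct {
		Address string `json:"address"`
		Change  struct {
			Actions []string `json:"actions"`
		} `json:"change"`
	} `json:"resource_changes"`
}

// summarize counts planned actions per kind; ["delete","create"] or
// ["create","delete"] is counted as "replace".
func summarize(raw []byte) (map[string]int, error) {
	var p planJSON
	if err := json.Unmarshal(raw, &p); err != nil {
		return nil, err
	}
	counts := map[string]int{}
	for _, rc := range p.ResourceChanges {
		actions := strings.Join(rc.Change.Actions, ",")
		switch actions {
		case "delete,create", "create,delete":
			counts["replace"]++
		default:
			counts[actions]++
		}
	}
	return counts, nil
}

func main() {
	// Hand-written, abbreviated sample in the shape of `terraform show -json` output.
	sample := []byte(`{"resource_changes": [
		{"address": "aws_instance.web", "change": {"actions": ["update"]}},
		{"address": "module.vpc.aws_vpc.main", "change": {"actions": ["no-op"]}},
		{"address": "module.rds.aws_db_instance.this", "change": {"actions": ["delete", "create"]}}
	]}`)
	counts, err := summarize(sample)
	if err != nil {
		panic(err)
	}
	fmt.Println(counts)
}
```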
Terratest Integration
// infrastructure_test.go
package test

import (
	"testing"

	"github.com/gruntwork-io/terratest/modules/terraform"
	"github.com/stretchr/testify/assert"
)

func TestTerraformWebServer(t *testing.T) {
	terraformOptions := &terraform.Options{
		TerraformDir: "../examples/webserver",
		Vars: map[string]interface{}{
			"environment": "test",
		},
	}

	defer terraform.Destroy(t, terraformOptions)

	terraform.InitAndApply(t, terraformOptions)

	instanceID := terraform.Output(t, terraformOptions, "instance_id")
	assert.NotEmpty(t, instanceID)
}
Checkov for Security Scanning
# Scan Terraform files
checkov -d ./terraform --framework terraform
# Scan a specific file
checkov -f main.tf
# Skip specific checks by ID (Checkov IDs look like CKV_AWS_20)
checkov -d ./terraform --skip-check CKV_AWS_20
CI/CD Integration
GitHub Actions Workflow
name: Terraform CI

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

# pull-requests: write lets the workflow comment on the PR
permissions:
  contents: read
  pull-requests: write

jobs:
  terraform:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - uses: hashicorp/setup-terraform@v3
        with:
          terraform_version: 1.6.0
          # Without the wrapper, redirected output contains only Terraform's own output
          terraform_wrapper: false

      - name: Terraform Format Check
        run: terraform fmt -check -recursive

      - name: Terraform Init
        run: terraform init
        env:
          AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
          AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}

      - name: Terraform Validate
        run: terraform validate

      - name: Terraform Plan
        run: |
          terraform plan -out=tfplan
          # tfplan is a binary file; render a readable copy for the PR comment
          terraform show -no-color tfplan > plan.txt
        env:
          AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
          AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}

      - name: Post Plan Comment
        # Push events have no pull request number to comment on
        if: github.event_name == 'pull_request'
        uses: actions/github-script@v7
        with:
          script: |
            const fs = require('fs');
            const plan = fs.readFileSync('plan.txt', 'utf8');
            await github.rest.issues.createComment({
              issue_number: context.issue.number,
              owner: context.repo.owner,
              repo: context.repo.repo,
              body: '```terraform\n' + plan + '\n```'
            });
Best Practices
1. Use Remote State
# Always use remote state for team collaboration
terraform {
  backend "s3" {
    # ... configuration
  }
}
2. Enable State Locking
# Inside the backend "s3" block: prevents concurrent modifications
dynamodb_table = "terraform-locks"
3. Use Modules for Reusability
# Instead of repeating code
module "vpc" {
  source = "./modules/vpc"
  # ... configuration
}
4. Never Store Secrets in Code
# Use environment variables or secret management
export TF_VAR_db_password=$(aws secretsmanager get-secret-value --secret-id prod/db-password --query SecretString --output text)
5. Implement Policy as Code
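# sentinel/policies/restrict_instance_type.sentinel
import "tfplan/v2" as tfplan

allowed_types = ["t3.micro", "t3.small"]

# Managed EC2 instances that are being created or updated
# (deleted resources have no "after" state to check)
ec2_instances = filter tfplan.resource_changes as _, rc {
  rc.mode is "managed" and
  rc.type is "aws_instance" and
  (rc.change.actions contains "create" or rc.change.actions contains "update")
}

main = rule {
  all ec2_instances as _, instance {
    instance.change.after.instance_type in allowed_types
  }
}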
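Sentinel policies are enforced by HCP Terraform (formerly Terraform Cloud) and Terraform Enterprise. Without those products, a comparable check can run in CI against `terraform show -json` output. The Go sketch below applies the same rule as the policy above: managed `aws_instance` resources that are being created or updated must use an allowed type, while data sources and deletions are skipped. The sample plan and the function names are illustrative, not part of any tool.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// resourceChange holds the subset of a `terraform show -json` entry this
// check inspects.
type resourceChange struct {
	Address string `json:"address"`
	Mode    string `json:"mode"`
	Type    string `json:"type"`
	Change  struct {
		Actions []string       `json:"actions"`
		After   map[string]any `json:"after"`
	} `json:"change"`
}

func createsOrUpdates(actions []string) bool {
	for _, a := range actions {
		if a == "create" || a == "update" {
			return true
		}
	}
	return false
}

// violations returns one message per managed aws_instance being created or
// updated with an instance type outside the allowed list.
func violations(raw []byte, allowed []string) ([]string, error) {
	var plan struct {
		ResourceChanges []resourceChange `json:"resource_changes"`
	}
	if err := json.Unmarshal(raw, &plan); err != nil {
		return nil, err
	}
	isAllowed := map[string]bool{}
	for _, t := range allowed {
		isAllowed[t] = true
	}
	var bad []string
	for _, rc := range plan.ResourceChanges {
		if rc.Mode != "managed" || rc.Type != "aws_instance" || !createsOrUpdates(rc.Change.Actions) {
			continue
		}
		it, _ := rc.Change.After["instance_type"].(string)
		if !isAllowed[it] {
			bad = append(bad, fmt.Sprintf("%s uses %q", rc.Address, it))
		}
	}
	return bad, nil
}

func main() {
	sample := []byte(`{"resource_changes": [
		{"address": "aws_instance.web", "mode": "managed", "type": "aws_instance",
		 "change": {"actions": ["create"], "after": {"instance_type": "m5.large"}}},
		{"address": "aws_instance.old", "mode": "managed", "type": "aws_instance",
		 "change": {"actions": ["delete"], "after": null}}
	]}`)
	bad, err := violations(sample, []string{"t3.micro", "t3.small"})
	if err != nil {
		panic(err)
	}
	fmt.Println(bad)
}
```

A CI job would fail the build whenever the returned list is non-empty, giving much the same guardrail as the Sentinel rule.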
Conclusion
Infrastructure as Code transforms how teams provision and manage infrastructure. By treating infrastructure with the same care as application code (version control, testing, code review, and automation), you achieve consistency, auditability, and speed that manual processes cannot match.
Key takeaways:
- Start with Terraform for provisioning, Ansible for configuration
- Use modules and roles to create reusable components
- Always use remote state with locking for team workflows
- Integrate testing and security scanning into CI/CD
- Never store secrets in version control