DNS and Certificate Automation: Managing Domain and TLS at Scale

DNS and TLS certificates are foundational infrastructure. Manual management doesn’t scale and creates security gaps. This guide covers DNS and certificate automation using modern tools and infrastructure-as-code patterns.

DNS Architecture

DNS Record Types

Type   Purpose                              Example
A      IPv4 address                         example.com → 1.2.3.4
AAAA   IPv6 address                         example.com → 2001:db8::1
CNAME  Canonical name (alias)               www → @
MX     Mail exchange                        @ → mail.example.com
TXT    Text records                         @ → "v=spf1 include:_spf.example.com ~all"
NS     Name servers                         @ → ns1.example.com
SOA    Start of Authority                   Administrative info
CAA    Certificate Authority Authorization  example.com → letsencrypt.org

Route53 DNS Configuration

# Terraform - Route53 hosted zone and records
resource "aws_route53_zone" "main" {
  name = "example.com"
  
  tags = {
    Environment = "production"
    ManagedBy   = "terraform"
  }
}

resource "aws_route53_record" "api" {
  zone_id = aws_route53_zone.main.zone_id
  name    = "api.example.com"
  type    = "A"
  
  alias {
    name                   = "alb-example-123456789.us-east-1.elb.amazonaws.com"
    zone_id                = "Z35SXDOWRQ4VI"
    evaluate_target_health = true
  }
}

resource "aws_route53_record" "www" {
  zone_id = aws_route53_zone.main.zone_id
  name    = "www.example.com"
  type    = "CNAME"
  ttl     = 300
  records = ["example.com"]
}

resource "aws_route53_record" "mx" {
  zone_id = aws_route53_zone.main.zone_id
  name    = "example.com"
  type    = "MX"
  ttl     = 3600
  records = [
    "10 mail1.example.com",
    "20 mail2.example.com"
  ]
}

resource "aws_route53_record" "spf" {
  zone_id = aws_route53_zone.main.zone_id
  name    = "example.com"
  type    = "TXT"
  ttl     = 3600
  records = ["v=spf1 include:_spf.example.com ~all"]
}

Cloudflare DNS

# Cloudflare Terraform provider
provider "cloudflare" {
  api_token = var.cloudflare_api_token
}

resource "cloudflare_record" "api" {
  zone_id = cloudflare_zone.example.id
  name    = "api"
  value   = "1.2.3.4"
  type    = "A"
  proxied = true  # Cloudflare proxy
}

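resource "cloudflare_record" "cdn" {
  zone_id = cloudflare_zone.example.id
  name    = "cdn"
  # A CNAME must not point at its own name; "origin.example.net" is a
  # placeholder for your real origin hostname.
  value   = "origin.example.net"
  type    = "CNAME"
  proxied = true
}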

Certificate Management with cert-manager

Installation

# Install cert-manager
kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.14.0/cert-manager.yaml

# Verify installation
kubectl get pods -n cert-manager

Let’s Encrypt Issuer

apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: [email protected]

    # Secret that stores the ACME account private key (required)
    privateKeySecretRef:
      name: letsencrypt-prod-account-key

    # Use Route53 for the DNS-01 challenge
    solvers:
      - dns01:
          route53:
            region: us-east-1
            hostedZoneID: Z1234567890ABC

      # Or use Cloudflare
      # - dns01:
      #     cloudflare:
      #       apiTokenSecretRef:
      #         name: cloudflare-api-token
      #         key: api-token

Certificate Resource

apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: example-com
  namespace: production
spec:
  secretName: example-com-tls
  issuerRef:
    name: letsencrypt-prod
    kind: ClusterIssuer
    group: cert-manager.io
    
  dnsNames:
    - example.com
    - www.example.com
    - api.example.com
    
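  duration: 2160h  # 90 days (a request; the CA decides the actual lifetime)
  renewBefore: 360h  # 15 days before expiry

  # Labels and annotations to copy onto the generated Secret.
  # The field is secretTemplate (singular); the label below is illustrative.
  secretTemplate:
    labels:
      app.kubernetes.io/part-of: example-com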
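
To make the duration and renewBefore arithmetic concrete, here is a minimal Python sketch of when renewal becomes due. The helper names (parse_hours, renewal_time) are hypothetical and only handle hours-only values such as the ones in the manifest; cert-manager itself works from the notAfter date of the certificate the CA actually issued.

```python
from datetime import datetime, timedelta, timezone


def parse_hours(value: str) -> timedelta:
    """Parse an hours-only duration such as '2160h' (the form used in the manifest)."""
    if not value.endswith("h"):
        raise ValueError(f"expected an hours-only duration, got {value!r}")
    return timedelta(hours=int(value[:-1]))


def renewal_time(not_before: datetime, duration: str, renew_before: str) -> datetime:
    """Return the moment renewal becomes due: notAfter minus renewBefore."""
    not_after = not_before + parse_hours(duration)
    return not_after - parse_hours(renew_before)


# Assumed issuance date, for illustration only.
issued = datetime(2024, 1, 1, tzinfo=timezone.utc)
due = renewal_time(issued, "2160h", "360h")
print(due.isoformat())      # 2024-03-16T00:00:00+00:00
print((due - issued).days)  # 75
```

With a 90-day certificate and a 15-day renewBefore, renewal starts on day 75, which leaves 15 days to notice and fix a failed renewal.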

Ingress with TLS

apiVersion: networking.k8s.io/v1
kind: Ingress
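metadata:
  name: production-ingress
  namespace: production  # same namespace as the example-com-tls Secret
  annotations:
    # With this annotation cert-manager creates a Certificate for the tls
    # block below; omit it if you manage the Certificate resource yourself.
    cert-manager.io/cluster-issuer: letsencrypt-prod
    nginx.ingress.kubernetes.io/ssl-redirect: "true"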
spec:
  ingressClassName: nginx
  tls:
    - hosts:
        - example.com
        - www.example.com
      secretName: example-com-tls
  rules:
    - host: example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: web-service
                port:
                  number: 80

DNS-01 Challenge

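The DNS-01 challenge proves control of a domain: the ACME client publishes a TXT record at _acme-challenge.<domain> whose value is derived from the challenge token and the ACME account key, and the CA looks that record up before issuing. Because validation happens in DNS rather than over HTTP, it also works for wildcard names and for hosts that are not reachable from the internet.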
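
As a rough illustration of what goes into that TXT record, the sketch below follows the DNS-01 rules in RFC 8555: the value is the unpadded base64url encoding of the SHA-256 digest of the key authorization (token, a dot, and the account key thumbprint). The function name dns01_record and the sample token and thumbprint are made up for the example; a real ACME client computes all of this for you.

```python
import base64
import hashlib


def b64url(data: bytes) -> str:
    """Base64url-encode without padding, as ACME requires."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def dns01_record(domain: str, token: str, account_thumbprint: str) -> tuple[str, str]:
    """Return (record name, TXT value) for a DNS-01 challenge."""
    key_authorization = f"{token}.{account_thumbprint}"
    digest = hashlib.sha256(key_authorization.encode("ascii")).digest()
    # A wildcard name is validated at the base domain.
    name = "_acme-challenge." + domain.removeprefix("*.")
    return name, b64url(digest)


# Sample inputs are placeholders, not real challenge data.
name, value = dns01_record("*.example.com", "sample-token", "sample-thumbprint")
print(name)        # _acme-challenge.example.com
print(len(value))  # 43
```

A SHA-256 digest is 32 bytes, so the unpadded base64url value is always 43 characters long.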

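# Manual DNS-01 challenge with Cloudflare
# Uses the legacy python-cloudflare 2.x client (pip install "cloudflare<3"),
# whose call style matches the requests below.
import time

import CloudFlare


def _zone_id(client, zone_name):
    """Look up the Cloudflare zone ID for a zone name."""
    zones = client.zones.get(params={'name': zone_name})
    if not zones:
        raise LookupError(f"zone {zone_name!r} not found")
    return zones[0]['id']


def create_dns_challenge(domain, api_token, validation):
    """Create the DNS TXT record for a Let's Encrypt challenge.

    domain is the zone name; validation is the TXT value supplied by the
    ACME client (not the API token). Returns the record ID for cleanup.
    """
    client = CloudFlare.CloudFlare(token=api_token)
    zone_id = _zone_id(client, domain)

    # Create TXT record
    record = client.zones.dns_records.post(
        zone_id,
        data={
            'type': 'TXT',
            'name': f'_acme-challenge.{domain}',
            'content': validation,
            'ttl': 60
        }
    )

    # Crude wait for propagation; polling the record is more reliable
    time.sleep(30)

    return record['id']


def cleanup_dns_challenge(domain, api_token, record_id):
    """Clean up DNS TXT record"""
    client = CloudFlare.CloudFlare(token=api_token)
    client.zones.dns_records.delete(_zone_id(client, domain), record_id)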
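
A fixed sleep, as in the snippet above, either waits too long or not long enough. One alternative is to poll until the expected TXT value is visible. The sketch below is a hypothetical helper (wait_for_txt is not part of any library) that takes the lookup function as a parameter, so it can wrap whatever resolver you use and can be exercised without network access.

```python
import time
from typing import Callable, Iterable


def wait_for_txt(
    lookup: Callable[[str], Iterable[str]],
    name: str,
    expected: str,
    timeout: float = 120.0,
    interval: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll lookup(name) until expected appears or timeout seconds pass."""
    deadline = clock() + timeout
    while True:
        if expected in set(lookup(name)):
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


# Fake resolver for demonstration: the record shows up on the third query.
answers = iter([[], [], ["abc123"]])
ok = wait_for_txt(
    lambda name: next(answers),
    "_acme-challenge.example.com",
    "abc123",
    sleep=lambda seconds: None,  # skip real waiting in the demo
)
print(ok)  # True
```

In real use, lookup would query the zone's authoritative name servers, since a recursive resolver may still serve a cached negative answer.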

Multiple DNS Providers

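apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: multi-dns-issuer
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    privateKeySecretRef:
      name: multi-dns-issuer-account-key
    # Solvers are chosen by selector match, not tried in order:
    # an exact dnsNames match wins over a dnsZones match.
    solvers:
      # Route53 for names in the example.com zone
      - dns01:
          route53:
            region: us-east-1
            hostedZoneID: Z1234567890ABC
        selector:
          dnsZones:
            - "example.com"

      # Cloudflare for the wildcard name only
      - dns01:
          cloudflare:
            apiTokenSecretRef:
              name: cloudflare-api-token
              key: api-token
        selector:
          dnsNames:
            - "*.example.com"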
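
cert-manager picks one solver per DNS name based on its selector rather than falling back from one solver to the next. The following Python sketch models that precedence as the cert-manager documentation describes it (exact dnsNames first, then the most specific dnsZones entry, then a solver with no selector). It is a simplified model with hypothetical names (Solver, pick_solver), it ignores matchLabels, and stripping the "*." prefix before zone matching is an assumption of the sketch.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Solver:
    name: str
    dns_names: list[str] = field(default_factory=list)
    dns_zones: list[str] = field(default_factory=list)


def zone_match_length(domain: str, zones: list[str]) -> int:
    """Length of the most specific zone covering domain, or 0 if none does."""
    bare = domain.removeprefix("*.")
    best = 0
    for zone in zones:
        if bare == zone or bare.endswith("." + zone):
            best = max(best, len(zone))
    return best


def pick_solver(domain: str, solvers: list[Solver]) -> Optional[Solver]:
    # 1. An exact dnsNames entry takes precedence.
    for solver in solvers:
        if domain in solver.dns_names:
            return solver
    # 2. Otherwise the most specific matching dnsZones entry wins.
    best, best_len = None, 0
    for solver in solvers:
        length = zone_match_length(domain, solver.dns_zones)
        if length > best_len:
            best, best_len = solver, length
    if best is not None:
        return best
    # 3. A solver without any selector acts as a catch-all.
    for solver in solvers:
        if not solver.dns_names and not solver.dns_zones:
            return solver
    return None


solvers = [
    Solver("route53", dns_zones=["example.com"]),
    Solver("cloudflare", dns_names=["*.example.com"]),
]
print(pick_solver("api.example.com", solvers).name)  # route53
print(pick_solver("*.example.com", solvers).name)    # cloudflare
print(pick_solver("example.org", solvers))           # None
```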

ACM (AWS Certificate Manager)

Request Certificate

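# Terraform - ACM certificate
resource "aws_acm_certificate" "main" {
  domain_name               = "example.com"
  subject_alternative_names = ["*.example.com"]

  validation_method = "DNS"

  lifecycle {
    create_before_destroy = true
  }
}

# Auto-validate with Route53
resource "aws_route53_record" "cert_validation" {
  for_each = {
    for val in aws_acm_certificate.main.domain_validation_options :
    val.domain_name => val
  }

  # example.com and *.example.com share one validation record
  allow_overwrite = true

  zone_id = aws_route53_zone.main.zone_id
  name    = each.value.resource_record_name
  type    = each.value.resource_record_type
  ttl     = 60
  records = [each.value.resource_record_value]
}

# Wait until ACM has validated and issued the certificate
resource "aws_acm_certificate_validation" "main" {
  certificate_arn         = aws_acm_certificate.main.arn
  validation_record_fqdns = [for record in aws_route53_record.cert_validation : record.fqdn]
}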

ALB with HTTPS

# Terraform - ALB with HTTPS
resource "aws_lb" "main" {
  name               = "main-alb"
  internal           = false
  load_balancer_type = "application"
  security_groups    = [aws_security_group.alb.id]
  subnets           = aws_subnet.public[*].id
}

resource "aws_lb_listener" "https" {
  load_balancer_arn = aws_lb.main.arn
  port             = 443
  protocol         = "HTTPS"
  ssl_policy       = "ELBSecurityPolicy-TLS13-1-2-2021-06"  # TLS 1.2 and 1.3 only
  certificate_arn  = aws_acm_certificate.main.arn
  
  default_action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.main.arn
  }
}

Automated Renewal Scripts

Renew Certificates Script

#!/usr/bin/env python3
"""Certificate renewal monitoring and automation."""

import boto3
import datetime
import logging
from dataclasses import dataclass

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

@dataclass
class Certificate:
    arn: str
    domain: str
    expires: datetime.datetime

def get_expiring_certificates(days=30):
    """Find certificates expiring within specified days"""
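    client = boto3.client('acm')

    # list_certificates is paginated; walk every page
    paginator = client.get_paginator('list_certificates')
    certs = [
        cert
        for page in paginator.paginate()
        for cert in page['CertificateSummaryList']
    ]
    expiring = []

    for cert in certs:
        detail = client.describe_certificate(CertificateArn=cert['CertificateArn'])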
        
        if detail['Certificate']['Status'] != 'ISSUED':
            continue
            
        not_after = detail['Certificate']['NotAfter']
        days_until_expiry = (not_after - datetime.datetime.now(not_after.tzinfo)).days
        
        if days_until_expiry <= days:
            expiring.append(Certificate(
                arn=cert['CertificateArn'],
                domain=cert['DomainName'],
                expires=not_after
            ))
    
    return expiring

def send_alert(certs):
    """Send alert about expiring certificates"""
    if not certs:
        return
        
    message = "Expiring certificates:\n"
    for cert in certs:
        message += f"- {cert.domain} expires {cert.expires}\n"
    
    # Send via SNS
    sns = boto3.client('sns')
    sns.publish(
        TopicArn='arn:aws:sns:us-east-1:123456789:alerts',
        Subject='Certificate Expiry Alert',
        Message=message
    )

def main():
    expiring = get_expiring_certificates(days=30)
    
    if expiring:
        logger.warning(f"Found {len(expiring)} expiring certificates")
        send_alert(expiring)
    else:
        logger.info("No certificates expiring soon")

if __name__ == "__main__":
    main()

cert-manager Renewal Monitor

# Prometheus alerts for cert-manager
groups:
  - name: certificate-expiry
    rules:
      - alert: CertManagerCertificateExpiry
        expr: |
          certmanager_certificate_expiration_timestamp_seconds - time() < 604800
        for: 1h
        labels:
          severity: warning
        annotations:
          summary: "Certificate expiring in less than 7 days"

      - alert: CertManagerCertificateExpired
        expr: |
          certmanager_certificate_expiration_timestamp_seconds - time() < 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Certificate has expired"
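
The two alert thresholds are plain arithmetic on the expiry timestamp: 604800 seconds is 7 × 24 × 3600. As a hedged illustration outside Prometheus, the sketch below applies the same thresholds to a certificate's notAfter string using the standard library's ssl.cert_time_to_seconds. The function name expiry_status is hypothetical, and the sample date is the one used in the Python documentation.

```python
import ssl

WARNING_WINDOW = 7 * 24 * 3600  # 604800 seconds, the warning threshold above


def expiry_status(not_after: str, now: float) -> str:
    """Classify a certificate by its notAfter string (e.g. from getpeercert())."""
    remaining = ssl.cert_time_to_seconds(not_after) - now
    if remaining < 0:
        return "expired"
    if remaining < WARNING_WINDOW:
        return "warning"
    return "ok"


not_after = "Jan  5 09:34:43 2018 GMT"
expires_at = ssl.cert_time_to_seconds(not_after)
print(expiry_status(not_after, expires_at - 10 * 86400))  # ok
print(expiry_status(not_after, expires_at - 3600))        # warning
print(expiry_status(not_after, expires_at + 1))           # expired
```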

DNSSEC

Enable DNSSEC on Route53

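# Terraform - DNSSEC configuration
resource "aws_route53_zone" "main" {
  name = "example.com"
}

# aws_kms_key.dnssec (not shown) must be an asymmetric ECC_NIST_P256
# SIGN_VERIFY key in us-east-1. Key names allow letters, digits, and underscores.
resource "aws_route53_key_signing_key" "main" {
  name                       = "example_com_key"
  hosted_zone_id             = aws_route53_zone.main.zone_id
  key_management_service_arn = aws_kms_key.dnssec.arn
}

resource "aws_route53_hosted_zone_dnssec" "main" {
  hosted_zone_id = aws_route53_key_signing_key.main.hosted_zone_id
}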

Cloudflare DNSSEC

# Enable DNSSEC via Cloudflare API
# Uses the legacy python-cloudflare 2.x client (pip install "cloudflare<3")
import CloudFlare

def enable_dnssec(zone_id):
    # Credentials come from the environment or the client's config file
    client = CloudFlare.CloudFlare()

    # Turn on signing; the response includes the DS record data
    dnssec = client.zones.dnssec.patch(zone_id, data={
        'status': 'active'
    })

    # Create DS record in parent zone
    # (This must be done at your registrar)
    print(f"DS Record: {dnssec['ds']}")

Traffic Routing Patterns

Weighted Routing

# Terraform - Weighted routing
resource "aws_route53_record" "api-v1" {
  zone_id = aws_route53_zone.main.zone_id
  name    = "api.example.com"
  type    = "A"

  set_identifier  = "v1"
  health_check_id = aws_route53_health_check.v1.id

  weighted_routing_policy {
    weight = 90
  }

  # Alias records take no ttl or records arguments
  alias {
    name                   = "v1-alb.example.com"
    zone_id                = "ZONE_ID"
    evaluate_target_health = true
  }
}

resource "aws_route53_record" "api-v2" {
  zone_id = aws_route53_zone.main.zone_id
  name    = "api.example.com"
  type    = "A"

  set_identifier  = "v2"
  health_check_id = aws_route53_health_check.v2.id

  weighted_routing_policy {
    weight = 10
  }

  alias {
    name                   = "v2-alb.example.com"
    zone_id                = "ZONE_ID"
    evaluate_target_health = true
  }
}
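
Route53 answers each query with a given weighted record in proportion to its weight divided by the sum of all weights for that name and type. A small Python sketch of that arithmetic, assuming all records are healthy (the function name traffic_shares is hypothetical):

```python
def traffic_shares(weights: dict[str, int]) -> dict[str, float]:
    """Expected share of DNS answers per record, assuming all are healthy."""
    if not weights:
        raise ValueError("at least one weighted record is required")
    total = sum(weights.values())
    if total == 0:
        # When every weight is 0, Route53 treats the records as equally likely.
        return {name: 1 / len(weights) for name in weights}
    return {name: weight / total for name, weight in weights.items()}


shares = traffic_shares({"v1": 90, "v2": 10})
print(shares)  # {'v1': 0.9, 'v2': 0.1}
```

These are shares of DNS answers, not of requests: resolver caching means the observed traffic split only approximates the weights.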

Latency-Based Routing

resource "aws_route53_record" "us-east" {
  zone_id = aws_route53_zone.main.zone_id
  name    = "api.example.com"
  type    = "A"

  set_identifier = "us-east"

  latency_routing_policy {
    region = "us-east-1"
  }

  alias {
    name                   = "alb-us-east.example.com"
    zone_id                = "ZONE_ID"
    evaluate_target_health = true
  }
}

resource "aws_route53_record" "eu-west" {
  zone_id = aws_route53_zone.main.zone_id
  name    = "api.example.com"
  type    = "A"

  set_identifier = "eu-west"

  latency_routing_policy {
    region = "eu-west-1"
  }

  alias {
    name                   = "alb-eu-west.example.com"
    zone_id                = "ZONE_ID"
    evaluate_target_health = true
  }
}

Best Practices

DNS Best Practices

  • Use multiple nameservers for redundancy
  • Enable DNSSEC for security
  • Set appropriate TTLs (shorter for dynamic records)
  • Use ALIAS records instead of CNAMEs at the apex (a CNAME cannot coexist with the SOA and NS records there)
  • Monitor DNS resolution latency
  • Implement rate limiting protection
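
The apex rule in the list above lends itself to an automated check before records are applied. The sketch below is a hypothetical lint function (lint_zone is not from any library) that flags a CNAME at the zone apex and a CNAME that shares a name with other record types, both of which the DNS standards disallow.

```python
from collections import defaultdict


def lint_zone(zone: str, records: list[tuple[str, str]]) -> list[str]:
    """Return problems found in (name, type) pairs for the given zone."""
    types_by_name = defaultdict(set)
    for name, rtype in records:
        types_by_name[name.rstrip(".").lower()].add(rtype.upper())

    problems = []
    for name, types in sorted(types_by_name.items()):
        if "CNAME" not in types:
            continue
        if name == zone:
            # The apex always carries SOA and NS records.
            problems.append(f"{name}: CNAME at the zone apex")
        elif len(types) > 1:
            others = ", ".join(sorted(types - {"CNAME"}))
            problems.append(f"{name}: CNAME coexists with {others}")
    return problems


records = [
    ("example.com", "A"),
    ("example.com", "CNAME"),
    ("www.example.com", "CNAME"),
    ("api.example.com", "CNAME"),
    ("api.example.com", "TXT"),
]
for problem in lint_zone("example.com", records):
    print(problem)
# api.example.com: CNAME coexists with TXT
# example.com: CNAME at the zone apex
```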

Certificate Best Practices

  • Use short certificate validity (90 days for Let’s Encrypt)
  • Automate renewal (at least 14 days before expiry)
  • Use DNS-01 challenge for wildcard certificates
  • Monitor certificate expiration proactively
  • Store certificates in secrets, not configmaps
  • Use dedicated certificates per service

Security

# CAA record - restrict certificate authorities
resource "aws_route53_record" "caa" {
  zone_id = aws_route53_zone.main.zone_id
  name    = "example.com"
  type    = "CAA"
  ttl     = 3600
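  records = [
    "0 issue \"letsencrypt.org\"",
    # ACM certificates are issued by Amazon's CA
    "0 issue \"amazon.com\"",
    # Wildcard issuance: list each CA that may issue wildcards (add
    # letsencrypt.org if cert-manager requests them), or use \";\" to forbid all
    "0 issuewild \"amazon.com\"",
    "0 iodef \"mailto:[email protected]\""
  ]
}

# DMARC policy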
resource "cloudflare_record" "dmarc" {
  zone_id = cloudflare_zone.example.id
  name    = "_dmarc"
  value   = "v=DMARC1; p=quarantine; rua=mailto:[email protected]"
  type    = "TXT"
}
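
How a CA reads CAA records is easy to get wrong, especially the interaction between issue and issuewild. The Python sketch below is a simplified reading of RFC 8659: for wildcard names issuewild entries take precedence when present, otherwise issue entries apply, and a value of ";" authorizes no CA. The function names are hypothetical, the critical flag and unknown tags are ignored, and the sample record set forbids all wildcard issuance.

```python
def parse_caa(record: str) -> tuple[int, str, str]:
    """Split a CAA record string like '0 issue "letsencrypt.org"'."""
    flags, tag, value = record.split(" ", 2)
    return int(flags), tag.lower(), value.strip('"')


def ca_may_issue(records: list[str], ca_domain: str, wildcard: bool = False) -> bool:
    """Check whether ca_domain is authorized by the given CAA record set."""
    parsed = [parse_caa(record) for record in records]
    issue = [value for _, tag, value in parsed if tag == "issue"]
    issuewild = [value for _, tag, value in parsed if tag == "issuewild"]

    relevant = issuewild if (wildcard and issuewild) else issue
    if not relevant:
        return True  # no applicable restriction
    # Drop optional parameters after ';'; a bare ';' leaves an empty domain.
    allowed = {value.split(";", 1)[0].strip().lower() for value in relevant}
    return ca_domain.lower() in allowed


caa = ['0 issue "letsencrypt.org"', '0 issuewild ";"']
print(ca_may_issue(caa, "letsencrypt.org"))                 # True
print(ca_may_issue(caa, "letsencrypt.org", wildcard=True))  # False
print(ca_may_issue(caa, "amazon.com"))                      # False
```

A record set like this one blocks every wildcard certificate, so any CA that is expected to issue wildcards needs its own issuewild entry.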

Monitoring

# DNS monitoring
# Metric names depend on your DNS exporter; substitute the ones it exposes
groups:
  - name: dns
    rules:
      - alert: HighDNSLatency
        expr: histogram_quantile(0.95, sum by (le) (rate(dns_query_duration_seconds_bucket[5m]))) > 0.5
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High DNS latency"

      - alert: DNSErrors
        expr: rate(dns_query_errors_total[5m]) > 10
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "High DNS error rate"

Conclusion

Automated DNS and certificate management is essential:

  • Use cert-manager for Kubernetes certificate automation
  • Use Route53 or Cloudflare for DNS management as code
  • Enable DNS-01 challenges for wildcard certificates
  • Implement DNSSEC for domain security
  • Monitor certificate expiration proactively
  • Use traffic routing features for blue-green and canary

Start with automated certificates, then add DNS automation.
