
DNS and Certificate Automation: Managing Domain and TLS at Scale

Created: January 1, 0001 Larry Qu 7 min read

DNS and TLS certificates are foundational infrastructure. Manual management doesn’t scale and creates security gaps. This guide covers DNS and certificate automation using modern tools and infrastructure-as-code patterns.

DNS Architecture

DNS Record Types

Type   Purpose                              Example
A      IPv4 address                         example.com → 1.2.3.4
AAAA   IPv6 address                         example.com → 2001:db8::1
CNAME  Canonical name (alias)               www → @
MX     Mail exchange                        @ → mail.example.com
TXT    Text records                         @ → "v=spf1 include:_spf.example.com ~all"
NS     Name servers                         @ → ns1.example.com
SOA    Start of Authority                   Administrative info
CAA    Certificate Authority Authorization  example.com → letsencrypt.org

Route53 DNS Configuration

# Terraform - Route53 hosted zone and records
resource "aws_route53_zone" "main" {
  name = "example.com"
  
  tags = {
    Environment = "production"
    ManagedBy   = "terraform"
  }
}

resource "aws_route53_record" "api" {
  zone_id = aws_route53_zone.main.zone_id
  name    = "api.example.com"
  type    = "A"
  
  alias {
    name                   = "alb-example-123456789.us-east-1.elb.amazonaws.com"
    zone_id                = "Z35SXDOWRQ4VI"
    evaluate_target_health = true
  }
}

resource "aws_route53_record" "www" {
  zone_id = aws_route53_zone.main.zone_id
  name    = "www.example.com"
  type    = "CNAME"
  ttl     = 300
  records = ["example.com"]
}

resource "aws_route53_record" "mx" {
  zone_id = aws_route53_zone.main.zone_id
  name    = "example.com"
  type    = "MX"
  ttl     = 3600
  records = [
    "10 mail1.example.com",
    "20 mail2.example.com"
  ]
}

resource "aws_route53_record" "spf" {
  zone_id = aws_route53_zone.main.zone_id
  name    = "example.com"
  type    = "TXT"
  ttl     = 3600
  records = ["v=spf1 include:_spf.example.com ~all"]
}

Cloudflare DNS

# Cloudflare Terraform provider
provider "cloudflare" {
  api_token = var.cloudflare_api_token
}

resource "cloudflare_record" "api" {
  zone_id = cloudflare_zone.example.id
  name    = "api"
  value   = "1.2.3.4"
  type    = "A"
  proxied = true  # Cloudflare proxy
}

resource "cloudflare_record" "cdn" {
  zone_id = cloudflare_zone.example.id
  name    = "cdn"
  # Hypothetical origin host; a CNAME must not point back at its own name
  value   = "origin.example.net"
  type    = "CNAME"
  proxied = true
}

Certificate Management with cert-manager

Installation

# Install cert-manager
kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.14.0/cert-manager.yaml

# Verify installation
kubectl get pods -n cert-manager

Let’s Encrypt Issuer

apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: admin@example.com  # placeholder; use an address you monitor
    # Secret that stores the ACME account private key (created automatically)
    privateKeySecretRef:
      name: letsencrypt-prod-account-key

    solvers:
      # Use Route53 for the DNS-01 challenge
      - dns01:
          route53:
            region: us-east-1
            hostedZoneID: Z1234567890ABC

      # Or use Cloudflare instead
      # - dns01:
      #     cloudflare:
      #       apiTokenSecretRef:
      #         name: cloudflare-api-token
      #         key: api-token

Certificate Resource

apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: example-com
  namespace: production
spec:
  secretName: example-com-tls
  issuerRef:
    name: letsencrypt-prod
    kind: ClusterIssuer
    group: cert-manager.io
    
  dnsNames:
    - example.com
    - www.example.com
    - api.example.com
    
  duration: 2160h  # 90 days
  renewBefore: 360h  # 15 days before expiry
  
  # Extra labels and annotations to copy onto the generated Secret
  # (the key below is illustrative)
  secretTemplate:
    annotations:
      example.com/purpose: "ingress-tls"

Ingress with TLS

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: production-ingress
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
spec:
  ingressClassName: nginx
  tls:
    - hosts:
        - example.com
        - www.example.com
      secretName: example-com-tls
  rules:
    - host: example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: web-service
                port:
                  number: 80

DNS-01 Challenge

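The DNS-01 challenge proves you control a domain: the ACME server issues a token, and the client publishes a TXT record at _acme-challenge.<domain> that the server then looks up. Because validation happens entirely in DNS, it works for hosts that are not publicly reachable and is the challenge type Let's Encrypt requires for wildcard names.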
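The content of that TXT record is not the raw token. Under RFC 8555 it is the unpadded base64url encoding of the SHA-256 digest of the key authorization (the token, a dot, and the account key thumbprint). The sketch below shows that derivation using only the standard library; the function names and the sample token and thumbprint are hypothetical, and in practice an ACME client such as cert-manager computes this for you.

```python
import base64
import hashlib


def b64url(data: bytes) -> str:
    """Base64url-encode without padding, as ACME (RFC 8555) requires."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def dns01_txt_value(token: str, account_thumbprint: str) -> str:
    """Return the TXT record content for a DNS-01 challenge.

    key authorization = token + "." + account key thumbprint
    TXT value         = base64url(SHA-256(key authorization))
    """
    key_authorization = f"{token}.{account_thumbprint}"
    digest = hashlib.sha256(key_authorization.encode("ascii")).digest()
    return b64url(digest)


def challenge_record_name(domain: str) -> str:
    """Wildcard names are validated at the base domain's _acme-challenge label."""
    base = domain[2:] if domain.startswith("*.") else domain
    return f"_acme-challenge.{base}"


if __name__ == "__main__":
    # Hypothetical token and thumbprint, for illustration only
    print(challenge_record_name("*.example.com"))
    print(dns01_txt_value("example-token", "example-thumbprint"))
```

Note that a wildcard name and its base domain share the same record name, so both challenges may need to be published as separate TXT values at once.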

# Manual DNS-01 challenge with Cloudflare
# Uses the legacy python-cloudflare 2.x interface (pip install "cloudflare<3")
import time

import CloudFlare

def create_dns_challenge(domain, api_token, challenge_value):
    """Create the _acme-challenge TXT record and return its record ID"""
    client = CloudFlare.CloudFlare(token=api_token)
    
    # Get zone ID
    zone_id = client.zones.get(params={'name': domain})[0]['id']
    
    # Create TXT record holding the ACME challenge value (not the API token)
    record = client.zones.dns_records.post(
        zone_id,
        data={
            'type': 'TXT',
            'name': f'_acme-challenge.{domain}',
            'content': challenge_value,
            'ttl': 60
        }
    )
    
    # Wait for propagation (a fixed sleep is a simplification)
    time.sleep(30)
    
    # Keep the ID so the record can be deleted after validation
    return record['id']

def cleanup_dns_challenge(domain, api_token, record_id):
    """Delete the challenge TXT record once validation has finished"""
    client = CloudFlare.CloudFlare(token=api_token)
    zone_id = client.zones.get(params={'name': domain})[0]['id']
    
    client.zones.dns_records.delete(zone_id, record_id)

Multiple DNS Providers

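apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: multi-dns-issuer
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    privateKeySecretRef:
      name: multi-dns-issuer-account-key
    # cert-manager picks one solver per name by selector specificity
    # (dnsNames beats dnsZones); it does not fall back between solvers
    solvers:
      # Route53 handles names in the example.com zone
      - dns01:
          route53:
            region: us-east-1
            hostedZoneID: Z1234567890ABC
        selector:
          dnsZones:
            - "example.com"
            
      # Cloudflare handles the wildcard name; dnsZones expects plain
      # zone names, so the wildcard goes under dnsNames
      - dns01:
          cloudflare:
            apiTokenSecretRef:
              name: cloudflare-api-token
              key: api-token
        selector:
          dnsNames:
            - "*.example.com"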

ACM (AWS Certificate Manager)

Request Certificate

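# Terraform - ACM certificate
resource "aws_acm_certificate" "main" {
  domain_name               = "example.com"
  subject_alternative_names = ["*.example.com"]
  
  validation_method = "DNS"
  
  lifecycle {
    create_before_destroy = true
  }
}

# Auto-validate with Route53
resource "aws_route53_record" "cert_validation" {
  for_each = {
    for val in aws_acm_certificate.main.domain_validation_options : 
      val.domain_name => val
  }
  
  # example.com and *.example.com share one validation record
  allow_overwrite = true
  zone_id         = aws_route53_zone.main.zone_id
  name            = each.value.resource_record_name
  type            = each.value.resource_record_type
  ttl             = 60
  records         = [each.value.resource_record_value]
}

# Wait until ACM has seen the records and issued the certificate
resource "aws_acm_certificate_validation" "main" {
  certificate_arn         = aws_acm_certificate.main.arn
  validation_record_fqdns = [for record in aws_route53_record.cert_validation : record.fqdn]
}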

ALB with HTTPS

# Terraform - ALB with HTTPS
resource "aws_lb" "main" {
  name               = "main-alb"
  internal           = false
  load_balancer_type = "application"
  security_groups    = [aws_security_group.alb.id]
  subnets           = aws_subnet.public[*].id
}

resource "aws_lb_listener" "https" {
  load_balancer_arn = aws_lb.main.arn
  port             = 443
  protocol         = "HTTPS"
  # The 2016-08 policy still allows TLS 1.0/1.1; prefer a TLS 1.2+ policy
  ssl_policy       = "ELBSecurityPolicy-TLS13-1-2-2021-06"
  certificate_arn  = aws_acm_certificate.main.arn
  
  default_action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.main.arn
  }
}

Automated Renewal Scripts

Renew Certificates Script

#!/usr/bin/env python3
"""Certificate renewal monitoring and automation."""

import boto3
import datetime
import logging
from dataclasses import dataclass

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

@dataclass
class Certificate:
    arn: str
    domain: str
    expires: datetime.datetime

def get_expiring_certificates(days=30):
    """Find certificates expiring within specified days"""
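    client = boto3.client('acm')
    
    # list_certificates is paginated; walk every page so no certificate is missed
    paginator = client.get_paginator('list_certificates')
    certs = [
        summary
        for page in paginator.paginate()
        for summary in page['CertificateSummaryList']
    ]
    expiring = []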
    
    for cert in certs:
        detail = client.describe_certificate(CertificateArn=cert['CertificateArn'])
        
        if detail['Certificate']['Status'] != 'ISSUED':
            continue
            
        not_after = detail['Certificate']['NotAfter']
        days_until_expiry = (not_after - datetime.datetime.now(not_after.tzinfo)).days
        
        if days_until_expiry <= days:
            expiring.append(Certificate(
                arn=cert['CertificateArn'],
                domain=cert['DomainName'],
                expires=not_after
            ))
    
    return expiring

def send_alert(certs):
    """Send alert about expiring certificates"""
    if not certs:
        return
        
    message = "Expiring certificates:\n"
    for cert in certs:
        message += f"- {cert.domain} expires {cert.expires}\n"
    
    # Send via SNS
    sns = boto3.client('sns')
    sns.publish(
        TopicArn='arn:aws:sns:us-east-1:123456789:alerts',
        Subject='Certificate Expiry Alert',
        Message=message
    )

def main():
    expiring = get_expiring_certificates(days=30)
    
    if expiring:
        logger.warning(f"Found {len(expiring)} expiring certificates")
        send_alert(expiring)
    else:
        logger.info("No certificates expiring soon")

if __name__ == "__main__":
    main()

cert-manager Renewal Monitor

# Prometheus alerts for cert-manager
groups:
  - name: certificate-expiry
    rules:
      - alert: CertManagerCertificateExpiry
        expr: |
          certmanager_certificate_expiration_timestamp_seconds - time() < 604800
        for: 1h
        labels:
          severity: warning
        annotations:
          summary: "Certificate expiring in less than 7 days"
          
      - alert: CertManagerCertificateExpired
        expr: |
          certmanager_certificate_expiration_timestamp_seconds - time() < 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Certificate has expired"

DNSSEC

Enable DNSSEC on Route53

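# Terraform - DNSSEC configuration
resource "aws_route53_zone" "main" {
  name = "example.com"
}

# The KMS key must be an asymmetric ECC_NIST_P256 signing key in us-east-1
resource "aws_route53_key_signing_key" "main" {
  name                       = "example_com_key"
  hosted_zone_id             = aws_route53_zone.main.zone_id
  key_management_service_arn = aws_kms_key.dnssec.arn
}

# Signing can only be enabled once a key-signing key exists
resource "aws_route53_hosted_zone_dnssec" "main" {
  hosted_zone_id = aws_route53_key_signing_key.main.hosted_zone_id
  depends_on     = [aws_route53_key_signing_key.main]
}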

Cloudflare DNSSEC

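# Enable DNSSEC via Cloudflare API
# Uses the legacy python-cloudflare 2.x interface (pip install "cloudflare<3")
import CloudFlare

def enable_dnssec(zone_id, zone_name):
    # Credentials are read from the environment or the library's config file
    client = CloudFlare.CloudFlare()
    
    # Turn on signing; the response includes the DS record for the zone
    dnssec = client.zones.dnssec.patch(zone_id, data={
        'status': 'active'
    })
    
    # Create DS record in parent zone
    # (This must be done at your registrar)
    print(f"DS Record for {zone_name}: {dnssec['ds']}")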

Traffic Routing Patterns

Weighted Routing

# Terraform - Weighted routing
resource "aws_route53_record" "api-v1" {
  zone_id = aws_route53_zone.main.zone_id
  name    = "api.example.com"
  type    = "A"
  
  set_identifier  = "v1"
  health_check_id = aws_route53_health_check.v1.id
  
  # Alias records take no ttl or records arguments
  alias {
    name                   = "v1-alb.example.com"
    zone_id                = "ZONE_ID"
    evaluate_target_health = true
  }
  
  weighted_routing_policy {
    weight = 90
  }
}

resource "aws_route53_record" "api-v2" {
  zone_id = aws_route53_zone.main.zone_id
  name    = "api.example.com"
  type    = "A"
  
  set_identifier  = "v2"
  health_check_id = aws_route53_health_check.v2.id
  
  alias {
    name                   = "v2-alb.example.com"
    zone_id                = "ZONE_ID"
    evaluate_target_health = true
  }
  
  weighted_routing_policy {
    weight = 10
  }
}
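
Route 53 answers each query with one of the weighted records, chosen with probability equal to that record's weight divided by the sum of all weights for the name. The sketch below works through that arithmetic for the 90/10 split above; `traffic_shares` is a hypothetical helper, not part of any AWS SDK.

```python
def traffic_shares(weights: dict[str, int]) -> dict[str, float]:
    """Map each set_identifier to its expected fraction of DNS responses."""
    if any(w < 0 for w in weights.values()):
        raise ValueError("weights must be non-negative")
    total = sum(weights.values())
    if total == 0:
        # With every weight at 0, Route 53 treats the records as equally likely
        return {name: 1 / len(weights) for name in weights}
    return {name: w / total for name, w in weights.items()}


if __name__ == "__main__":
    for name, share in traffic_shares({"v1": 90, "v2": 10}).items():
        print(f"{name}: {share:.0%}")
```

The split applies to DNS responses rather than requests, so resolver caching and the record TTL mean the observed traffic ratio only approximates the configured weights.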

Latency-Based Routing

resource "aws_route53_record" "us-east" {
  zone_id = aws_route53_zone.main.zone_id
  name    = "api.example.com"
  type    = "A"
  
  set_identifier = "us-east"
  
  latency_routing_policy {
    region = "us-east-1"
  }
  
  alias {
    name                   = "alb-us-east.example.com"
    zone_id                = "ZONE_ID"
    evaluate_target_health = true
  }
}

resource "aws_route53_record" "eu-west" {
  zone_id = aws_route53_zone.main.zone_id
  name    = "api.example.com"
  type    = "A"
  
  set_identifier = "eu-west"
  
  latency_routing_policy {
    region = "eu-west-1"
  }
  
  alias {
    name                   = "alb-eu-west.example.com"
    zone_id                = "ZONE_ID"
    evaluate_target_health = true
  }
}

Best Practices

DNS Best Practices

  • Use multiple nameservers for redundancy
  • Enable DNSSEC for security
  • Set appropriate TTLs (shorter for dynamic records)
  • Use ALIAS records instead of CNAMEs at the zone apex (a CNAME cannot coexist with the SOA and NS records there)
  • Monitor DNS resolution latency
  • Protect authoritative DNS against query floods with rate limiting
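
The apex rule in the list above is easy to check mechanically before applying a change. The following is a minimal sketch, assuming records are available as simple name/type/value entries; the `Record` class and `find_apex_cnames` function are hypothetical and not tied to any provider's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    name: str   # fully qualified name, e.g. "www.example.com"
    type: str   # record type, e.g. "A" or "CNAME"
    value: str


def normalize(name: str) -> str:
    """Compare names case-insensitively and without a trailing dot."""
    return name.rstrip(".").lower()


def find_apex_cnames(zone: str, records: list[Record]) -> list[Record]:
    """Return CNAME records placed at the zone apex, which DNS does not allow."""
    apex = normalize(zone)
    return [
        r for r in records
        if r.type.upper() == "CNAME" and normalize(r.name) == apex
    ]


if __name__ == "__main__":
    records = [
        Record("example.com.", "CNAME", "lb.example.net"),   # invalid at apex
        Record("www.example.com", "CNAME", "example.com"),   # fine
        Record("example.com", "MX", "10 mail1.example.com"),
    ]
    for bad in find_apex_cnames("example.com", records):
        print(f"CNAME at apex: {bad.name} -> {bad.value}")
```

A check like this could run in CI against planned records, with any hit replaced by an alias record.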

Certificate Best Practices

  • Use short certificate validity (90 days for Let’s Encrypt)
  • Automate renewal (at least 14 days before expiry)
  • Use DNS-01 challenge for wildcard certificates
  • Monitor certificate expiration proactively
  • Store certificates in secrets, not configmaps
  • Use dedicated certificates per service
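
To make the renewal window concrete: with a 2160h (90-day) certificate and a renewBefore of 360h (15 days), renewal is attempted once the certificate is 75 days old, which leaves 15 days to notice and fix a failed attempt. The sketch below shows the arithmetic; the function names are hypothetical and the issue date is made up for illustration.

```python
from datetime import datetime, timedelta, timezone


def renewal_time(not_after: datetime, renew_before: timedelta) -> datetime:
    """Renewal starts renew_before ahead of the certificate's expiry."""
    return not_after - renew_before


def needs_renewal(not_after: datetime, renew_before: timedelta, now: datetime) -> bool:
    """True once the current time has reached the renewal point."""
    return now >= renewal_time(not_after, renew_before)


if __name__ == "__main__":
    issued = datetime(2024, 1, 1, tzinfo=timezone.utc)   # illustrative date
    not_after = issued + timedelta(hours=2160)           # 90-day lifetime
    renew_before = timedelta(hours=360)                  # 15 days

    renew_at = renewal_time(not_after, renew_before)
    print("renew at:", renew_at.isoformat())
    print("days after issuance:", (renew_at - issued).days)
    print("due on day 80:", needs_renewal(not_after, renew_before, issued + timedelta(days=80)))
```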

Security

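# CAA record - restrict certificate authorities
resource "aws_route53_record" "caa" {
  zone_id = aws_route53_zone.main.zone_id
  name    = "example.com"
  type    = "CAA"
  ttl     = 3600
  records = [
    "0 issue \"letsencrypt.org\"",
    # ACM certificates (see above) also need Amazon listed as an allowed CA
    "0 issue \"amazon.com\"",
    # ";" forbids all wildcard issuance; list a CA here instead if you request *.example.com
    "0 issuewild \";\"",
    # Placeholder reporting address
    "0 iodef \"mailto:security@example.com\""
  ]
}
# DMARC policy record
resource "cloudflare_record" "dmarc" {
  zone_id = cloudflare_zone.example.id
  name    = "_dmarc"
  # Placeholder report address
  value   = "v=DMARC1; p=quarantine; rua=mailto:dmarc@example.com"
  type    = "TXT"
}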
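
It helps to reason about what a CAA record set actually permits before publishing it. The sketch below applies a simplified version of the RFC 8659 rules: `issuewild` governs wildcard requests when present, `issue` governs everything else, and a bare `;` authorizes no CA. It ignores parent-domain lookup and the critical flag, and the function names and sample reporting address are hypothetical.

```python
def parse_caa(record: str) -> tuple[str, str]:
    """Split a record such as '0 issue "letsencrypt.org"' into (tag, value)."""
    _flags, tag, value = record.split(maxsplit=2)
    return tag.lower(), value.strip().strip('"')


def issuer_domain(value: str) -> str:
    """Return the CA domain from a CAA value; a bare ';' yields an empty string."""
    return value.split(";", 1)[0].strip().lower()


def caa_allows(records: list[str], ca_domain: str, wildcard: bool = False) -> bool:
    """Decide whether ca_domain may issue under this CAA record set."""
    parsed = [parse_caa(r) for r in records]
    issue = [v for tag, v in parsed if tag == "issue"]
    issuewild = [v for tag, v in parsed if tag == "issuewild"]
    # Wildcard requests use issuewild when it exists, otherwise fall back to issue
    relevant = issuewild if (wildcard and issuewild) else issue
    if not relevant:
        return True  # no applicable restriction
    return any(issuer_domain(v) == ca_domain.lower() for v in relevant)


if __name__ == "__main__":
    records = [
        '0 issue "letsencrypt.org"',
        '0 issuewild ";"',
        '0 iodef "mailto:security@example.com"',
    ]
    print(caa_allows(records, "letsencrypt.org"))                 # normal certificate
    print(caa_allows(records, "letsencrypt.org", wildcard=True))  # wildcard certificate
    print(caa_allows(records, "amazon.com"))                      # a CA that is not listed
```

Running a check like this against the record set catches cases where a CAA policy silently blocks a certificate requested elsewhere in the same configuration.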

Monitoring

# DNS monitoring (metric names are illustrative; use the ones your DNS exporter exposes)
- name: dns
  rules:
    - alert: HighDNSLatency
      expr: histogram_quantile(0.95, rate(dns_query_duration_seconds_bucket[5m])) > 0.5
      for: 5m
      labels:
        severity: warning
      annotations:
        summary: "High DNS latency"
        
    - alert: DNSErrors
      expr: rate(dns_query_errors_total[5m]) > 10
      for: 5m
      labels:
        severity: critical
      annotations:
        summary: "High DNS error rate"

Conclusion

Automated DNS and certificate management is essential:

  • Use cert-manager for Kubernetes certificate automation
  • Use Route53 or Cloudflare for DNS management as code
  • Enable DNS-01 challenges for wildcard certificates
  • Implement DNSSEC for domain security
  • Monitor certificate expiration proactively
  • Use weighted and latency-based routing for blue-green and canary deployments

Start with automated certificates, then add DNS automation.
