⚡ Calmops

Platform Engineering: Building Internal Developer Platforms

Introduction

Platform engineering is the practice of building and maintaining internal platforms that enable developers to deliver software faster and more reliably.

This guide covers the core pieces of platform engineering: internal developer platforms, Golden Paths, self-service infrastructure, and developer experience measurement.


What is Platform Engineering?

┌──────────────────────────────────────────────────────────────────────┐
│                     PLATFORM ENGINEERING VISION                      │
├──────────────────────────────────────────────────────────────────────┤
│                                                                      │
│   Before Platform Engineering:                                       │
│   ┌──────────────────────────────────────────────────────────────┐   │
│   │  Dev ──▶ Ops ──▶ Security ──▶ Platform ──▶ Prod              │   │
│   │  Wait    Wait    Wait         Wait         Wait              │   │
│   │  Manual handoffs, tickets, and dependencies                  │   │
│   └──────────────────────────────────────────────────────────────┘   │
│                                                                      │
│   After Platform Engineering:                                        │
│   ┌──────────────────────────────────────────────────────────────┐   │
│   │  Dev ──▶ Internal Platform ──▶ Prod                          │   │
│   │  Self-service, automated, opinionated                        │   │
│   └──────────────────────────────────────────────────────────────┘   │
│                                                                      │
│   Platform provides:                                                 │
│   • Pre-configured services                                          │
│   • Self-service provisioning                                        │
│   • Built-in security                                                │
│   • Observability                                                    │
│   • Guardrails                                                       │
│                                                                      │
└──────────────────────────────────────────────────────────────────────┘

Platform Components

Golden Paths

# Example: Standard application deployment path
golden_path:
  name: "Standard Web Application"
  description: "Recommended path for web apps"
  
  components:
    - name: "Container Registry"
      provider: "AWS ECR"
      access: "Self-service"
    
    - name: "Kubernetes Cluster"
      provider: "AWS EKS"
      config: "Pre-configured with security defaults"
    
    - name: "Database"
      provider: "AWS RDS"
      options: ["PostgreSQL", "MySQL"]
    
    - name: "Caching"
      provider: "AWS ElastiCache"
    
    - name: "CDN"
      provider: "AWS CloudFront"
  
  deployment:
    github_actions: true
    approval_required: false
    
  guardrails:
    - "Required labels"
    - "Resource limits"
    - "Security scanning"
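
The guardrails listed above are only names; something has to enforce them before a deployment is accepted. The following is a minimal sketch of such a check, assuming manifests are available as parsed dictionaries. The label names in REQUIRED_LABELS and the function check_guardrails are hypothetical and not part of any particular platform.

```python
"""Minimal guardrail check for a Kubernetes Deployment manifest (as a dict)."""

# Hypothetical policy: these label names are assumptions for illustration.
REQUIRED_LABELS = {"team", "app", "environment"}


def check_guardrails(manifest: dict) -> list:
    """Return a list of human-readable violations (empty means compliant)."""
    violations = []

    # Guardrail: "Required labels"
    labels = manifest.get("metadata", {}).get("labels", {})
    missing = sorted(REQUIRED_LABELS - set(labels))
    if missing:
        violations.append(f"missing required labels: {', '.join(missing)}")

    # Guardrail: "Resource limits" (every container needs cpu and memory limits)
    pod_spec = manifest.get("spec", {}).get("template", {}).get("spec", {})
    for container in pod_spec.get("containers", []):
        limits = container.get("resources", {}).get("limits", {})
        for resource in ("cpu", "memory"):
            if resource not in limits:
                name = container.get("name", "?")
                violations.append(f"container {name!r} has no {resource} limit")
    return violations


deployment = {
    "metadata": {"name": "orders", "labels": {"app": "orders", "team": "orders-team"}},
    "spec": {"template": {"spec": {"containers": [
        {"name": "api", "resources": {"limits": {"cpu": "500m"}}},
    ]}}},
}

for violation in check_guardrails(deployment):
    print(violation)
```

In practice a check like this would usually run in CI or in an admission controller, so that a non-compliant manifest is rejected with a clear message rather than deployed.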

Self-Service Portal

# Backstage catalog-info.yaml
apiVersion: backstage.io/v1alpha1
kind: Component
metadata:
  name: my-service
  description: My microservice
  tags:
    - go
    - kubernetes
    - postgres
spec:
  type: service
  lifecycle: production
  owner: platform-team
  system: orders
---
apiVersion: backstage.io/v1alpha1
kind: API
metadata:
  name: orders-api
spec:
  type: openapi
  lifecycle: production
  owner: platform-team
  definition:
    $openapi: https://example.com/openapi.yaml

Implementation

Backstage Setup

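# docker-compose.yaml for Backstage
version: '3.8'
services:
  backstage:
    image: backstage  # assumes a locally built Backstage image with this tag
    ports:
      - "3000:3000"
    environment:
      - APP_CONFIG=backstage.app-config.yaml
      - GITHUB_TOKEN=${GITHUB_TOKEN}  # referenced by integrations.github in the app config
    volumes:
      - ./app-config.yaml:/app/backstage.app-config.yaml
      - ./plugins:/app/plugins
    depends_on:
      - postgres

  postgres:
    image: postgres:14
    environment:
      - POSTGRES_PASSWORD=secret  # example only; do not hard-code passwords outside local development
    volumes:
      - postgres:/var/lib/postgresql/data

# Named volumes must be declared at the top level
volumes:
  postgres:

# backstage.app-config.yaml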
app:
  baseUrl: http://localhost:3000

proxy:
  '/catalog/api':
    target: http://localhost:7007
    changeOrigin: true

catalog:
  import:
    entityFilename: catalog-info.yaml
  locations:
    - type: url
      target: https://github.com/org/repos.yaml

integrations:
  github:
    - host: github.com
      token: ${GITHUB_TOKEN}

Terraform Modules

# module: aws/eks
# Pre-configured EKS cluster for platform
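module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "~> 20.0" # assumed pin: the input names below match the v20 module interface

  cluster_name    = "platform-eks"
  cluster_version = "1.28"

  # Assumes a "vpc" module is defined elsewhere in the configuration
  vpc_id     = module.vpc.vpc_id
  subnet_ids = module.vpc.private_subnets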
  
  eks_managed_node_groups = {
    application = {
      min_size       = 3
      max_size       = 10
      instance_types = ["m5.large"]
    }
  }
  
  # Pre-configured addons
  cluster_addons = {
    vpc-cni             = {}
    coredns             = {}
    kube-proxy          = {}
    aws-ebs-csi-driver  = {}
  }
}

Developer Experience

Measuring DevEx

# Developer Experience Metrics
metrics:
  # Time to First Commit
  - name: "Time to First Commit (TTFC)"
    description: "Time from repo creation to first commit"
    target: "< 30 minutes"
  
  # Deployment Frequency
  - name: "Deployment Frequency"
    description: "How often deployments occur"
    target: "> 10/day"
  
  # Lead Time
  - name: "Lead Time for Changes"
    description: "Time from commit to production"
    target: "< 1 hour"
  
  # Change Failure Rate
  - name: "Change Failure Rate"
    description: "% of deployments causing failures"
    target: "< 5%"
  
  # Mean Time to Recovery
  - name: "MTTR"
    description: "Time to recover from failures"
    target: "< 30 minutes"
  
  # Developer Satisfaction
  - name: "Developer Satisfaction (DSAT)"
    description: "Platform satisfaction score"
    target: "> 4.5/5"
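
Targets like these are only useful if the underlying numbers are computed consistently. As a rough sketch, the four delivery metrics above (deployment frequency, lead time, change failure rate, MTTR) can be derived from a list of deployment records. The Deployment record and the summarize function are hypothetical; a real platform would pull this data from its CI/CD and incident tooling.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import Optional


@dataclass
class Deployment:
    committed_at: datetime                   # when the change was committed
    deployed_at: datetime                    # when it reached production
    failed: bool = False                     # whether it caused a production failure
    recovered_at: Optional[datetime] = None  # when service was restored, if it failed


def summarize(deployments: list, window_days: int) -> dict:
    """Compute four delivery metrics over a window of `window_days` days."""
    if not deployments:
        raise ValueError("need at least one deployment to summarize")
    failures = [d for d in deployments if d.failed]
    recoveries = [d.recovered_at - d.deployed_at for d in failures if d.recovered_at]
    return {
        "deployments_per_day": len(deployments) / window_days,
        "median_lead_time": median(d.deployed_at - d.committed_at for d in deployments),
        "change_failure_rate": len(failures) / len(deployments),
        "mean_time_to_recovery": (
            sum(recoveries, timedelta()) / len(recoveries) if recoveries else None
        ),
    }


start = datetime(2024, 1, 1, 9, 0)
history = [
    Deployment(start, start + timedelta(minutes=30)),
    Deployment(start + timedelta(hours=1), start + timedelta(hours=1, minutes=45),
               failed=True, recovered_at=start + timedelta(hours=2, minutes=5)),
    Deployment(start + timedelta(hours=3), start + timedelta(hours=3, minutes=20)),
    Deployment(start + timedelta(hours=5), start + timedelta(hours=6)),
]

report = summarize(history, window_days=1)
for metric, value in report.items():
    print(f"{metric}: {value}")
```

The sketch uses the median for lead time so that a single slow change does not dominate the figure; whether to report a median or a mean is a design choice, not something the targets above prescribe.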

Platform Team Responsibilities

Responsibility   Description
Architecture     Design platform components
Automation       CI/CD pipelines, provisioning
Support          Help developers use platform
Documentation    Guidebooks, examples
Governance       Security, compliance
Innovation       New tools, patterns

Golden Path Examples

Deployment Golden Path

# Standard deployment workflow
golden_path:
  deployment:
    name: "Container Deployment"
    
    steps:
      - name: "Build"
        tool: "GitHub Actions / Tekton"
        timeout: "10 minutes"
      
      - name: "Test"
        tool: "Automated tests"
        required: true
      
      - name: "Scan"
        tool: "Security scanner"
        required: true
      
      - name: "Deploy to Staging"
        auto: true
      
      - name: "Deploy to Production"
        auto: true
        approval: "Required for the main branch"
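
To make the gating behaviour of this workflow concrete, here is a small sketch of a runner that executes the steps in order and stops at the first blocking failure. The run_pipeline function and the stand-in actions are hypothetical, and treating steps without an explicit `required` key as blocking is an assumption rather than something the workflow definition states.

```python
def run_pipeline(steps: list, actions: dict) -> dict:
    """Run steps in order; stop at the first failing step that is blocking."""
    results = {}
    for step in steps:
        name = step["name"]
        passed = actions[name]()
        results[name] = "passed" if passed else "failed"
        # Assumption: steps without an explicit `required` key are blocking too.
        if not passed and step.get("required", True):
            break
    return results


steps = [
    {"name": "Build"},
    {"name": "Test", "required": True},
    {"name": "Scan", "required": True},
    {"name": "Deploy to Staging", "auto": True},
    {"name": "Deploy to Production", "auto": True},
]

# Stand-in actions; a real pipeline would call the CI system here.
actions = {
    "Build": lambda: True,
    "Test": lambda: True,
    "Scan": lambda: False,  # simulate a finding from the security scanner
    "Deploy to Staging": lambda: True,
    "Deploy to Production": lambda: True,
}

results = run_pipeline(steps, actions)
print(results)
```

Because the scan fails in this example, neither deployment step runs, which is the property a golden path is meant to guarantee.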

Security Golden Path

# Pre-configured security
security:
  defaults:
    # Network policies
    network_policy: "deny-all"
    
    # Resource limits
    resources:
      requests:
        cpu: "100m"
        memory: "128Mi"
      limits:
        cpu: "500m"
        memory: "512Mi"
    
    # Secrets
    secrets: "External Secrets Operator"
    
    # Scanning
    image_scanning: "Trivy"
    dependency_scanning: "Snyk"
    
    # RBAC
    rbac: "Least privilege"

Service Catalog

Component Definition

apiVersion: backstage.io/v1alpha1
kind: Component
metadata:
  name: order-service
  annotations:
    github.com/project-slug: org/order-service
    jira/project-key: ORDER
spec:
  type: service
  lifecycle: production
  owner: orders-team
  system: orders
  providesApis:
    - order-api
  consumesApis:
    - user-api
    - payment-api
---
apiVersion: backstage.io/v1alpha1
kind: API
metadata:
  name: order-api
spec:
  type: openapi
  lifecycle: production
  owner: orders-team
  definition:
    $openapi: https://api.example.com/openapi.yaml
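
A catalog is only trustworthy if its references resolve. The sketch below checks that every API a component provides or consumes is also registered as an API entity. It assumes the entities have already been parsed into dictionaries and that references are bare names (full Backstage entity references such as `api:default/user-api` are not handled); find_unresolved_apis is a hypothetical helper, not a Backstage function.

```python
def find_unresolved_apis(entities: list) -> dict:
    """Map each component name to the API references that no API entity defines."""
    defined = {e["metadata"]["name"] for e in entities if e["kind"] == "API"}
    unresolved = {}
    for entity in entities:
        if entity["kind"] != "Component":
            continue
        spec = entity.get("spec", {})
        refs = spec.get("providesApis", []) + spec.get("consumesApis", [])
        missing = [ref for ref in refs if ref not in defined]
        if missing:
            unresolved[entity["metadata"]["name"]] = missing
    return unresolved


# The two entities from the definition above, reduced to the relevant fields.
catalog = [
    {"kind": "Component", "metadata": {"name": "order-service"},
     "spec": {"providesApis": ["order-api"],
              "consumesApis": ["user-api", "payment-api"]}},
    {"kind": "API", "metadata": {"name": "order-api"}, "spec": {"type": "openapi"}},
]

print(find_unresolved_apis(catalog))
```

Run against only the two entities shown, the check reports user-api and payment-api, since their definitions would live in other teams' catalog files.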

Self-Service Examples

Database Provisioning

# Terraform module for self-service DB
module "database" {
  source = "./modules/database"
  
  # Developer fills these in
  name              = "orders-db"
  engine            = "postgres"
  engine_version    = "15.4"
  instance_class    = "db.t3.micro"
  allocated_storage = 20

  # Platform handles these
  backup_retention = 30
  encryption       = true
  multi_az         = true
  backup_window    = "03:00-04:00"

  # Automatic tagging
  tags = {
    Environment = "production"
    Team        = "orders-team"
    Platform    = "managed"
  }
}

Kubernetes Namespace

# Self-service namespace request
apiVersion: platform.example.com/v1
kind: NamespaceRequest
metadata:
  name: orders
spec:
  team: orders-team
  description: "Orders microservice namespace"
  quota:
    requests.cpu: "10"
    requests.memory: "20Gi"
    limits.cpu: "20"
    limits.memory: "40Gi"
  networkPolicies:
    defaultDeny: true
    allowIngress:
      - fromNamespace: ingress-nginx
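
A custom resource like this needs a controller or pipeline job that turns it into standard Kubernetes objects. The following sketch shows one possible expansion into a Namespace, a ResourceQuota, and NetworkPolicy objects. The render_namespace_resources function and the generated object names are hypothetical, and `defaultDeny` is interpreted here as denying ingress only, which is an assumption.

```python
def render_namespace_resources(request: dict) -> list:
    """Expand a NamespaceRequest into plain Kubernetes manifests (as dicts)."""
    name = request["metadata"]["name"]
    spec = request["spec"]
    resources = [
        {"apiVersion": "v1", "kind": "Namespace",
         "metadata": {"name": name, "labels": {"team": spec["team"]}}},
        {"apiVersion": "v1", "kind": "ResourceQuota",
         "metadata": {"name": f"{name}-quota", "namespace": name},
         "spec": {"hard": dict(spec.get("quota", {}))}},
    ]

    policies = spec.get("networkPolicies", {})
    if policies.get("defaultDeny"):
        # An empty podSelector selects every pod; listing Ingress with no
        # ingress rules denies all inbound traffic.
        resources.append({
            "apiVersion": "networking.k8s.io/v1", "kind": "NetworkPolicy",
            "metadata": {"name": "default-deny-ingress", "namespace": name},
            "spec": {"podSelector": {}, "policyTypes": ["Ingress"]},
        })
    for rule in policies.get("allowIngress", []):
        source = rule["fromNamespace"]
        resources.append({
            "apiVersion": "networking.k8s.io/v1", "kind": "NetworkPolicy",
            "metadata": {"name": f"allow-from-{source}", "namespace": name},
            "spec": {
                "podSelector": {},
                "policyTypes": ["Ingress"],
                # Kubernetes labels each namespace with its own name under this key.
                "ingress": [{"from": [{"namespaceSelector": {"matchLabels": {
                    "kubernetes.io/metadata.name": source}}}]}],
            },
        })
    return resources


request = {
    "metadata": {"name": "orders"},
    "spec": {
        "team": "orders-team",
        "quota": {"requests.cpu": "10", "requests.memory": "20Gi",
                  "limits.cpu": "20", "limits.memory": "40Gi"},
        "networkPolicies": {"defaultDeny": True,
                            "allowIngress": [{"fromNamespace": "ingress-nginx"}]},
    },
}

rendered = render_namespace_resources(request)
for resource in rendered:
    print(resource["kind"], resource["metadata"]["name"])
```

Keeping the request small and generating the detailed objects on the platform side lets the platform team change the generated policies later without asking developers to edit their requests.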

Best Practices

Good: Opinionated Platform

# Good: Clear, opinionated choices
platform:
  language_runtimes:
    - Go 1.21+
    - Node.js 20+
    - Python 3.11+
  
  databases:
    primary: PostgreSQL 15
    cache: Redis 7
  
  containers:
    base_image: "distroless"
    registry: "ECR"
  
  ci_cd:
    tool: "GitHub Actions"
    deployment: "ArgoCD"

Bad: Too Many Choices

# Bad: Overwhelming options
platform:
  languages: [Go, Rust, Java, Python, Node.js, Ruby, PHP, "C#", Scala, Kotlin]
  databases: [PostgreSQL, MySQL, MongoDB, Cassandra, DynamoDB, Redis, Elastic, Neo4j]
  containers: [Docker, Podman, containerd]
  ci_cd: [Jenkins, GitHub Actions, GitLab CI, CircleCI, Tekton, ArgoCD, Flux]

Conclusion

Platform engineering improves developer productivity through five practices:

  • Self-service: Developers provision resources themselves instead of filing tickets
  • Golden Paths: Opinionated, safe defaults
  • Automation: Reduced toil
  • Governance: Built-in security and compliance
  • Measurement: Track developer experience with concrete metrics
