Introduction
Platform engineering is the practice of building and maintaining internal platforms that enable developers to deliver software faster and more reliably.
This guide covers four areas: building an internal developer platform, defining Golden Paths, offering self-service infrastructure, and measuring developer experience.
What is Platform Engineering?
PLATFORM ENGINEERING VISION

Before Platform Engineering:
    Dev ──▶ Ops ──▶ Security ──▶ Platform ──▶ Prod
    Wait    Wait    Wait         Wait         Wait
    Manual handoffs, tickets, and dependencies

After Platform Engineering:
    Dev ──▶ Internal Platform ──▶ Prod
    Self-service, automated, opinionated

Platform provides:
    • Pre-configured services
    • Self-service provisioning
    • Built-in security
    • Observability
    • Guardrails
Platform Components
Golden Paths
# Example: Standard application deployment path
golden_path:
  name: "Standard Web Application"
  description: "Recommended path for web apps"
  components:
    - name: "Container Registry"
      provider: "AWS ECR"
      access: "Self-service"
    - name: "Kubernetes Cluster"
      provider: "AWS EKS"
      config: "Pre-configured with security"
    - name: "Database"
      provider: "AWS RDS"
      options: ["PostgreSQL", "MySQL"]
    - name: "Caching"
      provider: "AWS ElastiCache"
    - name: "CDN"
      provider: "AWS CloudFront"
  deployment:
    github_actions: true
    approval_required: false
  guardrails:
    - "Required labels"
    - "Resource limits"
    - "Security scanning"
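The "Required labels" and "Resource limits" guardrails can be pictured as a check that runs before a workload is admitted. The following is a minimal sketch under stated assumptions: the function name `check_guardrails`, the label set `team`/`app`, and the rule wording are hypothetical and not taken from any real platform; only the Kubernetes Deployment field layout is standard.

```python
# Hypothetical guardrail check; the label set and messages are illustrative assumptions.
REQUIRED_LABELS = {"team", "app"}


def check_guardrails(manifest: dict) -> list:
    """Return a list of guardrail violations for a Deployment-like manifest."""
    violations = []

    # Guardrail 1: every workload must carry the required labels.
    labels = manifest.get("metadata", {}).get("labels", {})
    missing = sorted(REQUIRED_LABELS - set(labels))
    if missing:
        violations.append(f"missing required labels: {', '.join(missing)}")

    # Guardrail 2: every container must declare CPU and memory limits.
    containers = (
        manifest.get("spec", {})
        .get("template", {})
        .get("spec", {})
        .get("containers", [])
    )
    for container in containers:
        limits = container.get("resources", {}).get("limits", {})
        for resource in ("cpu", "memory"):
            if resource not in limits:
                name = container.get("name", "?")
                violations.append(f"container {name!r} has no {resource} limit")

    return violations
```

An empty list means the manifest passes. In practice a check like this would more likely live in an admission controller or a CI policy step than in application code.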
Self-Service Portal
# Backstage catalog-info.yaml
apiVersion: backstage.io/v1alpha1
kind: Component
metadata:
  name: my-service
  description: My microservice
  tags:
    - go
    - kubernetes
    - postgres
spec:
  type: service
  lifecycle: production
  owner: platform-team
  system: orders
---
apiVersion: backstage.io/v1alpha1
kind: API
metadata:
  name: orders-api
spec:
  type: openapi
  lifecycle: production
  owner: platform-team
  definition:
    # $text loads the referenced file's contents as the definition
    $text: https://example.com/openapi.yaml
Implementation
Backstage Setup
# docker-compose.yaml for Backstage
version: '3.8'
services:
  backstage:
    image: backstage
    ports:
      - "3000:3000"
    environment:
      - APP_CONFIG=backstage.app-config.yaml
    volumes:
      - ./app-config.yaml:/app/backstage.app-config.yaml
      - ./plugins:/app/plugins
  postgres:
    image: postgres:14
    environment:
      - POSTGRES_PASSWORD=secret
    volumes:
      - postgres:/var/lib/postgresql/data
# Named volumes must be declared at the top level
volumes:
  postgres:
# backstage.app-config.yaml
app:
  baseUrl: http://localhost:3000
proxy:
  '/catalog/api':
    target: http://localhost:7007
    changeOrigin: true
catalog:
  import:
    entityFilename: catalog-info.yaml
  locations:
    - type: url
      target: https://github.com/org/repos.yaml
integrations:
  github:
    - host: github.com
      token: ${GITHUB_TOKEN}
Terraform Modules
# module: aws/eks
# Pre-configured EKS cluster for platform
module "eks" {
  source = "terraform-aws-modules/eks/aws"
  # Assumed pin: the input names below match the 20.x series of this module
  version = "~> 20.0"

  cluster_name    = "platform-eks"
  cluster_version = "1.28"

  vpc_id     = module.vpc.vpc_id
  subnet_ids = module.vpc.private_subnets

  eks_managed_node_groups = {
    application = {
      min_size       = 3
      max_size       = 10
      instance_types = ["m5.large"]
    }
  }

  # Pre-configured addons
  cluster_addons = {
    vpc-cni            = {}
    coredns            = {}
    kube-proxy         = {}
    aws-ebs-csi-driver = {}
  }
}
Developer Experience
Measuring DevEx
# Developer Experience Metrics
metrics:
  # Time to First Commit
  - name: "Time to First Commit (TTFC)"
    description: "Time from repo creation to first commit"
    target: "< 30 minutes"
  # Deployment Frequency
  - name: "Deployment Frequency"
    description: "How often deployments occur"
    target: "> 10/day"
  # Lead Time
  - name: "Lead Time for Changes"
    description: "Time from commit to production"
    target: "< 1 hour"
  # Change Failure Rate
  - name: "Change Failure Rate"
    description: "% of deployments causing failures"
    target: "< 5%"
  # Mean Time to Recovery
  - name: "MTTR"
    description: "Time to recover from failures"
    target: "< 30 minutes"
  # Developer Satisfaction
  - name: "Developer Satisfaction (DSAT)"
    description: "Platform satisfaction score"
    target: "> 4.5/5"
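Four of these metrics (deployment frequency, lead time, change failure rate, MTTR) can be derived from a plain log of deployments. The sketch below assumes a hypothetical `Deployment` record with the fields shown; neither the record nor the `summarize` function comes from a real tool. It reports lead time as a median, which is a choice made here to dampen outliers, and recovery time as a mean to match the MTTR definition.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import Optional


@dataclass
class Deployment:
    """Hypothetical record of one production deployment."""
    committed_at: datetime                    # when the change was committed
    deployed_at: datetime                     # when it reached production
    failed: bool = False                      # did it cause a production failure?
    recovered_at: Optional[datetime] = None   # when service was restored, if it failed


def summarize(deployments: list, days: int) -> dict:
    """Compute delivery metrics for deployments observed over `days` days."""
    if not deployments or days <= 0:
        raise ValueError("need at least one deployment and a positive window")

    failures = [d for d in deployments if d.failed]
    recoveries = [d.recovered_at - d.deployed_at for d in failures if d.recovered_at]

    return {
        "deployments_per_day": len(deployments) / days,
        "median_lead_time": median(d.deployed_at - d.committed_at for d in deployments),
        "change_failure_rate": len(failures) / len(deployments),
        # None when no failed deployment has a recorded recovery time
        "mean_time_to_recovery": (
            sum(recoveries, timedelta()) / len(recoveries) if recoveries else None
        ),
    }
```

The resulting values can be compared against the targets listed above, for example checking that `median_lead_time` stays under one hour.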
Platform Team Responsibilities
| Responsibility | Description |
|---|---|
| Architecture | Design platform components |
| Automation | CI/CD pipelines, provisioning |
| Support | Help developers use the platform |
| Documentation | Guidebooks, examples |
| Governance | Security, compliance |
| Innovation | New tools, patterns |
Golden Path Examples
Deployment Golden Path
# Standard deployment workflow
golden_path:
  deployment:
    name: "Container Deployment"
    steps:
      - name: "Build"
        tool: "GitHub Actions / Tekton"
        timeout: "10 minutes"
      - name: "Test"
        tool: "Automated tests"
        required: true
      - name: "Scan"
        tool: "Security scanner"
        required: true
      - name: "Deploy to Staging"
        auto: true
      - name: "Deploy to Production"
        auto: true # With approval for main branch
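The `required: true` flags imply a simple rule: a failed required step stops the pipeline before anything is deployed. A minimal sketch of that control flow follows; `run_pipeline` and the step tuple shape are hypothetical and stand in for whatever the CI system actually does.

```python
from typing import Callable, List, Tuple

# Hypothetical step shape: (name, required, action); action returns True on success.
Step = Tuple[str, bool, Callable[[], bool]]


def run_pipeline(steps: List[Step]) -> Tuple[bool, List[str]]:
    """Run steps in order and stop at the first failed required step."""
    log = []
    for name, required, action in steps:
        ok = action()
        log.append(f"{name}: {'ok' if ok else 'failed'}")
        if not ok and required:
            # Later steps (such as deploys) never run after a required failure.
            return False, log
    return True, log
```

For example, if the "Scan" step fails, the log ends at `Scan: failed` and neither deploy step is attempted.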
Security Golden Path
# Pre-configured security
security:
  defaults:
    # Network policies
    network_policy: "deny-all"
    # Resource limits
    resources:
      requests:
        cpu: "100m"
        memory: "128Mi"
      limits:
        cpu: "500m"
        memory: "512Mi"
    # Secrets
    secrets: "External secrets operator"
    # Scanning
    image_scanning: "trivy"
    dependency_scanning: "Snyk"
    # RBAC
    rbac: "Least privilege"
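Defaults like the resource requests and limits above are typically applied only where a team has not set its own value. The sketch below illustrates that merge on a container spec represented as a dictionary. The function name `apply_resource_defaults` is hypothetical; the default values are the ones from the configuration above.

```python
import copy

# Default values taken from the security defaults above.
DEFAULT_RESOURCES = {
    "requests": {"cpu": "100m", "memory": "128Mi"},
    "limits": {"cpu": "500m", "memory": "512Mi"},
}


def apply_resource_defaults(container: dict) -> dict:
    """Return a copy of `container` with missing requests/limits filled in."""
    result = copy.deepcopy(container)  # never mutate the caller's spec
    resources = result.setdefault("resources", {})
    for section, defaults in DEFAULT_RESOURCES.items():
        values = resources.setdefault(section, {})
        for key, value in defaults.items():
            # Keep any value the team set explicitly; fill only the gaps.
            values.setdefault(key, value)
    return result
```

A container that sets only a memory limit of `1Gi` keeps that limit and gains the default CPU limit and both default requests.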
Service Catalog
Component Definition
apiVersion: backstage.io/v1alpha1
kind: Component
metadata:
  name: order-service
  annotations:
    github.com/project-slug: org/order-service
    jira/project-key: ORDER
spec:
  type: service
  lifecycle: production
  owner: orders-team
  system: orders
  providesApis:
    - order-api
  consumesApis:
    - user-api
    - payment-api
---
apiVersion: backstage.io/v1alpha1
kind: API
metadata:
  name: order-api
spec:
  type: openapi
  lifecycle: production
  owner: orders-team
  definition:
    # $text loads the referenced file's contents as the definition
    $text: https://api.example.com/openapi.yaml
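The `providesApis` and `consumesApis` fields make dependency questions answerable from catalog data, such as "which services break if `payment-api` changes?". The sketch below indexes consumers by API over already-parsed entities. It is an illustration only: `consumers_by_api` is a hypothetical helper, not part of Backstage, and a real catalog would be queried through its own API.

```python
from collections import defaultdict


def consumers_by_api(components: list) -> dict:
    """Map each API name to the sorted names of components that consume it."""
    index = defaultdict(list)
    for component in components:
        name = component["metadata"]["name"]
        for api in component.get("spec", {}).get("consumesApis", []):
            index[api].append(name)
    return {api: sorted(names) for api, names in index.items()}
```

Given the `order-service` entity above, the result maps both `user-api` and `payment-api` to `["order-service"]`.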
Self-Service Examples
Database Provisioning
# Terraform module for self-service DB
module "database" {
  source = "./modules/database"

  # Developer fills these in
  name              = "orders-db"
  engine            = "postgres"
  engine_version    = "15.4"
  instance_class    = "db.t3.micro"
  allocated_storage = 20

  # Platform handles these
  backup_retention = 30
  encryption       = true
  multi_az         = true
  backup_window    = "03:00-04:00"

  # Automatic tagging
  tags = {
    Environment = "production"
    Team        = "orders-team"
    Platform    = "managed"
  }
}
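The split between "developer fills these in" and "platform handles these" is the core of the self-service contract: developers choose a few inputs, and the platform-managed settings cannot be overridden. The sketch below shows one way a request handler might enforce that split before invoking Terraform. The function `build_db_request` is hypothetical; the field names and values mirror the module call above.

```python
# Platform-managed settings, mirroring the module call above.
PLATFORM_MANAGED = {
    "backup_retention": 30,
    "encryption": True,
    "multi_az": True,
    "backup_window": "03:00-04:00",
}

# The only inputs a developer may (and must) supply.
DEVELOPER_FIELDS = {
    "name", "engine", "engine_version", "instance_class", "allocated_storage",
}


def build_db_request(developer_input: dict) -> dict:
    """Merge developer inputs with platform settings, rejecting overrides."""
    supplied = set(developer_input)
    unexpected = sorted(supplied - DEVELOPER_FIELDS)
    if unexpected:
        raise ValueError(f"fields not settable by developers: {', '.join(unexpected)}")
    missing = sorted(DEVELOPER_FIELDS - supplied)
    if missing:
        raise ValueError(f"missing required fields: {', '.join(missing)}")
    return {**developer_input, **PLATFORM_MANAGED}
```

A request that tries to set `multi_az` to `False` is rejected with a `ValueError` instead of silently weakening the platform's defaults.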
Kubernetes Namespace
# Self-service namespace request
apiVersion: platform.example.com/v1
kind: NamespaceRequest
metadata:
  name: orders
spec:
  team: orders-team
  description: "Orders microservice namespace"
  quota:
    requests.cpu: "10"
    requests.memory: "20Gi"
    limits.cpu: "20"
    limits.memory: "40Gi"
  networkPolicies:
    defaultDeny: true
    allowIngress:
      - fromNamespace: ingress-nginx
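A custom resource like `NamespaceRequest` needs a controller that expands it into standard Kubernetes objects. The sketch below shows one plausible expansion into a Namespace, a ResourceQuota, and NetworkPolicies. The function `render_namespace`, the object names, and the `team` label are assumptions made for illustration; the generated manifests use the standard `v1` and `networking.k8s.io/v1` schemas, and the namespace selector relies on the `kubernetes.io/metadata.name` label that Kubernetes sets on namespaces automatically.

```python
def render_namespace(request: dict) -> list:
    """Expand a NamespaceRequest (as a dict) into Kubernetes manifests."""
    name = request["metadata"]["name"]
    spec = request["spec"]

    manifests = [
        {
            "apiVersion": "v1",
            "kind": "Namespace",
            "metadata": {"name": name, "labels": {"team": spec["team"]}},
        },
        {
            "apiVersion": "v1",
            "kind": "ResourceQuota",
            "metadata": {"name": "quota", "namespace": name},
            "spec": {"hard": dict(spec["quota"])},
        },
    ]

    policies = spec.get("networkPolicies", {})
    if policies.get("defaultDeny"):
        # Selecting all pods with no ingress rules denies all inbound traffic.
        manifests.append({
            "apiVersion": "networking.k8s.io/v1",
            "kind": "NetworkPolicy",
            "metadata": {"name": "default-deny-ingress", "namespace": name},
            "spec": {"podSelector": {}, "policyTypes": ["Ingress"]},
        })
    for rule in policies.get("allowIngress", []):
        source = rule["fromNamespace"]
        manifests.append({
            "apiVersion": "networking.k8s.io/v1",
            "kind": "NetworkPolicy",
            "metadata": {"name": f"allow-from-{source}", "namespace": name},
            "spec": {
                "podSelector": {},
                "policyTypes": ["Ingress"],
                "ingress": [{"from": [{"namespaceSelector": {
                    "matchLabels": {"kubernetes.io/metadata.name": source}
                }}]}],
            },
        })
    return manifests
```

For the request above, this yields four objects: the `orders` namespace, its quota, a default-deny policy, and a policy admitting traffic from `ingress-nginx`.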
Best Practices
Good: Opinionated Platform
# Good: Clear, opinionated choices
platform:
  language_runtimes:
    - Go 1.21+
    - Node.js 20+
    - Python 3.11+
  databases:
    primary: PostgreSQL 15
    cache: Redis 7
  containers:
    base_image: "distroless"
    registry: "ECR"
  ci_cd:
    tool: "GitHub Actions"
    deployment: "ArgoCD"
Bad: Too Many Choices
# Bad: Overwhelming options
platform:
  languages:
    - Go, Rust, Java, Python, Node.js, Ruby, PHP, C#, Scala, Kotlin
  databases:
    - PostgreSQL, MySQL, MongoDB, Cassandra, DynamoDB, Redis, Elastic, Neo4j
  containers:
    - Docker, Podman, containerd
  ci_cd:
    - Jenkins, GitHub Actions, GitLab CI, CircleCI, Tekton, ArgoCD, Flux
Conclusion
Platform engineering enables developer productivity:
- Self-service: Developers provision resources
- Golden Paths: Opinionated, safe defaults
- Automation: Reduced toil
- Governance: Built-in security
- Measurement: Track developer experience