Calmops

Platform Engineering Complete Guide: Building Internal Developer Platforms

Introduction

Platform engineering has emerged as a critical discipline for organizations seeking to scale their development teams while maintaining velocity and reliability. By building internal platforms that provide self-service capabilities, organizations can reduce the friction between development and operations teams, accelerate developer productivity, and enforce organizational standards automatically.

This guide covers platform engineering from the fundamental concepts through to building a production-ready internal developer platform. We’ll explore architectural patterns, implementation strategies, and example configurations that you can adapt to your organization’s needs.

Whether you’re just starting to explore platform engineering or looking to improve your existing platform, this guide provides the knowledge and practical patterns you need to succeed.

Understanding Platform Engineering

The Evolution from DevOps to Platform Engineering

Traditional DevOps aimed to break down silos between development and operations teams. While successful, it often created a new bottleneck: operations teams still had to handle numerous requests for infrastructure, configurations, and access. Platform engineering evolves this model by creating self-service capabilities that empower developers while maintaining governance.

┌─────────────────────────────────────────────────────────────────────────┐
│                     Platform Engineering Evolution                      │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│  ┌─────────────┐     ┌─────────────┐     ┌─────────────────────────┐    │
│  │ Traditional │ --> │   DevOps    │ --> │  Platform Engineering   │    │
│  │   Siloed    │     │    Teams    │     │      Self-Service       │    │
│  └─────────────┘     └─────────────┘     └─────────────────────────┘    │
│                                                                         │
│  • Manual ops        • Shared ownership  • Self-service platform        │
│  • Slow changes      • Automation        • Golden paths                 │
│  • Knowledge silos   • Collaboration     • Developer experience         │
│                                                                         │
└─────────────────────────────────────────────────────────────────────────┘

Core Concepts

| Concept | Description |
|---|---|
| Internal Developer Platform (IDP) | Internal product that provides self-service capabilities |
| Self-Service | Developers can provision resources without manual intervention |
| Golden Paths | Opinionated, supported, and recommended approaches |
| Paved Roads | Standardized, automated workflows that are safe to follow |
| Developer Experience (DevX) | The overall experience developers have doing their work |
| Platform Team | Team responsible for building and maintaining the platform |

Platform Team Responsibilities

The platform team acts as an internal product team:

# Platform team responsibilities
responsibilities = {
    "product_management": "Define and prioritize platform capabilities",
    "platform_development": "Build and maintain platform components",
    "developer_relations": "Support and train platform users",
    "incident_response": "Handle platform-related incidents",
    "governance": "Ensure standards and compliance",
    "documentation": "Maintain up-to-date platform docs"
}

Building Your Internal Developer Platform

Assessment Phase

Before building, understand your current state:

# Developer Experience Assessment
assessment_questions = {
    "time_questions": [
        "How long does it take to provision a new environment?",
        "How long to deploy a simple change to production?",
        "How long to get access to required resources?",
    ],
    "friction_questions": [
        "What manual processes cause delays?",
        "What tools cause the most frustration?",
        "Which requests does the ops team receive most often?",
    ],
    "value_questions": [
        "What would developers provision themselves if they could?",
        "What would have the biggest impact on productivity?",
        "What standards are most important to enforce?",
    ]
}

# Example baseline metrics (illustrative values, recorded before building the platform)
current_state = {
    "avg_time_to_provision": "2 weeks",
    "deployment_frequency": "monthly",
    "self_service_percentage": "20%",
    "developer_satisfaction": "6/10",
    "incident_volume": "50/month",
}
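
The baseline figures above can be derived from request or ticket logs instead of being estimated. The following is a minimal sketch, assuming each request is recorded with a request time, a fulfilment time, and a flag for whether it needed an ops ticket. The `ProvisionRequest` record and `baseline_metrics` function are hypothetical names used for illustration only.

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import mean


@dataclass
class ProvisionRequest:
    """One infrastructure request (hypothetical record shape)."""
    requested_at: datetime
    fulfilled_at: datetime
    self_service: bool  # True if no ops ticket was needed


def baseline_metrics(requests: list[ProvisionRequest]) -> dict:
    """Summarize average provisioning time and the self-service share."""
    if not requests:
        return {"avg_days_to_provision": None, "self_service_percentage": None}
    durations = [
        (r.fulfilled_at - r.requested_at).total_seconds() / 86400
        for r in requests
    ]
    self_service = sum(1 for r in requests if r.self_service)
    return {
        "avg_days_to_provision": round(mean(durations), 1),
        "self_service_percentage": round(100 * self_service / len(requests), 1),
    }


# Made-up sample data: four ticket-based requests and one self-service request
requests = [
    ProvisionRequest(datetime(2024, 1, 1), datetime(2024, 1, 15), False),
    ProvisionRequest(datetime(2024, 1, 3), datetime(2024, 1, 17), False),
    ProvisionRequest(datetime(2024, 1, 5), datetime(2024, 1, 5, 12), True),
    ProvisionRequest(datetime(2024, 1, 8), datetime(2024, 1, 22), False),
    ProvisionRequest(datetime(2024, 1, 9), datetime(2024, 1, 23, 12), False),
]
print(baseline_metrics(requests))
# {'avg_days_to_provision': 11.4, 'self_service_percentage': 20.0}
```

Re-running the same calculation after each platform release gives a like-for-like comparison against the starting point.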

Platform Architecture

┌───────────────────────────────────────────────────────────────────────┐
│                      Internal Developer Platform                      │
├───────────────────────────────────────────────────────────────────────┤
│                                                                       │
│  ┌─────────────────────────────────────────────────────────────────┐  │
│  │                  Developer Portal (Backstage)                   │  │
│  │  ┌─────────────┐ ┌─────────────┐ ┌───────────────────────────┐  │  │
│  │  │   Service   │ │  Resource   │ │       Documentation       │  │  │
│  │  │   Catalog   │ │ Provisioning│ │         & Guides          │  │  │
│  │  └─────────────┘ └─────────────┘ └───────────────────────────┘  │  │
│  └─────────────────────────────────────────────────────────────────┘  │
│                                   │                                   │
│                                   ▼                                   │
│  ┌─────────────────────────────────────────────────────────────────┐  │
│  │                    Platform APIs & Services                     │  │
│  │  ┌─────────────┐ ┌─────────────┐ ┌───────────────────────────┐  │  │
│  │  │ Provisioning│ │  Pipeline   │ │       Observability       │  │  │
│  │  │   Service   │ │   Service   │ │          Service          │  │  │
│  │  └─────────────┘ └─────────────┘ └───────────────────────────┘  │  │
│  └─────────────────────────────────────────────────────────────────┘  │
│                                   │                                   │
│                                   ▼                                   │
│  ┌─────────────────────────────────────────────────────────────────┐  │
│  │                      Infrastructure Layer                       │  │
│  │  ┌─────────────┐ ┌─────────────┐ ┌───────────────────────────┐  │  │
│  │  │ Kubernetes  │ │    Cloud    │ │   Databases & Services    │  │  │
│  │  │   Cluster   │ │  Provider   │ │                           │  │  │
│  │  └─────────────┘ └─────────────┘ └───────────────────────────┘  │  │
│  └─────────────────────────────────────────────────────────────────┘  │
│                                                                       │
└───────────────────────────────────────────────────────────────────────┘

Self-Service Infrastructure

Database Provisioning

# Database provisioning service
from dataclasses import dataclass


@dataclass
class DatabaseSpec:
    """Database specification for self-service provisioning."""
    name: str
    type: str  # postgres, mysql, mongodb, redis
    size: str  # small, medium, large, xlarge
    backup_enabled: bool = True
    backup_retention_days: int = 30
    high_availability: bool = False
    encrypted: bool = True


class DatabaseProvisioner:
    """Self-service database provisioning."""
    
    def __init__(self, cloud_client, secrets_manager):
        self.cloud = cloud_client
        self.secrets = secrets_manager
    
    async def provision(self, spec: DatabaseSpec, owner: str) -> dict:
        """Provision a database based on spec."""
        # Create database instance
        db = await self.cloud.create_database(
            instance_type=self._get_instance_type(spec.size),
            storage=self._get_storage_size(spec.size),
            engine=spec.type,
            high_availability=spec.high_availability,
            encrypted=spec.encrypted
        )
        
        # Configure backup
        if spec.backup_enabled:
            await self.cloud.enable_backup(
                db.id,
                retention_days=spec.backup_retention_days
            )
        
        # Generate credentials
        credentials = self._generate_credentials()
        
        # Store in secrets manager
        secret_name = f"db/{spec.name}"
        await self.secrets.create(
            secret_name,
            {
                "host": db.endpoint,
                "port": db.port,
                "username": credentials.username,
                "password": credentials.password,
                "database": spec.name
            },
            owner=owner
        )
        
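        # Return connection metadata only. Callers read the credentials from
        # the secrets manager, so the password never appears in responses or
        # logs, and no engine-specific URL scheme is hardcoded.
        return {
            "id": db.id,
            "endpoint": db.endpoint,
            "port": db.port,
            "engine": spec.type,
            "status": "ready",
            "secret_name": secret_name,
        }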
    
    def _get_instance_type(self, size: str) -> str:
        sizes = {
            "small": "db.t3.micro",
            "medium": "db.t3.medium",
            "large": "db.r5.large",
            "xlarge": "db.r5.xlarge"
        }
        return sizes.get(size, "db.t3.micro")
    
    def _get_storage_size(self, size: str) -> int:
        sizes = {"small": 20, "medium": 50, "large": 100, "xlarge": 500}
        return sizes.get(size, 20)
    
    def _generate_credentials(self):
        # Generate random credentials with the standard-library secrets module
        import secrets
        from types import SimpleNamespace
        return SimpleNamespace(
            username=f"app_{secrets.token_hex(4)}",
            password=secrets.token_urlsafe(16),
        )
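
The size-mapping helpers above fall back to the smallest instance when they receive an unrecognized size, so a typo in a request would go unnoticed. A self-service API usually rejects such requests up front. The following is a minimal validation sketch. `DatabaseRequest`, `validate_request`, and the 1 to 35 day retention range are assumptions made for illustration, not part of the provisioner shown above.

```python
from dataclasses import dataclass

# Allowed values taken from the comments on the spec fields
ALLOWED_TYPES = {"postgres", "mysql", "mongodb", "redis"}
ALLOWED_SIZES = {"small", "medium", "large", "xlarge"}


@dataclass
class DatabaseRequest:
    """Hypothetical stand-in for the spec fields that are validated here."""
    name: str
    type: str
    size: str
    backup_enabled: bool = True
    backup_retention_days: int = 30


def validate_request(req: DatabaseRequest) -> list[str]:
    """Return human-readable problems; an empty list means the request is valid."""
    errors = []
    if not req.name or not req.name.replace("-", "").isalnum():
        errors.append("name must be alphanumeric with optional hyphens")
    if req.type not in ALLOWED_TYPES:
        errors.append(f"unsupported type: {req.type!r}")
    if req.size not in ALLOWED_SIZES:
        errors.append(f"unsupported size: {req.size!r}")
    # Assumed policy: retention between 1 and 35 days when backups are enabled
    if req.backup_enabled and not 1 <= req.backup_retention_days <= 35:
        errors.append("backup_retention_days must be between 1 and 35")
    return errors


print(validate_request(DatabaseRequest("orders-db", "postgres", "medium")))
# []
print(validate_request(DatabaseRequest("orders-db", "oracle", "huge")))
# ["unsupported type: 'oracle'", "unsupported size: 'huge'"]
```

Returning every problem at once, rather than raising on the first one, lets a portal form show all of the errors in a single round trip.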

Environment Management

# Environment custom resource (an instance of the platform's Environment CRD)
apiVersion: platform.example.com/v1
kind: Environment
metadata:
  name: myapp-staging
  namespace: platform
spec:
  type: environment
  owner: team-alpha
  cluster: staging-cluster
  namespace: myapp-staging
  services:
    - name: api
      replicas: 2
      resources:
        cpu: "500m"
        memory: "512Mi"
    - name: worker
      replicas: 1
      resources:
        cpu: "1000m"
        memory: "1Gi"
  databases:
    - name: app-db
      type: postgres
      size: medium
  secrets:
    - name: api-keys
      source: vault-secrets
  networking:
    ingress:
      enabled: true
      hostname: myapp.staging.example.com
    egress:
      allowed: all
  monitoring:
    enabled: true
    alertChannel: slack-staging
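
Before creating an environment like the one above, a platform controller may want to total the resources it requests, for example to check them against a team quota. The sketch below is one way to do that, assuming the `services` list has already been parsed into dictionaries. It handles only the quantity suffixes used in this example (`m` for CPU, `Ki`/`Mi`/`Gi` for memory), and the function names are hypothetical.

```python
def parse_cpu(value: str) -> float:
    """Convert a Kubernetes CPU quantity ("500m" or "2") to cores."""
    if value.endswith("m"):
        return int(value[:-1]) / 1000
    return float(value)


_MEMORY_UNITS = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30}


def parse_memory(value: str) -> int:
    """Convert a memory quantity such as "512Mi" or "1Gi" to bytes."""
    for suffix, factor in _MEMORY_UNITS.items():
        if value.endswith(suffix):
            return int(value[: -len(suffix)]) * factor
    return int(value)  # no suffix: plain bytes


def total_requests(services: list[dict]) -> dict:
    """Sum CPU and memory across all replicas of all services."""
    cpu = sum(s["replicas"] * parse_cpu(s["resources"]["cpu"]) for s in services)
    mem = sum(s["replicas"] * parse_memory(s["resources"]["memory"]) for s in services)
    return {"cpu_cores": cpu, "memory_mib": mem // 2**20}


# The two services from the myapp-staging environment
services = [
    {"name": "api", "replicas": 2, "resources": {"cpu": "500m", "memory": "512Mi"}},
    {"name": "worker", "replicas": 1, "resources": {"cpu": "1000m", "memory": "1Gi"}},
]
print(total_requests(services))
# {'cpu_cores': 2.0, 'memory_mib': 2048}
```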

Golden Paths

Golden paths are opinionated, supported approaches that make it easy to do the right thing:

# Golden path: Standard Service
apiVersion: platform.example.com/v1
kind: GoldenPath
metadata:
  name: standard-service
spec:
  description: Approved path for production services
  
  # Required components
  required:
    - type: github-repository
      ci: github-actions
      default-branch: main
      branch-protection: true
      required-checks:
        - lint
        - test
        - security-scan
    
    - type: kubernetes
      deployment:
        replicas: 2
        strategy: rolling
        health-checks: true
      
    - type: service-mesh
      mtls: required
      observability: enabled
    
    - type: logging
      retention-days: 30
      sampling-rate: 100
    
    - type: monitoring
      metrics: enabled
      alerts:
        - error-rate
        - latency-p99
        - availability
  
  # Optional components
  optional:
    - type: database
      types: [postgres, mysql, redis]
      backup: required
    
    - type: cache
      types: [redis, memcached]
    
    - type: message-queue
      types: [rabbitmq, kafka]
  
  # Standards enforced
  standards:
    - programming-language: [go, python, typescript, java]
    - testing: 80% minimum coverage
    - security: SAST + dependency scan
    - license: approved licenses only
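
A golden path definition becomes enforceable once something checks services against it. The following is a minimal sketch of such a check, assuming each service is described by a dictionary with `components`, `language`, and `coverage` keys. That dictionary shape and the `check_golden_path` function are hypothetical. The component names, languages, and coverage threshold are copied from the definition above.

```python
REQUIRED_COMPONENTS = {
    "github-repository", "kubernetes", "service-mesh", "logging", "monitoring",
}
OPTIONAL_COMPONENTS = {"database", "cache", "message-queue"}
APPROVED_LANGUAGES = {"go", "python", "typescript", "java"}
MIN_COVERAGE = 80  # percent


def check_golden_path(service: dict) -> list[str]:
    """Return the ways a service deviates from the standard-service path."""
    findings = []
    components = set(service.get("components", []))
    for missing in sorted(REQUIRED_COMPONENTS - components):
        findings.append(f"missing required component: {missing}")
    for unknown in sorted(components - REQUIRED_COMPONENTS - OPTIONAL_COMPONENTS):
        findings.append(f"component not on the golden path: {unknown}")
    if service.get("language") not in APPROVED_LANGUAGES:
        findings.append(f"language not approved: {service.get('language')}")
    if service.get("coverage", 0) < MIN_COVERAGE:
        findings.append(f"test coverage below {MIN_COVERAGE}%")
    return findings


service = {
    "components": ["github-repository", "kubernetes", "logging", "monitoring", "database"],
    "language": "rust",
    "coverage": 85,
}
for finding in check_golden_path(service):
    print(finding)
# missing required component: service-mesh
# language not approved: rust
```

Reporting deviations instead of blocking them keeps the path opinionated without making it mandatory; whether to block is a policy decision for the platform team.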

Developer Portal Implementation

Backstage Setup

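// Backstage app configuration (legacy frontend system).
// Package and export names follow current Backstage releases; verify them
// against the version you run.
import { createApp } from '@backstage/app-defaults';
import { catalogPlugin } from '@backstage/plugin-catalog';
import { scaffolderPlugin } from '@backstage/plugin-scaffolder';
import { kubernetesPlugin } from '@backstage/plugin-kubernetes';
// Argo CD support comes from community plugins such as
// @roadiehq/backstage-plugin-argo-cd, not from an @backstage package.

const app = createApp({
  // Plugins are exported as plugin instances, not factory functions
  plugins: [catalogPlugin, scaffolderPlugin, kubernetesPlugin],
  // Custom branding is supplied through the `themes` option
  // (a list of theme definitions) instead of an inline color map.
});

export default app;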

Custom Scaffolder Actions

// Custom scaffolder actions (backend code)
// Each ./actions/* factory is expected to build its action with
// createTemplateAction from '@backstage/plugin-scaffolder-node'.
import { Config } from '@backstage/config';
import { ScmIntegrations } from '@backstage/integration';
import { createDatabaseAction } from './actions/database';
import { createNamespaceAction } from './actions/namespace';
import { setupGitHubActionsAction } from './actions/github-actions';

// Build the action list from the backend config so that `config` is defined
export function createPlatformActions(config: Config) {
  return [
    createDatabaseAction({
      databaseHost: process.env.DATABASE_HOST,
      databasePort: process.env.DATABASE_PORT,
    }),
    createNamespaceAction({
      kubernetesCluster: process.env.K8S_CLUSTER,
    }),
    setupGitHubActionsAction({
      integrations: ScmIntegrations.fromConfig(config),
    }),
  ];
}

# Template: New Service (uses the custom actions above)
apiVersion: scaffolder.backstage.io/v1beta3
kind: Template
metadata:
  name: new-service
  title: Create New Service
spec:
  owner: platform-team
  type: service
  
  parameters:
    - title: Service Information
      required:
        - serviceName
        - description
      properties:
        serviceName:
          type: string
          title: Service Name
        description:
          type: string
          title: Description
        team:
          type: string
          title: Owner Team
          
  steps:
    - id: fetch-template
      name: Fetch Template
      action: fetch:template
      input:
        url: https://github.com/example/service-template
        values:
          serviceName: ${{ parameters.serviceName }}
    
    - id: create-namespace
      name: Create Kubernetes Namespace
      action: platform:kubernetes:namespace:create
      input:
        name: ${{ parameters.serviceName }}
        team: ${{ parameters.team }}
    
    - id: provision-database
      name: Provision Database
      action: platform:database:create
      input:
        name: ${{ parameters.serviceName }}
        type: postgres
        size: small
        owner: ${{ parameters.team }}
    
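    - id: publish
      name: Publish
      action: publish:github
      input:
        # Hypothetical target: the "example" owner matches the template URL above
        repoUrl: github.com?owner=example&repo=${{ parameters.serviceName }}
    
    - id: register-catalog
      name: Register in Catalog
      action: catalog:register
      input:
        # Register only after publishing, using the publish step's output
        repoContentsUrl: ${{ steps.publish.output.repoContentsUrl }}
        catalogInfoPath: /catalog-info.yaml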
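
In a scaffolder template, a step can only consume the outputs of steps that ran before it, so step ordering mistakes are a common source of broken templates. The sketch below is a small lint for that, assuming the template's `steps` list has already been loaded into Python dictionaries. It is an illustration, not part of Backstage: `find_forward_references` is a hypothetical helper, and it recognizes only the dotted `steps.<id>.output` form of a reference.

```python
import re

# Matches "${{ steps.<id>.output" and captures the step id
STEP_REF = re.compile(r"\$\{\{\s*steps\.([\w-]+)\.output")


def find_forward_references(steps: list[dict]) -> list[str]:
    """Report inputs that reference a step which has not run yet."""
    seen = set()
    problems = []
    for step in steps:
        for key, value in step.get("input", {}).items():
            for ref in STEP_REF.findall(str(value)):
                if ref not in seen:
                    problems.append(
                        f"{step['id']}.{key} references '{ref}' before it runs"
                    )
        seen.add(step["id"])
    return problems


# Registering before publishing: the publish output does not exist yet
steps = [
    {"id": "register", "input": {"repoContentsUrl": "${{ steps.publish.output.repoContentsUrl }}"}},
    {"id": "publish", "input": {"repoUrl": "github.com?owner=example&repo=demo"}},
]
print(find_forward_references(steps))
# ["register.repoContentsUrl references 'publish' before it runs"]
```

Running a check like this in the template repository's CI catches ordering errors before developers hit them in the portal.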

Service Catalog

// Service entity definition
import { Entity } from '@backstage/catalog-model';

const myServiceEntity: Entity = {
  apiVersion: 'backstage.io/v1alpha1',
  kind: 'Component',
  metadata: {
    name: 'payment-service',
    description: 'Payment processing service',
    tags: ['java', 'spring-boot', 'payment'],
    annotations: {
      'github.com/project-slug': 'company/payment-service',
      'backstage.io/techdocs-ref': 'dir:.',
      'argocd/app-name': 'payment-service',
      'backstage.io/kubernetes-namespace': 'payment',
    },
  },
  spec: {
    type: 'service',
    lifecycle: 'production',
    owner: 'platform-team',
    system: 'payments',
    // The catalog derives the ownedBy, partOf, and dependsOn relations from
    // these spec fields; `relations` is filled in on read, not written by hand.
    dependsOn: ['resource:default/payment-database'],
  },
};
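
Catalog entities point at each other with string references such as `resource:default/payment-database`, which follow the pattern `[kind:][namespace/]name`. The sketch below shows how such a reference breaks down into its parts. It is an illustrative Python parser, not the Backstage implementation, and the default kind and namespace arguments are assumptions chosen for the example.

```python
from typing import NamedTuple


class EntityRef(NamedTuple):
    kind: str
    namespace: str
    name: str


def parse_entity_ref(
    ref: str,
    default_kind: str = "component",
    default_namespace: str = "default",
) -> EntityRef:
    """Split "[kind:][namespace/]name" into its parts, applying defaults."""
    kind, sep, rest = ref.partition(":")
    if not sep:  # no "kind:" prefix
        kind, rest = default_kind, ref
    namespace, sep, name = rest.partition("/")
    if not sep:  # no "namespace/" prefix
        namespace, name = default_namespace, rest
    if not kind or not namespace or not name:
        raise ValueError(f"invalid entity reference: {ref!r}")
    return EntityRef(kind, namespace, name)


print(parse_entity_ref("resource:default/payment-database"))
# EntityRef(kind='resource', namespace='default', name='payment-database')
print(parse_entity_ref("platform-team", default_kind="group"))
# EntityRef(kind='group', namespace='default', name='platform-team')
```

This is why a short value such as an owner of `platform-team` is enough in an entity's spec: the missing kind and namespace are filled in from context.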

Platform as Code

GitOps for Platform

# Platform configuration in Git
apiVersion: platform.example.com/v1
kind: PlatformConfiguration
metadata:
  name: company-platform
  namespace: platform
spec:
  # Cluster configuration
  clusters:
    - name: production
      region: us-east-1
      nodePools:
        - name: general
          minSize: 3
          maxSize: 10
          instanceType: m5.xlarge
        - name: memory
          minSize: 2
          maxSize: 5
          instanceType: r5.xlarge
    
    - name: staging
      region: us-east-1
      nodePools:
        - name: general
          minSize: 2
          maxSize: 5
          instanceType: m5.large
  
  # Service mesh configuration
  serviceMesh:
    provider: istio
    mtls: strict
    tracing:
      provider: jaeger
      samplingRate: 10
  
  # Observability
  observability:
    prometheus:
      retention: 30d
      storage: 200Gi
    logging:
      retention: 14d
      aggregation: elasticsearch
    alerts:
      channels:
        - type: slack
          webhookUrl: https://hooks.slack.com/...
        - type: pagerduty
          integrationKey: xxx
  
  # Security
  security:
    networkPolicies: true
    podSecurityStandard: restricted  # Pod Security Admission level; PodSecurityPolicy was removed in Kubernetes 1.25
    secretEncryption:
      provider: vault
    imageScanning: true
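
GitOps tooling works by comparing the configuration declared in Git with what is actually running and then reconciling the difference. The sketch below illustrates only the comparison step, assuming both states are available as nested dictionaries. `diff_config` is a hypothetical helper, and real controllers handle lists, defaults, and ordering far more carefully.

```python
def diff_config(desired: dict, actual: dict, path: str = "") -> list[str]:
    """Recursively list where the actual state differs from the desired state."""
    changes = []
    for key in sorted(set(desired) | set(actual)):
        here = f"{path}.{key}" if path else key
        if key not in actual:
            changes.append(f"add {here} = {desired[key]!r}")
        elif key not in desired:
            changes.append(f"remove {here}")
        elif isinstance(desired[key], dict) and isinstance(actual[key], dict):
            changes.extend(diff_config(desired[key], actual[key], here))
        elif desired[key] != actual[key]:
            changes.append(f"change {here}: {actual[key]!r} -> {desired[key]!r}")
    return changes


# Desired state as declared in Git (a subset of the configuration above)
desired = {
    "serviceMesh": {"provider": "istio", "mtls": "strict"},
    "observability": {"prometheus": {"retention": "30d"}},
}
# Made-up live state with two deviations
actual = {
    "serviceMesh": {"provider": "istio", "mtls": "permissive"},
    "observability": {"prometheus": {"retention": "30d"}, "debug": True},
}
for change in diff_config(desired, actual):
    print(change)
# remove observability.debug
# change serviceMesh.mtls: 'permissive' -> 'strict'
```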

Terraform for Platform

# Platform infrastructure as code
module "platform_cluster" {
  source = "./modules/eks"
  
  cluster_name    = "platform-eks"
  cluster_version = "1.28"
  region          = "us-east-1"
  
  node_groups = {
    general = {
      min_size       = 3
      max_size       = 10
      desired_size   = 3
      instance_types = ["m5.xlarge"]
    }
    
    memory = {
      min_size       = 2
      max_size       = 5
      desired_size   = 2
      instance_types = ["r5.xlarge"]
    }
  }
  
  # Enable platform services
  add_ons = {
    vpc-cni            = true
    coredns            = true
    kube-proxy         = true
    aws-load-balancer  = true
    metrics-server     = true
    secrets-store-csi  = true
  }
}

module "platform_database" {
  source = "./modules/rds"
  
  identifier        = "platform-db"
  engine            = "postgres"
  engine_version    = "15.4"
  instance_class    = "db.r6g.large"
  allocated_storage = 100

  backup_retention_period = 30
  deletion_protection     = true

  tags = {
    Platform  = "true"
    ManagedBy = "terraform"
  }
}

Metrics and Feedback

Platform Metrics

// Key platform metrics
const platformMetrics = {
  // Adoption metrics
  adoption: {
    dailyActiveUsers: "DAU of platform",
    weeklyActiveUsers: "WAU of platform",
    servicesOnPlatform: "Total services using platform",
    selfServiceRate: "% of resources provisioned self-service",
  },
  
  // Efficiency metrics
  efficiency: {
    timeToProvision: "Average time to provision resources",
    timeToDeploy: "Average time from commit to production",
    deploymentFrequency: "Deployments per day",
    leadTime: "Time from request to delivery",
  },
  
  // Reliability metrics
  reliability: {
    platformUptime: "Platform availability",
    incidentRate: "Incidents per month",
    mttr: "Mean time to recovery",
    supportTickets: "Platform-related tickets",
  },
  
  // Satisfaction metrics
  satisfaction: {
    developerNPS: "Net Promoter Score",
    csat: "Customer Satisfaction Score",
    feedbackCount: "Feedback submissions per quarter",
  }
};

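// Example queries (PromQL, except where marked as LogQL)
const queries = {
  // Adoption (LogQL for Loki: distinct users with an "action" log line in the last 24h)
  dailyActiveUsers: `count(sum by (user) (count_over_time({app="platform"} |= "action" | json [24h])))`,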
  
  // Efficiency
  avgProvisionTime: `avg(platform_provision_duration_seconds)`,
  
  // Reliability
  platformUptime: `1 - (sum(rate(platform_errors_total[5m])) / sum(rate(platform_requests_total[5m])))`,
  
  // Satisfaction
  developerNps: `avg(platform_developer_nps_score)`,
};
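
Two of the reliability metrics listed above reduce to simple arithmetic once the raw counts are available. The sketch below shows that arithmetic, assuming request counters and incident start and end times have already been exported from the monitoring system. The function names and sample numbers are illustrative only.

```python
from datetime import datetime, timedelta


def availability(total_requests: int, failed_requests: int) -> float:
    """Share of successful requests, as a percentage."""
    if total_requests == 0:
        return 100.0
    return 100 * (1 - failed_requests / total_requests)


def mttr(incidents: list[tuple[datetime, datetime]]) -> timedelta:
    """Mean time to recovery over (started, resolved) pairs."""
    if not incidents:
        return timedelta(0)
    total = sum((end - start for start, end in incidents), timedelta(0))
    return total / len(incidents)


# Made-up sample: one 30-minute and one 90-minute incident
incidents = [
    (datetime(2024, 3, 1, 10, 0), datetime(2024, 3, 1, 10, 30)),
    (datetime(2024, 3, 9, 14, 0), datetime(2024, 3, 9, 15, 30)),
]
print(round(availability(10_000, 25), 2))  # 99.75
print(mttr(incidents))                     # 1:00:00
```

Request-based availability of this kind measures what users experienced, which can differ from uptime measured by health checks.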

Feedback Collection

# Feedback collection system
from dataclasses import dataclass
from typing import List
from datetime import datetime


@dataclass
class Feedback:
    """Platform feedback."""
    user_id: str
    feedback_type: str  # suggestion, bug, compliment, complaint
    category: str
    description: str
    rating: int  # 1-5
    created_at: datetime


class FeedbackCollector:
    """Collect and analyze platform feedback."""
    
    def __init__(self, database):
        self.db = database
    
    async def submit_feedback(self, feedback: Feedback) -> None:
        """Submit user feedback."""
        await self.db.feedback.insert(feedback)
    
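    async def get_nps_score(self) -> float:
        """Calculate a Net Promoter Score from ratings alone.

        NPS is defined on a 0-10 scale (promoters 9-10, detractors 0-6) and
        does not depend on feedback_type. Feedback.rating is 1-5, so this
        uses one possible mapping (an assumption): 5 = promoter,
        1-3 = detractor, 4 = passive.
        """
        promoters = await self.db.feedback.count(rating__gte=5)
        detractors = await self.db.feedback.count(rating__lte=3)
        total = await self.db.feedback.count()
        
        return ((promoters - detractors) / total * 100) if total > 0 else 0.0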
    
    async def get_top_issues(self, limit: int = 10) -> List[dict]:
        """Get most common issues."""
        return await self.db.feedback.aggregate([
            {'$match': {'feedback_type': 'complaint'}},
            {'$group': {'_id': '$category', 'count': {'$sum': 1}}},
            {'$sort': {'count': -1}},
            {'$limit': limit}
        ])
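
Net Promoter Score is conventionally computed from answers to a "how likely are you to recommend this?" question on a 0 to 10 scale: the percentage of promoters (9 or 10) minus the percentage of detractors (0 to 6). The worked example below shows that arithmetic on a made-up list of survey scores. The `net_promoter_score` function is a standalone illustration and does not use the feedback store above.

```python
def net_promoter_score(scores: list[int]) -> float:
    """NPS = % promoters (9-10) minus % detractors (0-6) on a 0-10 scale."""
    if not scores:
        return 0.0
    promoters = sum(1 for s in scores if s >= 9)
    detractors = sum(1 for s in scores if s <= 6)
    return 100 * (promoters - detractors) / len(scores)


# Made-up survey: 4 promoters, 3 passives (7-8), 3 detractors
scores = [10, 9, 9, 8, 7, 7, 6, 5, 10, 3]
print(net_promoter_score(scores))  # 10.0
```

The result ranges from -100 (all detractors) to 100 (all promoters); passives count toward the total but toward neither group.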

Best Practices

| Practice | Implementation |
|---|---|
| Start with developer needs | Survey before building |
| Build incrementally | Start with highest impact services |
| Treat platform as product | UX matters for internal tools |
| Document everything | Self-service requires good docs |
| Measure everything | Track adoption and satisfaction |
| Iterate based on feedback | Regular improvement cycles |
| Ensure security by default | Don’t make it opt-in |
| Automate compliance | Embed in platform, not process |

Conclusion

Platform engineering represents a mature approach to developer experience, combining the best of DevOps practices with product thinking. By building internal developer platforms that provide self-service capabilities, organizations can dramatically improve developer productivity while maintaining governance and security.

Key takeaways:

  1. Start with assessment - Understand current pain points before building
  2. Build incrementally - Start with high-impact, low-complexity services
  3. Focus on developer experience - The platform is a product
  4. Create golden paths - Make it easy to do the right thing
  5. Measure everything - Track adoption, efficiency, and satisfaction
  6. Iterate continuously - Platform engineering is never “done”

By following these patterns and principles, you’ll create a platform that developers love to use while enabling your organization to scale effectively.
