⚡ Calmops

Kubernetes Operators: Automating Complex Workloads

Introduction

Kubernetes was designed to automate the deployment and management of containerized applications. However, many complex workloads require domain-specific knowledge that the built-in Kubernetes resources cannot express, such as how to promote a database replica after the primary fails or how to take a consistent backup. This is where Operators come in.

Operators are Kubernetes extensions that encode operational knowledge into software, automating the lifecycle of complex stateful applications: installation, configuration, upgrades, scaling, backup, and recovery. They suit applications whose day-to-day operation would otherwise follow a manual runbook.

This guide covers the Operator pattern, how to build Operators, and practical examples for automating complex workloads.


Understanding the Operator Pattern

The Operator pattern extends Kubernetes with custom controllers that understand application-specific requirements.

Core Concepts

Custom Resource Definition (CRD): Extends the Kubernetes API with custom resource types.

Controller: Watches resources and continuously reconciles the actual state of the cluster toward the desired state they declare.

Operator: A combination of CRDs and controllers that encode domain knowledge.

How Operators Work

┌──────────────────────────────────────────────────────────┐
│                    Operator Pattern                      │
│                                                          │
│  ┌──────────────┐       ┌──────────────────────────┐     │
│  │ Custom       │──────►│   Operator Controller    │     │
│  │ Resource     │       │   (Reconciliation Loop)  │     │
│  │ (YAML)       │       └────────────┬─────────────┘     │
│  └──────────────┘                    │                   │
│                                      │                   │
│                     ┌────────────────┴────────────┐      │
│                     ▼                             │      │
│                ┌──────────┐                       │      │
│                │ Deploy   │◄──────────────────────┘      │
│                │ Manage   │                              │
│                │ Monitor  │                              │
│                └──────────┘                              │
└──────────────────────────────────────────────────────────┘
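The reconciliation loop in the diagram is level-triggered: each pass compares the desired state declared in the custom resource with the state the controller observes, and acts on the difference instead of replaying individual events. The stdlib-only Go sketch below illustrates the idea with a hypothetical cluster type that tracks only a pod count. A real controller reads and writes through the Kubernetes API and relies on requeueing rather than an in-process loop, and it usually converges in one pass instead of one step at a time.

```go
package main

import "fmt"

// RedisSpec is the desired state a user declares (hypothetical type).
type RedisSpec struct {
	Replicas int
}

// cluster stands in for the observed state the API server reports (hypothetical type).
type cluster struct {
	RunningPods int
}

// reconcile compares desired and observed state and performs one corrective
// step. It reports whether anything changed so the caller knows to run again.
func reconcile(spec RedisSpec, c *cluster) (changed bool, action string) {
	switch {
	case c.RunningPods < spec.Replicas:
		c.RunningPods++
		return true, "create pod"
	case c.RunningPods > spec.Replicas:
		c.RunningPods--
		return true, "delete pod"
	default:
		return false, "in sync"
	}
}

// runUntilConverged repeats reconcile until nothing changes and returns the
// actions taken, mimicking a controller that keeps requeueing itself.
func runUntilConverged(spec RedisSpec, c *cluster) []string {
	var actions []string
	for {
		changed, action := reconcile(spec, c)
		if !changed {
			return actions
		}
		actions = append(actions, action)
	}
}

func main() {
	c := &cluster{RunningPods: 1}
	fmt.Println(runUntilConverged(RedisSpec{Replicas: 3}, c)) // [create pod create pod]
	// Running again against the same desired state does nothing.
	fmt.Println(runUntilConverged(RedisSpec{Replicas: 3}, c)) // []
}
```

Because each pass starts from observed state, a missed or duplicated event does no harm: the next pass still sees the same difference and fixes it.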

Common Operators

Operator              Purpose
Prometheus Operator   Monitoring stack
cert-manager          TLS certificates
External-DNS          DNS management
Vault Operator        Secret management
MySQL Operator        Database management
Kafka Operator        Event streaming

Building an Operator

Using Operator SDK

# Install Operator SDK
brew install operator-sdk

# Initialize operator project
operator-sdk init --domain example.com --project-name my-operator

# Create API (CRD)
operator-sdk create api --group cache --version v1alpha1 --kind Redis

# Generate manifests
make manifests

# Build operator
make docker-build IMG=my-operator:latest

# Deploy to cluster (the cluster must be able to pull IMG, e.g. after make docker-push)
make deploy IMG=my-operator:latest

Define Custom Resource

# config/crd/bases/cache.example.com_redis.yaml
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: redis.cache.example.com
spec:
  group: cache.example.com
  names:
    kind: Redis
    listKind: RedisList
    plural: redis
    singular: redis
  scope: Namespaced
  versions:
    - name: v1alpha1
      served: true
      storage: true
      schema:
        openAPIV3Schema:
          type: object
          properties:
            spec:
              type: object
              properties:
                replicas:
                  type: integer
                  minimum: 1
                  maximum: 10
                image:
                  type: string
                storage:
                  type: string
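            status:
              type: object
              properties:
                readyReplicas:
                  type: integer
                conditions:
                  type: array
                  # Structural schemas require item types for arrays
                  items:
                    type: object
                    x-kubernetes-preserve-unknown-fields: true
      # Enables the /status subresource that r.Status().Update() writes to
      subresources:
        status: {}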
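With Operator SDK, this manifest is normally generated by make manifests from Go types in api/v1alpha1, with validation written as // +kubebuilder:validation:... marker comments. The sketch below shows roughly what those types could look like. It is stdlib-only and omits the metav1.TypeMeta and metav1.ObjectMeta fields a real API type embeds, so treat the exact names as assumptions. Using *int32 for replicas lets the operator distinguish an omitted field from an explicit value.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// RedisSpec mirrors spec in the CRD schema. The marker comments are what
// controller-gen would turn into the OpenAPI minimum/maximum constraints.
type RedisSpec struct {
	// +kubebuilder:validation:Minimum=1
	// +kubebuilder:validation:Maximum=10
	Replicas *int32 `json:"replicas,omitempty"`
	Image    string `json:"image,omitempty"`
	Storage  string `json:"storage,omitempty"`
}

// RedisStatus mirrors status in the CRD schema. Conditions are omitted in
// this stdlib-only sketch; real projects typically use []metav1.Condition.
type RedisStatus struct {
	ReadyReplicas int32 `json:"readyReplicas,omitempty"`
}

// decodeSpec parses the JSON form of a Redis spec, as the API server stores it.
func decodeSpec(raw string) (RedisSpec, error) {
	var spec RedisSpec
	err := json.Unmarshal([]byte(raw), &spec)
	return spec, err
}

func main() {
	spec, err := decodeSpec(`{"replicas": 3, "image": "redis:7-alpine", "storage": "10Gi"}`)
	if err != nil {
		panic(err)
	}
	fmt.Println(*spec.Replicas, spec.Image, spec.Storage) // 3 redis:7-alpine 10Gi

	// An omitted field decodes to nil rather than 0.
	empty, _ := decodeSpec(`{}`)
	fmt.Println(empty.Replicas == nil) // true
}
```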

Implement Controller Logic

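// controllers/redis_controller.go
package controllers

import (
    "context"

    appsv1 "k8s.io/api/apps/v1"
    corev1 "k8s.io/api/core/v1"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/apimachinery/pkg/runtime"
    "k8s.io/apimachinery/pkg/util/intstr"
    ctrl "sigs.k8s.io/controller-runtime"
    "sigs.k8s.io/controller-runtime/pkg/client"
    "sigs.k8s.io/controller-runtime/pkg/controller/controllerutil"
    "sigs.k8s.io/controller-runtime/pkg/log"

    cachev1alpha1 "my-operator/api/v1alpha1"
)

// RedisReconciler reconciles a Redis object
type RedisReconciler struct {
    client.Client
    Scheme *runtime.Scheme
}

// RBAC markers: make manifests turns these into the operator's ClusterRole.
// +kubebuilder:rbac:groups=cache.example.com,resources=redis,verbs=get;list;watch;create;update;patch;delete
// +kubebuilder:rbac:groups=cache.example.com,resources=redis/status,verbs=get;update;patch
// +kubebuilder:rbac:groups=apps,resources=deployments,verbs=get;list;watch;create;update;patch;delete
// +kubebuilder:rbac:groups="",resources=services,verbs=get;list;watch;create;update;patch;delete
// +kubebuilder:rbac:groups="",resources=events,verbs=create;patch

func (r *RedisReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
    logger := log.FromContext(ctx).WithValues("redis", req.NamespacedName)

    // Fetch the Redis instance; NotFound means it was already deleted
    redis := &cachev1alpha1.Redis{}
    if err := r.Get(ctx, req.NamespacedName, redis); err != nil {
        return ctrl.Result{}, client.IgnoreNotFound(err)
    }

    // Being deleted: the owned Deployment and Service are garbage-collected
    // through their owner references, so there is nothing to clean up here
    if !redis.DeletionTimestamp.IsZero() {
        return ctrl.Result{}, nil
    }

    // Reconcile the Redis deployment; returning the error requeues with backoff
    if err := r.reconcileRedis(ctx, redis); err != nil {
        logger.Error(err, "Failed to reconcile Redis")
        return ctrl.Result{}, err
    }

    return ctrl.Result{}, nil
}

// reconcileRedis is idempotent: CreateOrUpdate creates each object if it is
// missing and otherwise updates it to match the Redis spec.
func (r *RedisReconciler) reconcileRedis(ctx context.Context, redis *cachev1alpha1.Redis) error {
    labels := map[string]string{"app": redis.Name}

    // Create or update Deployment
    deployment := &appsv1.Deployment{
        ObjectMeta: metav1.ObjectMeta{Name: redis.Name, Namespace: redis.Namespace},
    }
    if _, err := controllerutil.CreateOrUpdate(ctx, r.Client, deployment, func() error {
        setDeploymentSpec(deployment, redis, labels)
        // The owner reference ties the Deployment's lifetime to the Redis object
        return ctrl.SetControllerReference(redis, deployment, r.Scheme)
    }); err != nil {
        return err
    }

    // Create or update Service
    service := &corev1.Service{
        ObjectMeta: metav1.ObjectMeta{Name: redis.Name, Namespace: redis.Namespace},
    }
    if _, err := controllerutil.CreateOrUpdate(ctx, r.Client, service, func() error {
        service.Spec.Selector = labels
        service.Spec.Ports = []corev1.ServicePort{{
            Name:       "redis",
            Protocol:   corev1.ProtocolTCP,
            Port:       6379,
            TargetPort: intstr.FromInt(6379),
        }}
        return ctrl.SetControllerReference(redis, service, r.Scheme)
    }); err != nil {
        return err
    }

    // Update status from what the Deployment reports, not from the spec
    redis.Status.ReadyReplicas = deployment.Status.ReadyReplicas
    return r.Status().Update(ctx, redis)
}

// setDeploymentSpec writes the fields this operator manages. CreateOrUpdate
// calls it on every reconcile, so it only assigns values and never appends.
func setDeploymentSpec(dep *appsv1.Deployment, redis *cachev1alpha1.Redis, labels map[string]string) {
    dep.Spec.Replicas = redis.Spec.Replicas
    // The selector is immutable after creation; it is always set to the same value
    dep.Spec.Selector = &metav1.LabelSelector{MatchLabels: labels}
    dep.Spec.Template.ObjectMeta.Labels = labels
    dep.Spec.Template.Spec.Containers = []corev1.Container{{
        Name:  "redis",
        Image: redis.Spec.Image,
        Ports: []corev1.ContainerPort{{ContainerPort: 6379}},
    }}
}

// SetupWithManager watches Redis objects and the Deployments and Services they own
func (r *RedisReconciler) SetupWithManager(mgr ctrl.Manager) error {
    return ctrl.NewControllerManagedBy(mgr).
        For(&cachev1alpha1.Redis{}).
        Owns(&appsv1.Deployment{}).
        Owns(&corev1.Service{}).
        Complete(r)
}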
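Cleanup on deletion needs a finalizer. Without one, the API server removes the object as soon as it is deleted and the controller may never see a DeletionTimestamp; with one, deletion waits until the controller has cleaned up and removed the finalizer string. Owner references already cover in-cluster children such as a Deployment, so finalizers matter most for external state like a backup bucket or a DNS record. The stdlib-only sketch below models that protocol with a hypothetical object type and finalizer name; in controller-runtime the corresponding helpers are controllerutil.AddFinalizer, controllerutil.RemoveFinalizer, and controllerutil.ContainsFinalizer.

```go
package main

import (
	"fmt"
	"time"
)

// Hypothetical finalizer name; real operators use a domain-qualified string.
const finalizer = "cache.example.com/cleanup"

// object models only the metadata fields the finalizer protocol uses.
type object struct {
	Finalizers        []string
	DeletionTimestamp *time.Time
}

func hasFinalizer(o *object) bool {
	for _, f := range o.Finalizers {
		if f == finalizer {
			return true
		}
	}
	return false
}

func removeFinalizer(o *object) {
	kept := o.Finalizers[:0]
	for _, f := range o.Finalizers {
		if f != finalizer {
			kept = append(kept, f)
		}
	}
	o.Finalizers = kept
}

// markDeleted simulates the API server handling a delete request.
func markDeleted(o *object) {
	now := time.Now()
	o.DeletionTimestamp = &now
}

// reconcileDeletion adds the finalizer to live objects and, once deletion is
// requested, runs cleanup before releasing the finalizer. It returns true
// when the caller should persist the changed object.
func reconcileDeletion(o *object, cleanup func() error) (bool, error) {
	if o.DeletionTimestamp == nil {
		if hasFinalizer(o) {
			return false, nil
		}
		o.Finalizers = append(o.Finalizers, finalizer)
		return true, nil
	}
	if !hasFinalizer(o) {
		return false, nil // cleanup already ran
	}
	if err := cleanup(); err != nil {
		return false, err // keep the finalizer so deletion waits and we retry
	}
	removeFinalizer(o)
	return true, nil
}

func main() {
	o := &object{}
	changed, _ := reconcileDeletion(o, nil)
	fmt.Println(changed, o.Finalizers) // true [cache.example.com/cleanup]

	markDeleted(o)
	changed, _ = reconcileDeletion(o, func() error { fmt.Println("cleaning up"); return nil })
	fmt.Println(changed, len(o.Finalizers)) // true 0
}
```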

Using Operators

Deploying with Operator

# redis.yaml - Custom Resource
apiVersion: cache.example.com/v1alpha1
kind: Redis
metadata:
  name: my-redis
  namespace: default
spec:
  replicas: 3
  image: redis:7-alpine
  storage: 10Gi

# Apply
kubectl apply -f redis.yaml

# Monitor status
kubectl get redis my-redis
kubectl describe redis my-redis

Operator Lifecycle Management

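# Install OLM (Operator Lifecycle Manager) into the cluster
operator-sdk olm install

# Install an Operator from OperatorHub.io (creates an OLM Subscription)
kubectl create -f https://operatorhub.io/install/prometheus.yaml

# Watch the ClusterServiceVersion (CSV) reach the "Succeeded" phase
kubectl get csv -A

# Updates: OLM upgrades the Operator along its Subscription's channel
# (automatically, unless installPlanApproval is set to Manual)

# Uninstall: delete the Subscription, then the CSV it installed
kubectl delete subscription <subscription-name> -n <namespace>
kubectl delete csv <csv-name> -n <namespace>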

Best Practices

Resource Definitions

  • Use meaningful defaults
  • Validate fields with OpenAPI schema constraints, CEL rules (x-kubernetes-validations), or a validating webhook
  • Document fields in the CRD schema
  • Version CRDs properly (v1alpha1 → v1beta1 → v1)
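Defaults and validation usually run together at admission time: defaults fill in omitted fields first, then validation rejects values the operator cannot handle. The stdlib-only sketch below shows that order for the Redis spec. The default replica count and image, and the rule rejecting the latest tag, are assumptions chosen for illustration; in a real project this logic lives in OpenAPI defaults, CEL rules, or a webhook's defaulting and validating handlers.

```go
package main

import (
	"fmt"
	"strings"
)

// RedisSpec is a hypothetical mirror of the CRD's spec fields.
type RedisSpec struct {
	Replicas *int32
	Image    string
	Storage  string
}

// Assumed defaults, for illustration only.
const (
	defaultReplicas int32 = 1
	defaultImage          = "redis:7-alpine"
)

// applyDefaults fills in omitted fields. It never overwrites a value the user
// set, so applying it twice gives the same result.
func applyDefaults(s *RedisSpec) {
	if s.Replicas == nil {
		r := defaultReplicas
		s.Replicas = &r
	}
	if s.Image == "" {
		s.Image = defaultImage
	}
}

// validate returns every problem at once so users can fix them in one pass.
func validate(s RedisSpec) []string {
	var errs []string
	if s.Replicas == nil || *s.Replicas < 1 || *s.Replicas > 10 {
		errs = append(errs, "spec.replicas must be between 1 and 10")
	}
	if s.Image == "" || strings.HasSuffix(s.Image, ":latest") {
		errs = append(errs, "spec.image must be set to a pinned tag, not :latest")
	}
	return errs
}

func main() {
	spec := RedisSpec{Storage: "10Gi"}
	applyDefaults(&spec)
	fmt.Println(*spec.Replicas, spec.Image, validate(spec)) // 1 redis:7-alpine []

	zero := int32(0)
	bad := RedisSpec{Replicas: &zero, Image: "redis:latest"}
	applyDefaults(&bad) // an explicit 0 is kept, not defaulted
	fmt.Println(len(validate(bad))) // 2
}
```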

Controller Development

  • Implement idempotent reconciliation
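  • Handle partial state gracefully (a previous reconcile may have stopped between two API calls)
  • Add status conditions for clarity
  • Return errors instead of swallowing them, so the request is requeued with backoff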
  • Add finalizers for cleanup
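
// Example: Idempotent reconciliation (MyReconciler, MyResource, create and
// update are placeholders; a ConfigMap stands in for the managed child object)
import (
    "context"

    corev1 "k8s.io/api/core/v1"
    apierrors "k8s.io/apimachinery/pkg/api/errors"
    ctrl "sigs.k8s.io/controller-runtime"
    "sigs.k8s.io/controller-runtime/pkg/client"
)

func (r *MyReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
    owner := &MyResource{}
    if err := r.Get(ctx, req.NamespacedName, owner); err != nil {
        return ctrl.Result{}, client.IgnoreNotFound(err)
    }

    // Look up the child object this resource should produce
    existing := &corev1.ConfigMap{}
    err := r.Get(ctx, req.NamespacedName, existing)
    if apierrors.IsNotFound(err) {
        // Child doesn't exist - create it
        return ctrl.Result{}, r.create(ctx, owner)
    }
    if err != nil {
        return ctrl.Result{}, err
    }

    // Child exists - update it only if it differs from the desired state
    return ctrl.Result{}, r.update(ctx, owner, existing)
}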
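For status conditions, the Kubernetes convention is a list keyed by type, where the transition timestamp changes only when a condition's status actually flips; controller-runtime projects typically store metav1.Condition values and update them with meta.SetStatusCondition from k8s.io/apimachinery/pkg/api/meta. The stdlib-only sketch below reimplements that update rule on a simplified, hypothetical Condition type (with the timestamp as Unix seconds) to show why repeated reconciles do not churn the timestamp.

```go
package main

import "fmt"

// Condition is a simplified, hypothetical stand-in for metav1.Condition.
type Condition struct {
	Type               string
	Status             string // "True", "False" or "Unknown"
	Reason             string
	LastTransitionTime int64 // Unix seconds; metav1.Condition uses metav1.Time
}

// setCondition adds or updates the condition with the same Type. The
// transition time moves only when Status changes; Reason is always refreshed.
func setCondition(conds *[]Condition, c Condition, now int64) {
	for i := range *conds {
		existing := &(*conds)[i]
		if existing.Type != c.Type {
			continue
		}
		if existing.Status != c.Status {
			existing.Status = c.Status
			existing.LastTransitionTime = now
		}
		existing.Reason = c.Reason
		return
	}
	c.LastTransitionTime = now
	*conds = append(*conds, c)
}

func main() {
	var conds []Condition
	setCondition(&conds, Condition{Type: "Ready", Status: "False", Reason: "PodsStarting"}, 100)
	setCondition(&conds, Condition{Type: "Ready", Status: "False", Reason: "PodsStarting"}, 200)
	fmt.Println(conds[0].LastTransitionTime) // 100 (status unchanged)

	setCondition(&conds, Condition{Type: "Ready", Status: "True", Reason: "AllReplicasReady"}, 300)
	fmt.Println(conds[0].Status, conds[0].LastTransitionTime, len(conds)) // True 300 1
}
```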

Testing

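// controllers/suite_test.go (run via make test, which installs the envtest binaries)
package controllers

import (
    "context"
    "path/filepath"
    "testing"

    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/apimachinery/pkg/runtime"
    clientgoscheme "k8s.io/client-go/kubernetes/scheme"
    "sigs.k8s.io/controller-runtime/pkg/client"
    "sigs.k8s.io/controller-runtime/pkg/envtest"

    "my-operator/api/v1alpha1"
)

func TestRedisReconciler(t *testing.T) {
    // Start a local API server with the generated CRDs installed
    env := &envtest.Environment{
        CRDDirectoryPaths:     []string{filepath.Join("..", "config", "crd", "bases")},
        ErrorIfCRDPathMissing: true,
    }
    cfg, err := env.Start()
    if err != nil {
        t.Fatalf("starting envtest: %v", err)
    }
    defer env.Stop()

    // The client must know the Redis type, so register it in the scheme
    scheme := runtime.NewScheme()
    _ = clientgoscheme.AddToScheme(scheme)
    _ = v1alpha1.AddToScheme(scheme)
    k8sClient, err := client.New(cfg, client.Options{Scheme: scheme})
    if err != nil {
        t.Fatalf("creating client: %v", err)
    }

    // Create test resource
    replicas := int32(1)
    redis := &v1alpha1.Redis{
        ObjectMeta: metav1.ObjectMeta{Name: "test", Namespace: "default"},
        Spec: v1alpha1.RedisSpec{
            Replicas: &replicas,
            Image:    "redis:7",
        },
    }
    if err := k8sClient.Create(context.Background(), redis); err != nil {
        t.Fatalf("creating Redis: %v", err)
    }
    // Test reconciliation logic here
}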
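envtest runs a real API server, which suits end-to-end controller tests but is slow for checking individual rules. A cheaper complement is to test idempotency directly: reconcile once, reconcile again, and assert that the second pass writes nothing. The stdlib-only sketch below does that against a hypothetical in-memory store; in a controller-runtime project the fake client in sigs.k8s.io/controller-runtime/pkg/client/fake can play the store's role. The createOnly function mimics a reconciler that always calls Create, which fails on its second pass.

```go
package main

import "fmt"

// store is a hypothetical in-memory stand-in for the API server: object name
// to replica count, plus a count of writes.
type store struct {
	objects map[string]int
	writes  int
}

func newStore() *store { return &store{objects: map[string]int{}} }

// reconcile ensures the object exists with the desired replica count and
// writes only when something differs.
func reconcile(s *store, name string, replicas int) {
	if current, ok := s.objects[name]; ok && current == replicas {
		return // already in the desired state: no write
	}
	s.objects[name] = replicas
	s.writes++
}

// createOnly always tries to create, like a reconciler that never checks
// for an existing object.
func createOnly(s *store, name string, replicas int) error {
	if _, ok := s.objects[name]; ok {
		return fmt.Errorf("%s already exists", name)
	}
	s.objects[name] = replicas
	s.writes++
	return nil
}

// writesOnSecondPass reconciles twice and reports how many writes the second
// pass made; an idempotent reconciler returns 0.
func writesOnSecondPass(name string, replicas int) int {
	s := newStore()
	reconcile(s, name, replicas)
	before := s.writes
	reconcile(s, name, replicas)
	return s.writes - before
}

func main() {
	fmt.Println(writesOnSecondPass("my-redis", 3)) // 0

	s := newStore()
	fmt.Println(createOnly(s, "my-redis", 3)) // <nil>
	fmt.Println(createOnly(s, "my-redis", 3)) // my-redis already exists
}
```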

Common Operator Patterns

Backup Operator

apiVersion: backup.example.com/v1alpha1
kind: Backup
metadata:
  name: daily-backup
spec:
  schedule: "0 2 * * *"
  retention: 30d
  target:
    kind: Database
    name: production-db
  storage:
    type: s3
    bucket: my-backups
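A controller for a resource like this typically has two duties: create a backup when the schedule fires, and prune backups older than the retention window. The stdlib-only sketch below covers only the pruning step. Go's time.ParseDuration has no day unit, so the sketch parses a "d" suffix itself; the backup type, its Age field, and the helper names are hypothetical.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
	"time"
)

// backup is a hypothetical record of one completed backup and its age.
type backup struct {
	Name string
	Age  time.Duration
}

// daysOld converts a number of days to a duration.
func daysOld(n int) time.Duration { return time.Duration(n) * 24 * time.Hour }

// parseRetention accepts values like "30d", or anything time.ParseDuration
// understands, such as "12h".
func parseRetention(s string) (time.Duration, error) {
	if strings.HasSuffix(s, "d") {
		days, err := strconv.Atoi(strings.TrimSuffix(s, "d"))
		if err != nil || days < 0 {
			return 0, fmt.Errorf("invalid retention %q", s)
		}
		return daysOld(days), nil
	}
	return time.ParseDuration(s)
}

// expired returns the names of backups older than the retention window.
func expired(backups []backup, retention string) ([]string, error) {
	keep, err := parseRetention(retention)
	if err != nil {
		return nil, err
	}
	var names []string
	for _, b := range backups {
		if b.Age > keep {
			names = append(names, b.Name)
		}
	}
	return names, nil
}

func main() {
	backups := []backup{
		{Name: "daily-backup-1", Age: daysOld(45)},
		{Name: "daily-backup-2", Age: daysOld(29)},
		{Name: "daily-backup-3", Age: daysOld(1)},
	}
	names, err := expired(backups, "30d")
	if err != nil {
		panic(err)
	}
	fmt.Println(names) // [daily-backup-1]
}
```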

Database Operator

apiVersion: database.example.com/v1alpha1
kind: PostgreSQL
metadata:
  name: my-db
spec:
  version: "15"
  replicas: 2
  storage: 100Gi
  backup:
    enabled: true
    schedule: "0 */6 * * *"

Implementation Checklist

Planning Phase

  • Identify application requirements
  • Define CRD schema
  • Plan reconciliation logic
  • Design status reporting

Development Phase

  • Initialize Operator SDK project
  • Create CRD definitions
  • Implement controller logic
  • Add validation webhooks
  • Write tests

Deployment Phase

  • Build container image
  • Create OLM bundle
  • Publish to OperatorHub
  • Document usage

Summary

Operators extend Kubernetes to manage complex, stateful applications:

  1. CRDs define custom resource types for your application
  2. Controllers implement reconciliation logic
  3. Operators package CRDs and controllers together
  4. Best practices include idempotency, validation, and testing

The Operator pattern enables sophisticated automation that would otherwise require manual intervention.

