Kubernetes in Production: A Practical Guide

Introduction

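Kubernetes has become the de facto standard for container orchestration, but running it in production requires careful planning around security, monitoring, and day-to-day operations. This guide walks through cluster setup, application deployment, security, monitoring, scaling, networking, backup, CI/CD integration, and troubleshooting for production clusters.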

Production Cluster Setup

Managed vs Self-Managed

Option	Pros	Cons
EKS/GKE/AKS	Managed control plane	Cost, lock-in
Self-hosted	Full control	Operational burden
Kubespray (Ansible-based)	Automated self-hosted provisioning	Extra tooling to learn and maintain

Cluster Sizing

# Example node pool configuration (GKE)
nodePools:
  - name: default
    machineType: e2-standard-4
    diskSizeGb: 100
    autoscaling:
      minNodeCount: 2
      maxNodeCount: 10

Multi-AZ Deployment

# Spread across availability zones
topologyKey: topology.kubernetes.io/zone
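
The zone key above is typically used in a pod's topologySpreadConstraints. A minimal sketch, assuming the pods carry an app: myapp label (an assumption for illustration, not something this guide defines):

# Pod template fragment: spread replicas evenly across zones
spec:
  topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: ScheduleAnyway   # use DoNotSchedule for a hard requirement
    labelSelector:
      matchLabels:
        app: myapp                      # assumed pod label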

Application Deployment

Deployment Strategies

Rolling Update:

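apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1        # at most one extra pod during the rollout
      maxUnavailable: 0  # never drop below the desired replica count
  selector:              # required: must match the pod template labels
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
      - name: myapp
        image: myapp:v2  # placeholder image reference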

Blue-Green:

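apiVersion: v1
kind: Service
metadata:
  name: myapp
spec:
  selector:
    app: myapp      # assumes pods are labeled app: myapp
    version: v2     # change v1 to v2 to cut traffic over to the new version
  ports:
  - port: 80
    targetPort: 8080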

Resource Management

resources:
  requests:
    memory: "256Mi"
    cpu: "250m"
  limits:
    memory: "512Mi"
    cpu: "500m"
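
Containers that omit a resources block get no requests or limits by default. One way to guard against that is a namespace-level LimitRange. The sketch below assumes a production namespace and reuses the values above as defaults; the object name is hypothetical.

apiVersion: v1
kind: LimitRange
metadata:
  name: default-resources   # hypothetical name
  namespace: production
spec:
  limits:
  - type: Container
    defaultRequest:          # applied when a container sets no requests
      memory: "256Mi"
      cpu: "250m"
    default:                 # applied when a container sets no limits
      memory: "512Mi"
      cpu: "500m"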

Health Checks

livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 30
  periodSeconds: 10

readinessProbe:
  httpGet:
    path: /ready
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 5
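
For applications that start slowly, a fixed initialDelaySeconds is hard to tune: too short and the container is killed mid-startup, too long and real failures go unnoticed. A startupProbe is one alternative, since liveness and readiness checks are held off until it succeeds. A sketch, assuming the same /healthz endpoint and a worst-case startup time of about five minutes:

startupProbe:
  httpGet:
    path: /healthz
    port: 8080
  periodSeconds: 10
  failureThreshold: 30   # 30 checks x 10s = up to 300s allowed for startup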

Security

RBAC Configuration

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pod-reader        # example name; metadata.name is required
  namespace: production
rules:
- apiGroups: [""]         # "" is the core API group
  resources: ["pods"]
  verbs: ["get", "list", "watch"]
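
A Role grants nothing until it is bound to a subject. A sketch of the matching RoleBinding, assuming the Role is named pod-reader and the workload runs under a service account called myapp (both names are hypothetical):

apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: read-pods            # hypothetical name
  namespace: production
subjects:
- kind: ServiceAccount
  name: myapp                # hypothetical service account
  namespace: production
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: pod-reader           # must match the Role's metadata.name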

Network Policies

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny
spec:
  podSelector: {}
  policyTypes:
  - Ingress
  - Egress
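
A default-deny policy that includes Egress also blocks DNS lookups, so pods usually need an explicit allowance on top of it. A sketch, assuming cluster DNS runs in kube-system and that namespaces carry the kubernetes.io/metadata.name label that recent Kubernetes versions set automatically:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns-egress   # hypothetical name
spec:
  podSelector: {}          # applies to every pod in the namespace
  policyTypes:
  - Egress
  egress:
  - to:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: kube-system
    ports:
    - protocol: UDP
      port: 53
    - protocol: TCP
      port: 53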

Secrets Management

apiVersion: v1
kind: Secret
metadata:
  name: db-creds
type: Opaque
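stringData:
  username: admin
  password: changeme   # placeholder only; never commit real credentials

Secret values are only base64-encoded in the API and are not encrypted in etcd unless encryption at rest is configured. For production, keep credentials out of manifests and use an external secrets operator backed by a dedicated secret store.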
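
As one illustration, the External Secrets Operator syncs values from an external store into a regular Secret. The sketch below assumes the operator is installed, that a ClusterSecretStore named vault-backend exists, and that the remote key is prod/db. All three are hypothetical, and the apiVersion may differ between operator releases.

apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: db-creds
  namespace: production
spec:
  refreshInterval: 1h          # how often to re-read the external store
  secretStoreRef:
    name: vault-backend        # hypothetical store
    kind: ClusterSecretStore
  target:
    name: db-creds             # name of the Secret to create
  data:
  - secretKey: password        # key in the resulting Secret
    remoteRef:
      key: prod/db             # hypothetical path in the external store
      property: password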

Monitoring

Prometheus Setup

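# Requires the Prometheus Operator to be installed
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: prometheus
spec:
  replicas: 2
  serviceAccountName: prometheus
  serviceMonitorSelector: {}   # empty selector matches all ServiceMonitors in this namespace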

Key Metrics

Metric	Description
Pod CPU usage	Resource utilization
Pod memory	Memory pressure
API latency	Control plane health
Pod restarts	Stability indicator

Alerting Rules

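groups:
- name: kubernetes
  rules:
  - alert: HighPodMemory
    # PromQL has no "Gi" suffix, so 1 GiB is written out in bytes.
    # Grouping by pod avoids summing memory across the whole cluster.
    expr: sum by (namespace, pod) (container_memory_working_set_bytes{container!=""}) > 1024 * 1024 * 1024
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "High memory usage in pod {{ $labels.namespace }}/{{ $labels.pod }}"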
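
The pod restart metric from the Key Metrics table can be alerted on in the same way. A sketch, assuming kube-state-metrics is deployed (it exports kube_pod_container_status_restarts_total); the threshold and window are illustrative:

groups:
- name: kubernetes-restarts
  rules:
  - alert: PodRestartingFrequently
    # more than 3 restarts of a container within 15 minutes
    expr: increase(kube_pod_container_status_restarts_total[15m]) > 3
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "Container {{ $labels.container }} in {{ $labels.namespace }}/{{ $labels.pod }} is restarting frequently"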

Scaling

Horizontal Pod Autoscaling

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: myapp-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70

Cluster Autoscaling

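The Cluster Autoscaler integrates with the cloud provider's node groups. It adds nodes when pods cannot be scheduled for lack of capacity and removes nodes that stay underutilized.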
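
When the autoscaler removes a node it has to evict the pods running on it, and it respects PodDisruptionBudgets while doing so. A sketch that allows at most one myapp pod to be disrupted at a time, assuming the pods are labeled app: myapp:

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: myapp-pdb      # hypothetical name
spec:
  maxUnavailable: 1    # evict no more than one pod at a time
  selector:
    matchLabels:
      app: myapp       # assumed pod label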

Networking

Ingress Configuration

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: myapp-ingress
  annotations:
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
spec:
  ingressClassName: nginx   # assumes the NGINX ingress controller, matching the annotation above
  tls:
  - hosts:
    - example.com
    secretName: tls-secret
  rules:
  - host: example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: myapp
            port:
              number: 80

Service Mesh

For advanced traffic management:

# Istio VirtualService
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: myapp
spec:
  hosts:
  - myapp
  http:
  - route:
    - destination:
        host: myapp
        subset: v1
      weight: 90
    - destination:
        host: myapp
        subset: v2
      weight: 10
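
The v1 and v2 subsets referenced above are not defined by the VirtualService itself; Istio resolves them through a DestinationRule. A sketch, assuming the two versions are distinguished by a version pod label:

apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: myapp
spec:
  host: myapp
  subsets:
  - name: v1
    labels:
      version: v1   # assumed pod label
  - name: v2
    labels:
      version: v2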

Backup and Disaster Recovery

Etcd Backup

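# Backup (certificate paths are the kubeadm defaults)
ETCDCTL_API=3 etcdctl snapshot save backup.db \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key

# Restore: stop etcd first. The target data directory must not already
# exist, so move the old one aside before restoring.
# On etcd 3.5+, etcdutl is the preferred tool for restores.
ETCDCTL_API=3 etcdctl snapshot restore backup.db \
  --data-dir=/var/lib/etcd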

Application Backups

# Velero backup
apiVersion: velero.io/v1
kind: Backup
metadata:
  name: daily-backup
  namespace: velero   # Velero only acts on Backups in its own namespace (velero by default)
spec:
  includedNamespaces:
  - production
  storageLocation: default
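
A Backup object runs once. For a recurring backup, Velero provides a Schedule resource that wraps the same spec in a cron expression. A sketch, assuming Velero runs in its default velero namespace; the 02:00 schedule and 30-day retention are illustrative:

apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: daily-backup
  namespace: velero
spec:
  schedule: "0 2 * * *"      # every day at 02:00
  template:                  # same fields as a Backup spec
    includedNamespaces:
    - production
    storageLocation: default
    ttl: 720h0m0s            # keep each backup for 30 days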

CI/CD Integration

GitOps with ArgoCD

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: myapp
  namespace: argocd   # Application resources normally live in Argo CD's own namespace
spec:
  project: default
  source:
    repoURL: https://github.com/myorg/myapp
    targetRevision: HEAD
    path: k8s
  destination:
    server: https://kubernetes.default.svc
    namespace: production
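
As written, the Application syncs only when triggered manually. To have Argo CD apply changes from Git automatically, a syncPolicy can be added under spec. A sketch; whether to enable prune and selfHeal is a judgment call for each team:

# Fragment to merge into the Application's spec
spec:
  syncPolicy:
    automated:
      prune: true      # delete resources that were removed from Git
      selfHeal: true   # revert manual changes made in the cluster
    syncOptions:
    - CreateNamespace=true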

Troubleshooting

Common Issues

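Pod stuck in Pending: Check node capacity against resource requests, PVC binding, and taints or node selectors
Pod CrashLoopBackOff: Check the logs of the previous container instance and the pod events
Service unreachable: Verify that the Service has endpoints (its selector matches pod labels) and that network policies allow the traffic
OOMKilled: The container exceeded its memory limit; raise the limit or reduce memory usage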

Debug Commands

# Pod logs (add --previous to see the last crashed container)
kubectl logs myapp-pod -f

# Pod events
kubectl describe pod myapp-pod

# Exec into container
kubectl exec -it myapp-pod -- /bin/sh

# View resource usage (requires metrics-server)
kubectl top pods
kubectl top nodes

# Network debugging (temporary pod, removed on exit)
kubectl run netshoot --image=nicolaka/netshoot --rm -it

Conclusion

Production Kubernetes requires attention to security, monitoring, and operations. Start with a managed service, implement RBAC and network policies, and establish monitoring and alerting before going to production.