Kubernetes in Production: A Practical Guide

Created: March 9, 2026 CalmOps 3 min read

Introduction

Kubernetes has become the de facto standard for container orchestration, but running it in production requires careful planning around security, monitoring, and ongoing operations. This guide covers the practices and configuration needed to deploy and manage production-ready Kubernetes clusters.

Production Cluster Setup

Managed vs Self-Managed

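Option | Pros | Cons
---|---|---
EKS/GKE/AKS (managed) | Provider runs and upgrades the control plane | Control-plane fees, provider lock-in
Self-hosted | Full control over versions and configuration | You operate, upgrade, and back up the control plane yourself
Kubespray (self-hosted) | Automated, Ansible-based installs and upgrades | Playbook and inventory complexity; still self-managed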

Cluster Sizing

# Example node pool configuration (GKE API field names, simplified)
nodePools:
  - name: default
    config:
      machineType: e2-standard-4   # 4 vCPU, 16 GB memory
      diskSizeGb: 100
    autoscaling:
      enabled: true
      minNodeCount: 2
      maxNodeCount: 10
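As a rough starting point for `minNodeCount` and `maxNodeCount`, you can divide total pod requests by what one node can allocate. The sketch below is a back-of-the-envelope estimate, not how the autoscaler decides. The allocatable figures are hypothetical: an e2-standard-4 has 4 vCPU and 16 GB, but the kubelet and system daemons reserve part of that, so check `kubectl describe node` for real values. It also ignores bin-packing fragmentation, DaemonSets, and spare headroom.

```python
import math

# Hypothetical allocatable capacity of one e2-standard-4 node after system reservations.
NODE_CPU_MILLICORES = 3900
NODE_MEMORY_MIB = 13000


def nodes_needed(pod_count, pod_cpu_m, pod_mem_mib, min_nodes=2, max_nodes=10,
                 node_cpu_m=NODE_CPU_MILLICORES, node_mem_mib=NODE_MEMORY_MIB):
    """Rough node count from aggregate pod requests, clamped to the autoscaling range."""
    by_cpu = math.ceil(pod_count * pod_cpu_m / node_cpu_m)
    by_mem = math.ceil(pod_count * pod_mem_mib / node_mem_mib)
    # Whichever resource runs out first decides; the node pool bounds cap the result.
    return max(min_nodes, min(max_nodes, max(by_cpu, by_mem)))


print(nodes_needed(60, pod_cpu_m=250, pod_mem_mib=256))
```

With 60 replicas requesting 250m CPU and 256Mi each, CPU is the binding constraint and the estimate is 4 nodes, comfortably inside the 2 to 10 range.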

Multi-AZ Deployment

# Label key for spreading replicas across availability zones,
# referenced from topologySpreadConstraints or podAntiAffinity rules
topologyKey: topology.kubernetes.io/zone
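The zone key is normally referenced from a `topologySpreadConstraints` entry (with `maxSkew` and `whenUnsatisfiable`) or from a pod anti-affinity rule. In a simplified model of the spread check, a zone can take the next replica only if its matching-pod count plus one, minus the smallest count in any zone, stays within `maxSkew`. The zone names below are hypothetical. The real scheduler also considers node eligibility, taints, and `minDomains`.

```python
def eligible_zones(counts, max_skew=1):
    """Zones where placing one more matching pod keeps the skew within max_skew."""
    global_min = min(counts.values())
    return sorted(zone for zone, n in counts.items() if n + 1 - global_min <= max_skew)


def spread(replicas, zones, max_skew=1):
    """Place replicas one at a time, always into the first eligible zone."""
    counts = {zone: 0 for zone in zones}
    for _ in range(replicas):
        target = eligible_zones(counts, max_skew)[0]
        counts[target] += 1
    return counts


print(spread(7, ["zone-a", "zone-b", "zone-c"]))
```

Seven replicas over three zones end up split 3/2/2, so losing any single zone still leaves at least four replicas running.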

Application Deployment

Deployment Strategies

Rolling Update:

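apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
spec:
  replicas: 3
  selector:
    matchLabels:
      app: myapp
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1        # at most 1 pod above the desired count during a rollout
      maxUnavailable: 0  # never drop below 3 available pods
  template:
    metadata:
      labels:
        app: myapp       # must match spec.selector.matchLabels
    spec:
      containers:
      - name: myapp
        image: myorg/myapp:1.0.0   # hypothetical image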
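`maxSurge` and `maxUnavailable` bound the pod count during a rollout. With `replicas: 3`, `maxSurge: 1`, and `maxUnavailable: 0`, Kubernetes runs at most 4 pods and never fewer than 3 available ones, trading a little extra capacity for zero lost capacity. Both fields also accept percentages: the surge count rounds up and the unavailable count rounds down. The sketch below reproduces that documented arithmetic. It illustrates the rules and is not the Deployment controller's code.

```python
def resolve(value, replicas, round_up):
    """Convert an int or a percentage string such as "25%" into a pod count."""
    if isinstance(value, str) and value.endswith("%"):
        scaled = replicas * int(value[:-1])
        # Integer ceiling/floor division avoids float rounding surprises.
        return -(-scaled // 100) if round_up else scaled // 100
    return int(value)


def rollout_bounds(replicas, max_surge="25%", max_unavailable="25%"):
    """Return (max total pods, min available pods) during a rolling update."""
    surge = resolve(max_surge, replicas, round_up=True)
    unavailable = resolve(max_unavailable, replicas, round_up=False)
    if surge == 0 and unavailable == 0:
        # If both resolve to zero the rollout could never progress, so the
        # Deployment controller falls back to allowing one unavailable pod.
        unavailable = 1
    return replicas + surge, replicas - unavailable


print(rollout_bounds(3, max_surge=1, max_unavailable=0))
print(rollout_bounds(10))  # Kubernetes defaults: 25% / 25%
```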

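Blue-Green (the v1 and v2 Deployments run side by side; editing the Service selector switches all traffic at once):

apiVersion: v1
kind: Service
metadata:
  name: myapp
spec:
  selector:
    app: myapp
    version: v2      # change back to v1 to roll back instantly
  ports:
  - port: 80
    targetPort: 8080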
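Blue-green switching works because a Service routes to every pod whose labels contain all of its selector's key/value pairs, so changing `version: v2` back to `v1` is the entire rollback. The sketch below models equality-based selector matching with hypothetical pod names. It also shows why `version` should be paired with an app label: a bare `version: v2` selector would match any app's `v2` pods in the namespace.

```python
# Hypothetical pods from a blue (v1) and a green (v2) Deployment, plus an unrelated app.
PODS = {
    "myapp-blue-7d9f": {"app": "myapp", "version": "v1"},
    "myapp-green-5c2a": {"app": "myapp", "version": "v2"},
    "billing-green-91ab": {"app": "billing", "version": "v2"},
}


def selects(selector, labels):
    """Equality-based selection: every selector key must be present with the same value."""
    return all(labels.get(key) == value for key, value in selector.items())


def endpoints(selector, pods=PODS):
    """Names of the pods a Service with this selector would route to."""
    return sorted(name for name, labels in pods.items() if selects(selector, labels))


print(endpoints({"app": "myapp", "version": "v2"}))  # green only
print(endpoints({"version": "v2"}))                   # also picks up billing
```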

Resource Management

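resources:
  requests:            # used by the scheduler to place the pod
    memory: "256Mi"
    cpu: "250m"        # 0.25 CPU core
  limits:              # enforced at runtime
    memory: "512Mi"    # exceeding this gets the container OOMKilled
    cpu: "500m"        # exceeding this throttles the container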
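The quantity suffixes are easy to misread. `Mi` is binary (2^20 bytes) while `M` is decimal (10^6 bytes), and `m` means thousandths of a CPU core, so `250m` is a quarter core. The parser below is a simplified sketch that handles only these common suffixes, not the full Kubernetes quantity grammar (for example, it skips exponent forms such as `1e3`).

```python
from decimal import Decimal

# Binary (power-of-two) and decimal suffixes; binary ones are checked first.
BINARY = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}
DECIMAL = {"m": Decimal("0.001"), "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12}


def parse_quantity(quantity: str) -> Decimal:
    """Convert a quantity like "256Mi" or "250m" to a plain number (bytes or cores)."""
    for table in (BINARY, DECIMAL):
        for suffix, factor in table.items():
            if quantity.endswith(suffix):
                return Decimal(quantity[: -len(suffix)]) * factor
    return Decimal(quantity)


print(parse_quantity("256Mi"))  # bytes
print(parse_quantity("250m"))   # CPU cores
print(parse_quantity("256Mi") - parse_quantity("256M"))  # binary vs decimal gap, in bytes
```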

Health Checks

livenessProbe:          # failing this restarts the container
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 30
  periodSeconds: 10

readinessProbe:         # failing this removes the pod from Service endpoints
  httpGet:
    path: /ready
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 5
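How quickly each probe reacts depends on `periodSeconds` together with `failureThreshold` (default 3) and `timeoutSeconds` (default 1), which the probes above leave at their defaults. The sketch below estimates an approximate detection window for an app that hangs after startup. It ignores kubelet scheduling jitter, so treat the numbers as rough bounds.

```python
def detection_window(period_s, failure_threshold=3, timeout_s=1):
    """Approximate (best, worst) seconds from a hang until the probe is marked failed.

    Best case: the hang starts just before a probe, so failures land at 0, P, 2P, ...
    Worst case: it starts just after a successful probe, adding one extra period.
    Each failing probe may wait up to timeout_s before counting as a failure.
    """
    best = (failure_threshold - 1) * period_s + timeout_s
    worst = failure_threshold * period_s + timeout_s
    return best, worst


print("liveness (restart):", detection_window(10))
print("readiness (drop from endpoints):", detection_window(5))
```

The shorter readiness period means a hung pod stops receiving traffic well before liveness restarts it.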

Security

RBAC Configuration

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pod-reader
  namespace: production
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "list", "watch"]
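RBAC is purely additive: there are no deny rules, and a request is allowed if any bound rule matches its API group, resource, and verb. The sketch below models that check for the Role above. It is simplified and ignores `resourceNames`, non-resource URLs, and how a RoleBinding attaches the Role to users or service accounts. Subresources count as separate resources, so this Role does not grant `pods/log` or `pods/exec`.

```python
POD_READER_RULES = [
    {"apiGroups": [""], "resources": ["pods"], "verbs": ["get", "list", "watch"]},
]


def _matches(allowed, value):
    """A rule field matches if it lists the value or the wildcard "*"."""
    return "*" in allowed or value in allowed


def is_allowed(rules, api_group, resource, verb):
    """True if any rule grants verb on resource in api_group ("" is the core group)."""
    return any(
        _matches(rule["apiGroups"], api_group)
        and _matches(rule["resources"], resource)
        and _matches(rule["verbs"], verb)
        for rule in rules
    )


print(is_allowed(POD_READER_RULES, "", "pods", "list"))     # read access
print(is_allowed(POD_READER_RULES, "", "pods", "delete"))   # not granted
print(is_allowed(POD_READER_RULES, "", "pods/log", "get"))  # subresource not granted
```

On a real cluster, `kubectl auth can-i list pods -n production --as=system:serviceaccount:production:<name>` answers the same question against the actual bindings.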

Network Policies

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
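  name: default-deny
  namespace: production
spec:
  podSelector: {}   # empty selector: applies to every pod in the namespace
  policyTypes:      # both types listed with no rules, so all traffic is denied
  - Ingress
  - Egress          # this also blocks DNS; allow egress to kube-dns explicitly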
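NetworkPolicies are additive allow-lists. A pod selected by any policy that lists `Ingress` in `policyTypes` becomes isolated for ingress, and from then on only traffic that some selecting policy explicitly allows gets through; a policy with no rules, like `default-deny`, therefore blocks everything. The sketch below models only the ingress side within one namespace, using pod-label peers. The `ingress_from` field and the frontend policy are hypothetical simplifications of the real `spec.ingress[].from` structure, and ports and namespace selectors are ignored.

```python
def matches(selector, labels):
    """An empty selector {} matches every pod, as with podSelector: {}."""
    return all(labels.get(k) == v for k, v in selector.items())


def ingress_allowed(policies, dst_labels, src_labels):
    """Simplified same-namespace ingress check for a destination pod."""
    selecting = [p for p in policies
                 if "Ingress" in p["policyTypes"] and matches(p["podSelector"], dst_labels)]
    if not selecting:
        return True  # not isolated: all ingress is allowed by default
    # Isolated: allowed only if some selecting policy has a peer matching the source.
    return any(matches(peer, src_labels)
               for p in selecting for peer in p.get("ingress_from", []))


DEFAULT_DENY = {"podSelector": {}, "policyTypes": ["Ingress", "Egress"]}
# Hypothetical follow-up policy that re-opens ingress to myapp from frontend pods.
ALLOW_FRONTEND = {"podSelector": {"app": "myapp"}, "policyTypes": ["Ingress"],
                  "ingress_from": [{"app": "frontend"}]}

myapp, frontend = {"app": "myapp"}, {"app": "frontend"}
print(ingress_allowed([DEFAULT_DENY], myapp, frontend))
print(ingress_allowed([DEFAULT_DENY, ALLOW_FRONTEND], myapp, frontend))
```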

Secrets Management

apiVersion: v1
kind: Secret
metadata:
  name: db-creds
type: Opaque
stringData:
  username: admin
  password: changeme

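Secret values are only base64-encoded, not encrypted, in the API object. For production, enable encryption at rest for Secrets, restrict access to them with RBAC, and keep real credentials out of manifests in Git by syncing them from an external secret store (for example with the External Secrets Operator).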
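Values written under `stringData` are stored base64-encoded in the Secret's `data` field. Base64 is an encoding, not encryption, so anyone who can read the Secret object can recover the password. The sketch below shows the round trip with Python's standard library.

```python
import base64


def to_data(string_data):
    """Encode plain-text values the way they appear in a Secret's data field."""
    return {k: base64.b64encode(v.encode("utf-8")).decode("ascii") for k, v in string_data.items()}


def from_data(data):
    """Decode a Secret's data field back to plain text (no key required)."""
    return {k: base64.b64decode(v).decode("utf-8") for k, v in data.items()}


encoded = to_data({"username": "admin", "password": "changeme"})
print(encoded)
print(from_data(encoded))
```

On a cluster the same decoding is a one-liner, `kubectl get secret db-creds -o jsonpath='{.data.password}' | base64 -d`, which is why read access to Secrets should be tightly limited.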

Monitoring

Prometheus Setup

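# Requires the Prometheus Operator (it defines the monitoring.coreos.com CRDs)
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: prometheus
spec:
  replicas: 2                  # two independent instances for redundancy
  serviceAccountName: prometheus
  serviceMonitorSelector: {}   # empty selector picks up all ServiceMonitors; omitting it selects none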

Key Metrics

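Metric | Example Prometheus series | What it indicates
---|---|---
Pod CPU usage | container_cpu_usage_seconds_total | Utilization against requests and limits
Pod memory | container_memory_working_set_bytes | Memory pressure; nearing the limit risks OOMKill
API server latency | apiserver_request_duration_seconds | Control plane health
Pod restarts | kube_pod_container_status_restarts_total | Stability (crash loops, OOMKills)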

Alerting Rules

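groups:
- name: kubernetes
  rules:
  - alert: HighPodMemory
    # Per-pod working set; PromQL does not accept Kubernetes quantities like 1Gi,
    # so the threshold is written in bytes (1 GiB = 1073741824)
    expr: sum by (namespace, pod) (container_memory_working_set_bytes{container!=""}) > 1073741824
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "Pod {{ $labels.namespace }}/{{ $labels.pod }} uses more than 1 GiB of memory"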
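The `for: 5m` clause keeps brief spikes from paging anyone. Prometheus evaluates the rule on every evaluation interval: while the expression returns a result the alert is "pending", it becomes "firing" only once the condition has held continuously for the `for` duration, and a single evaluation where it is false resets it. The sketch below replays hypothetical one-minute evaluations for a single series. It is a simplified model that ignores `keep_firing_for` and per-series label sets.

```python
def alert_states(evaluations, for_seconds=300):
    """Replay (timestamp_s, condition_true) evaluations and return the alert state at each."""
    states, active_since = [], None
    for ts, condition in evaluations:
        if not condition:
            active_since = None  # any false evaluation resets the timer
            states.append("inactive")
            continue
        if active_since is None:
            active_since = ts
        states.append("firing" if ts - active_since >= for_seconds else "pending")
    return states


# Hypothetical evaluations every 60 s: a short spike, then sustained high memory.
evals = [(0, True), (60, True), (120, False), (180, True), (240, True),
         (300, True), (360, True), (420, True), (480, True)]
print(alert_states(evals))
```

The two-minute spike never fires; the sustained episode starting at 180 s fires at 480 s, five minutes later.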

Scaling

Horizontal Pod Autoscaling

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: myapp-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
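The HPA compares average CPU usage, measured as a percentage of the pods' CPU requests, with the 70% target and computes desiredReplicas = ceil(currentReplicas * currentUtilization / targetUtilization), then clamps the result to `minReplicas` and `maxReplicas`. It skips scaling when the ratio is within a tolerance (0.1 by default). The sketch below reproduces only that core formula; the real controller also applies stabilization windows, scaling policies, and special handling for missing metrics and not-ready pods. Because utilization is relative to requests, pods without CPU requests cannot be scaled on this metric.

```python
import math


def desired_replicas(current_replicas, current_utilization, target_utilization=70,
                     min_replicas=2, max_replicas=10, tolerance=0.1):
    """Core HPA formula for a single resource-utilization metric."""
    ratio = current_utilization / target_utilization
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas  # close enough to target: leave replicas unchanged
    desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))


print(desired_replicas(4, 140))   # load doubled: scale out
print(desired_replicas(4, 75))    # within 10% of target: no change
print(desired_replicas(10, 140))  # capped at maxReplicas
```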

Cluster Autoscaling

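# Handled by the Cluster Autoscaler (or the cloud provider's managed equivalent):
# it adds nodes when pods stay Pending for lack of CPU or memory and removes
# underutilized nodes, always within the node pool's min/max bounds.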

Networking

Ingress Configuration

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: myapp-ingress
  annotations:
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
spec:
  ingressClassName: nginx   # assumes the ingress-nginx controller, matching the annotation above
  tls:
  - hosts:
    - example.com
    secretName: tls-secret
  rules:
  - host: example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: myapp
            port:
              number: 80
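With `pathType: Prefix`, matching works element by element on the `/`-separated path, not character by character: `/` matches every request, `/foo` matches `/foo` and `/foo/bar` but not `/foobar`, and a trailing slash is ignored. The function below is a simplified sketch of those documented rules. It does not handle `Exact` paths, precedence between multiple rules, or controller-specific extensions such as regex paths.

```python
def prefix_match(rule_path: str, request_path: str) -> bool:
    """Element-wise prefix match, as specified for Ingress pathType: Prefix."""
    rule = [part for part in rule_path.split("/") if part]
    request = [part for part in request_path.split("/") if part]
    # The rule's elements must equal the first elements of the request path.
    return request[: len(rule)] == rule


for path in ["/", "/api", "/api/v1/users", "/apiv2"]:
    print(path, prefix_match("/api", path))
```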

Service Mesh

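For advanced traffic management, such as weighted canary releases, a service mesh can split traffic between versions. This Istio example sends 90% of requests to v1 and 10% to v2: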

# Istio VirtualService (subsets v1 and v2 must be defined in a DestinationRule)
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: myapp
spec:
  hosts:
  - myapp
  http:
  - route:
    - destination:
        host: myapp
        subset: v1
      weight: 90
    - destination:
        host: myapp
        subset: v2
      weight: 10
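The `weight` values split requests between the `v1` and `v2` subsets, which is the basis of a canary release: shift weight gradually (for example 90/10, then 50/50, then 0/100) while watching error rates. The sketch below models a per-request weighted choice to show the expected split. It illustrates the intent of the weights and is not Envoy's actual load-balancing code.

```python
import random
from collections import Counter

# Mirrors the VirtualService above: (subset, weight) pairs summing to 100.
ROUTES = [("v1", 90), ("v2", 10)]


def pick_subset(roll, routes=ROUTES):
    """Map a roll in [0, total weight) to a subset using cumulative weights."""
    cumulative = 0
    for subset, weight in routes:
        cumulative += weight
        if roll < cumulative:
            return subset
    raise ValueError(f"roll {roll} outside total weight {cumulative}")


def simulate(requests, seed=42, routes=ROUTES):
    """Count how many simulated requests land on each subset."""
    rng = random.Random(seed)
    total = sum(weight for _, weight in routes)
    return Counter(pick_subset(rng.randrange(total), routes) for _ in range(requests))


print(simulate(10_000))
```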

Backup and Disaster Recovery

Etcd Backup

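# Backup (etcd v3 API; older etcdctl releases need ETCDCTL_API=3)
ETCDCTL_API=3 etcdctl snapshot save backup.db \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key

# Verify the snapshot
etcdutl snapshot status backup.db --write-out=table

# Restore into a fresh data directory (it must not already exist), then point
# the etcd static pod or service at it. Since etcd 3.5 the restore command lives
# in etcdutl; `etcdctl snapshot restore` is deprecated.
etcdutl snapshot restore backup.db \
  --data-dir=/var/lib/etcd-restore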

Application Backups

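# Velero backup (one-off; a velero.io/v1 Schedule creates backups on a cron schedule)
apiVersion: velero.io/v1
kind: Backup
metadata:
  name: daily-backup
  namespace: velero   # Velero watches its own namespace for Backup objects
spec:
  includedNamespaces:
  - production
  storageLocation: default
  ttl: 720h0m0s       # keep the backup for 30 days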

CI/CD Integration

GitOps with ArgoCD

apiVersion: argoproj.io/v1alpha1
kind: Application
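metadata:
  name: myapp
  namespace: argocd        # Argo CD's own namespace
spec:
  project: default
  source:
    repoURL: https://github.com/myorg/myapp
    targetRevision: HEAD
    path: k8s
  destination:
    server: https://kubernetes.default.svc
    namespace: production
  syncPolicy:
    automated:             # sync automatically when Git and the cluster diverge
      prune: true          # delete resources that were removed from Git
      selfHeal: true       # revert manual changes made in the cluster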

Troubleshooting

Common Issues

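  1. Pod stuck in Pending: check node capacity against the pod's requests, taints and affinity rules, and PVC binding
  2. Pod in CrashLoopBackOff: check the current and previous container logs and the pod's events
  3. Service unreachable: verify the Service has endpoints (its selector matches ready pods) and that network policies allow the traffic
  4. OOMKilled: raise the memory limit or fix the memory growth in the application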

Debug Commands

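# Pod logs (follow)
kubectl logs myapp-pod -f

# Logs from the previous, crashed container instance
kubectl logs myapp-pod --previous

# Pod details and events
kubectl describe pod myapp-pod

# Exec into container
kubectl exec -it myapp-pod -- /bin/sh

# View resource usage (requires metrics-server)
kubectl top pods
kubectl top nodes

# Network debugging: temporary pod with networking tools, deleted on exit
kubectl run netshoot --image=nicolaka/netshoot --rm -it --restart=Never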

Conclusion

Production Kubernetes requires attention to security, monitoring, and operations. Start with a managed service, implement proper RBAC and network policies, and establish robust monitoring before going to production.

