Introduction
Kubernetes has become the standard for container orchestration. Running Kubernetes in production requires careful planning around security, monitoring, and operations. This guide covers production-ready Kubernetes deployment and management.
Production Cluster Setup
Managed vs Self-Managed
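| Option | Pros | Cons |
|---|---|---|
| Managed (EKS/GKE/AKS) | Provider runs the control plane | Cost, vendor lock-in |
| Self-hosted | Full control | Operational burden (upgrades, etcd, certificates) |
| Kubespray (self-hosted, Ansible-based) | Automated, repeatable installs | Complexity of the Ansible playbooks |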
Cluster Sizing
```yaml
# Example node pool configuration (GKE)
nodePools:
  - name: default
    config:
      machineType: e2-standard-4   # 4 vCPUs, 16 GB memory
      diskSizeGb: 100
    autoscaling:
      enabled: true
      minNodeCount: 2
      maxNodeCount: 10
```
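A rough sizing check can help pick the pool's minimum and maximum node counts. The sketch below assumes pods are packed purely by their resource requests and that a hypothetical 90% of each node's capacity is allocatable after system and kubelet reservations; real allocatable values vary by provider and machine type, so read them from `kubectl describe node`.

```python
import math


def nodes_needed(pods: int, cpu_request_m: int, mem_request_mi: int,
                 node_cpu_m: int, node_mem_mi: int,
                 allocatable_fraction: float = 0.9) -> int:
    """Estimate how many nodes are needed to fit `pods` replicas by their requests.

    allocatable_fraction is an ASSUMED share of node capacity left for pods
    after system and kubelet reservations (hypothetical default of 90%).
    """
    usable_cpu = node_cpu_m * allocatable_fraction
    usable_mem = node_mem_mi * allocatable_fraction
    # The scarcer resource decides how many pods fit on one node.
    per_node = min(int(usable_cpu // cpu_request_m), int(usable_mem // mem_request_mi))
    if per_node == 0:
        raise ValueError("a single pod's requests exceed one node's allocatable capacity")
    return math.ceil(pods / per_node)


print(nodes_needed(30, 250, 256, 4000, 16384))  # 3
```

Here 30 pods requesting 250m CPU and 256Mi each, on nodes treated as 4000m CPU and 16384Mi memory, need 3 nodes: CPU, not memory, limits each node to 14 pods.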
Multi-AZ Deployment
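```yaml
# Spread replicas across availability zones (pod spec fragment)
topologySpreadConstraints:
  - {maxSkew: 1, topologyKey: topology.kubernetes.io/zone, whenUnsatisfiable: DoNotSchedule,
     labelSelector: {matchLabels: {app: myapp}}}
```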
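Zone spreading is typically expressed as a topology spread constraint on `topology.kubernetes.io/zone`, where `maxSkew` caps how far the most loaded zone may get ahead of the least loaded one. The sketch below is a simplified model of that check for `whenUnsatisfiable: DoNotSchedule`; it ignores node affinity, taints, and zones without eligible nodes, which the real scheduler accounts for, and the zone names are hypothetical.

```python
def allowed_zones(pods_per_zone: dict[str, int], max_skew: int = 1) -> list[str]:
    """Zones where one more matching pod keeps the spread within max_skew.

    Simplified model of whenUnsatisfiable: DoNotSchedule: a zone is allowed if
    (matching pods in that zone + 1) - (fewest matching pods in any zone) <= max_skew.
    """
    global_min = min(pods_per_zone.values())
    return [zone for zone, count in pods_per_zone.items()
            if count + 1 - global_min <= max_skew]


print(allowed_zones({"zone-a": 1, "zone-b": 1, "zone-c": 0}))  # ['zone-c']
```

With `maxSkew: 1`, the third replica can only land in the empty zone, which is what keeps replicas spread across availability zones.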
Application Deployment
Deployment Strategies
Rolling Update:
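```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
spec:
  replicas: 3
  selector:
    matchLabels:
      app: myapp
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1         # at most one pod above the desired count during a rollout
      maxUnavailable: 0   # never drop below the desired count of ready pods
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
        - name: myapp
          image: myapp:1.0.0   # hypothetical image reference
```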
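Blue-Green (run separate v1 and v2 Deployments side by side, then switch the Service selector to move all traffic at once):
```yaml
apiVersion: v1
kind: Service
metadata:
  name: myapp
spec:
  selector:
    app: myapp
    version: v2   # change from v1 to v2 to cut traffic over; switch back to roll back
  ports:
    - port: 80
      targetPort: 8080
```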
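A Service routes to pods whose labels include every key/value pair in its selector, so changing the `version` value moves traffic between the blue and green pods in one step. A simplified sketch of that matching (real Services also send traffic only to pods that pass their readiness probe; the pod names are hypothetical):

```python
def matches(selector: dict[str, str], labels: dict[str, str]) -> bool:
    """True if every key/value in the selector is present in the pod's labels."""
    return all(labels.get(key) == value for key, value in selector.items())


def endpoints(selector: dict[str, str], pods: dict[str, dict[str, str]]) -> list[str]:
    """Names of the pods a Service with this selector would route to (readiness ignored)."""
    return [name for name, labels in pods.items() if matches(selector, labels)]


# Hypothetical pods from the blue (v1) and green (v2) Deployments.
pods = {
    "myapp-v1-a": {"app": "myapp", "version": "v1"},
    "myapp-v1-b": {"app": "myapp", "version": "v1"},
    "myapp-v2-a": {"app": "myapp", "version": "v2"},
    "myapp-v2-b": {"app": "myapp", "version": "v2"},
}
print(endpoints({"app": "myapp", "version": "v1"}, pods))  # ['myapp-v1-a', 'myapp-v1-b']
print(endpoints({"app": "myapp", "version": "v2"}, pods))  # ['myapp-v2-a', 'myapp-v2-b']
```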
Resource Management
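```yaml
# Container spec fragment
resources:
  requests:            # what the scheduler reserves when placing the pod
    memory: "256Mi"
    cpu: "250m"        # 250 millicores = 0.25 CPU
  limits:              # exceeding the memory limit gets the container OOMKilled; CPU is throttled
    memory: "512Mi"
    cpu: "500m"
```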
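Kubernetes quantities use binary suffixes for memory (`Mi` is 2^20 bytes) and the `m` (milli) suffix for CPU, so `250m` is a quarter of a core. A minimal parser covering only the common suffixes (a subset of the full quantity grammar; the helper name is illustrative) makes the arithmetic concrete:

```python
from decimal import Decimal

# Common binary (power-of-two) and decimal suffixes used in Kubernetes quantities.
_SUFFIXES = {
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40,
    "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12,
    "m": Decimal("0.001"),
}


def parse_quantity(q: str) -> Decimal:
    """Convert a quantity such as "256Mi" or "250m" to a plain number."""
    # Check two-letter suffixes first so "Mi" is not mistaken for "M".
    for suffix in sorted(_SUFFIXES, key=len, reverse=True):
        if q.endswith(suffix):
            return Decimal(q[: -len(suffix)]) * _SUFFIXES[suffix]
    return Decimal(q)


print(parse_quantity("256Mi"))  # 268435456
print(parse_quantity("250m"))   # 0.250
```

Decimal is used rather than float so that milli-CPU values stay exact.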
Health Checks
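```yaml
# Container spec fragment
livenessProbe:         # restart the container if /healthz keeps failing
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 30
  periodSeconds: 10
readinessProbe:        # keep the pod out of Service endpoints until /ready succeeds
  httpGet:
    path: /ready
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 5
```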
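The probe timings determine how quickly a broken container is restarted. The kubelet acts after `failureThreshold` consecutive failures, which defaults to 3. The sketch below approximates the resulting delays, assuming the first probe runs exactly at `initialDelaySeconds` and each failure is immediate (probe timeouts and timing jitter are ignored):

```python
def seconds_until_restart(initial_delay: int, period: int, failure_threshold: int = 3) -> int:
    """Approximate time from container start until the kubelet restarts a container
    whose liveness endpoint never succeeds.

    Assumes the first probe runs at initial_delay and every probe fails instantly;
    probe timeouts (timeoutSeconds, default 1) and timing jitter are ignored.
    """
    return initial_delay + period * (failure_threshold - 1)


def worst_case_hang_detection(period: int, failure_threshold: int = 3) -> int:
    """Approximate worst-case delay before a restart if the app hangs right after
    a successful probe: failure_threshold consecutive failed probes are needed."""
    return period * failure_threshold


print(seconds_until_restart(30, 10))   # 50
print(worst_case_hang_detection(10))   # 30
```

A longer `initialDelaySeconds` protects slow-starting apps from restart loops, at the cost of detecting a genuinely broken start later.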
Security
RBAC Configuration
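```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pod-reader        # hypothetical name; metadata.name is required
  namespace: production
rules:
  - apiGroups: [""]       # "" is the core API group
    resources: ["pods"]
    verbs: ["get", "list", "watch"]
```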
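RBAC rules are purely additive: a request is allowed if any rule in the bound roles grants its API group, resource, and verb, and there are no deny rules. A Role also grants nothing until a RoleBinding attaches it to a user, group, or service account. The sketch below models only that matching (it ignores `resourceNames`, subresources, and non-resource URLs; the class and function names are illustrative):

```python
from dataclasses import dataclass


@dataclass
class PolicyRule:
    api_groups: list[str]
    resources: list[str]
    verbs: list[str]


def _matches(allowed: list[str], value: str) -> bool:
    return "*" in allowed or value in allowed


def is_allowed(rules: list[PolicyRule], api_group: str, resource: str, verb: str) -> bool:
    """RBAC is additive: a request is allowed if any rule grants it; nothing denies."""
    return any(
        _matches(r.api_groups, api_group)
        and _matches(r.resources, resource)
        and _matches(r.verbs, verb)
        for r in rules
    )


# The Role above: read-only access to pods in the core ("") API group.
pod_reader = [PolicyRule(api_groups=[""], resources=["pods"], verbs=["get", "list", "watch"])]
print(is_allowed(pod_reader, "", "pods", "list"))            # True
print(is_allowed(pod_reader, "", "pods", "delete"))          # False
print(is_allowed(pod_reader, "apps", "deployments", "get"))  # False
```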
Network Policies
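```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny
  namespace: production
spec:
  podSelector: {}      # empty selector: applies to every pod in the namespace
  policyTypes:
    - Ingress
    - Egress
```
This blocks all ingress and egress for every pod in the namespace, including DNS lookups, so add explicit allow policies for the traffic each workload needs. NetworkPolicies are only enforced when the cluster's network plugin supports them.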
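A default-deny policy works because a pod selected by any policy listing `Ingress` becomes isolated: from then on, only traffic that some policy explicitly allows can reach it. The sketch below is a simplified same-namespace model of that rule (ports, namespace selectors, ipBlocks, and egress are ignored; the `frontend` label is hypothetical):

```python
from dataclasses import dataclass, field


@dataclass
class Policy:
    pod_selector: dict[str, str]  # {} selects every pod in the namespace
    policy_types: list[str]
    # Label selectors for allowed source pods; an empty list allows no ingress.
    ingress_from: list[dict[str, str]] = field(default_factory=list)


def selects(selector: dict[str, str], labels: dict[str, str]) -> bool:
    return all(labels.get(k) == v for k, v in selector.items())


def ingress_allowed(policies: list[Policy], dst: dict[str, str], src: dict[str, str]) -> bool:
    """Simplified same-namespace ingress check (ports, namespaces, ipBlocks ignored)."""
    isolating = [p for p in policies
                 if "Ingress" in p.policy_types and selects(p.pod_selector, dst)]
    if not isolating:
        return True  # no policy selects the pod for ingress: it is not isolated
    return any(selects(sel, src) for p in isolating for sel in p.ingress_from)


default_deny = Policy(pod_selector={}, policy_types=["Ingress", "Egress"])
# Hypothetical allow rule: frontend pods may reach myapp pods.
allow_frontend = Policy({"app": "myapp"}, ["Ingress"], [{"app": "frontend"}])

print(ingress_allowed([default_deny], {"app": "myapp"}, {"app": "frontend"}))                  # False
print(ingress_allowed([default_deny, allow_frontend], {"app": "myapp"}, {"app": "frontend"}))  # True
print(ingress_allowed([default_deny, allow_frontend], {"app": "myapp"}, {"app": "batch"}))     # False
```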
Secrets Management
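```yaml
apiVersion: v1
kind: Secret
metadata:
  name: db-creds
  namespace: production
type: Opaque
stringData:             # plain text here; stored base64-encoded under data
  username: admin
  password: changeme    # placeholder; never commit real credentials to Git
```
Kubernetes Secrets are only base64-encoded, not encrypted, unless encryption at rest is enabled for etcd. For production, enable encryption at rest and use an external secrets operator to sync credentials from a dedicated secret store.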
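The `stringData` values of a Secret are stored base64-encoded under its `data` field, and base64 is a reversible encoding, not encryption: anyone who can read the Secret object can recover the password. A quick illustration with Python's standard library (the helper name is illustrative):

```python
import base64


def to_secret_data(string_data: dict[str, str]) -> dict[str, str]:
    """Encode stringData values the way they appear under a Secret's `data` field."""
    return {k: base64.b64encode(v.encode()).decode() for k, v in string_data.items()}


data = to_secret_data({"username": "admin", "password": "changeme"})
print(data["password"])                             # Y2hhbmdlbWU=
print(base64.b64decode(data["password"]).decode())  # changeme
```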
Monitoring
Prometheus Setup
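```yaml
# Requires the Prometheus Operator (monitoring.coreos.com CRDs)
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: prometheus
spec:
  replicas: 2                    # two replicas for high availability
  serviceAccountName: prometheus
  serviceMonitorSelector: {}     # {} selects all ServiceMonitors; omitting it selects none
```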
Key Metrics
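| Metric | Example Prometheus series | Description |
|---|---|---|
| Pod CPU usage | `container_cpu_usage_seconds_total` | CPU utilization relative to requests and limits |
| Pod memory | `container_memory_working_set_bytes` | Memory pressure; usage near the limit risks OOMKills |
| API server latency | `apiserver_request_duration_seconds` | Control plane health |
| Pod restarts | `kube_pod_container_status_restarts_total` (kube-state-metrics) | Stability indicator (crash loops, OOMKills) |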
Alerting Rules
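```yaml
groups:
  - name: kubernetes
    rules:
      - alert: HighPodMemory
        # Working-set memory per pod; container!="" drops the pod-level cgroup series.
        # PromQL number literals have no "Gi" suffix, so 1 GiB is written in bytes.
        expr: sum by (namespace, pod) (container_memory_working_set_bytes{container!=""}) > 1073741824
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Pod {{ $labels.namespace }}/{{ $labels.pod }} is using more than 1 GiB of memory"
```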
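The `for: 5m` clause keeps short spikes from paging anyone: an alert whose condition becomes true is first "pending" and only starts "firing" once the condition has held for the whole duration; a single false evaluation resets it. The sketch below models that state machine, assuming a hypothetical 1-minute rule evaluation interval (the real interval is configurable), so `for: 5m` corresponds to five intervals:

```python
def alert_states(values: list[float], threshold: float, for_evals: int) -> list[str]:
    """Simplified model of a Prometheus alerting rule with a `for` clause.

    values holds the expression's value at each rule evaluation. The alert is
    "pending" while the condition holds but has not yet held for `for_evals`
    evaluation intervals, "firing" once it has, and "inactive" whenever the
    condition is false (which resets the timer).
    """
    states: list[str] = []
    active_since = None  # index of the evaluation where the condition started holding
    for i, value in enumerate(values):
        if value > threshold:
            if active_since is None:
                active_since = i
            states.append("firing" if i - active_since >= for_evals else "pending")
        else:
            active_since = None
            states.append("inactive")
    return states


GIB = 1024 ** 3
# One sample per 1-minute evaluation (assumed interval), so `for: 5m` = 5 intervals.
usage = [1.2 * GIB] * 6
print(alert_states(usage, GIB, for_evals=5))
# ['pending', 'pending', 'pending', 'pending', 'pending', 'firing']
```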
Scaling
Horizontal Pod Autoscaling
```yaml
# Requires metrics-server; CPU utilization is measured against the pods' CPU requests
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: myapp-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```
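The HPA's core scaling rule is `desiredReplicas = ceil(currentReplicas * currentMetric / targetMetric)`, skipped when the ratio is within a tolerance (0.1 by default) and clamped to `minReplicas`/`maxReplicas`. The sketch below implements only that rule; it leaves out stabilization windows, scaling policies, and the handling of missing metrics or not-yet-ready pods:

```python
import math


def desired_replicas(current_replicas: int, current_utilization: float,
                     target_utilization: float, min_replicas: int, max_replicas: int,
                     tolerance: float = 0.1) -> int:
    """Core HPA rule: ceil(current * currentMetric / targetMetric), skipped when the
    ratio is within the tolerance, then clamped to [min_replicas, max_replicas]."""
    ratio = current_utilization / target_utilization
    if abs(ratio - 1.0) <= tolerance:
        desired = current_replicas
    else:
        desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))


# Target 70% CPU utilization with 2..10 replicas, as in the manifest above.
print(desired_replicas(4, 105, 70, 2, 10))  # 6
print(desired_replicas(4, 75, 70, 2, 10))   # 4  (within tolerance)
print(desired_replicas(8, 200, 70, 2, 10))  # 10 (clamped to maxReplicas)
```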
Cluster Autoscaling
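The Cluster Autoscaler integrates with the cloud provider: it adds nodes when pods cannot be scheduled for lack of capacity and removes nodes that stay underutilized, within each node pool's minimum and maximum size.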
Networking
Ingress Configuration
```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: myapp-ingress
  annotations:
    nginx.ingress.kubernetes.io/ssl-redirect: "true"   # specific to ingress-nginx
spec:
  ingressClassName: nginx
  tls:
    - hosts:
        - example.com
      secretName: tls-secret     # TLS certificate and key for example.com
  rules:
    - host: example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: myapp
                port:
                  number: 80
```
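With `pathType: Prefix`, matching is done element by element on the `/`-separated path, not as a raw string prefix, so `/foo` matches `/foo/bar` but not `/foobar`, and `/` matches every request. A simplified sketch of that rule (the function name is illustrative):

```python
def prefix_match(rule_path: str, request_path: str) -> bool:
    """Simplified pathType: Prefix matching on '/'-separated path elements."""
    rule = [part for part in rule_path.split("/") if part]
    request = [part for part in request_path.split("/") if part]
    # The request must start with all of the rule's path elements.
    return request[: len(rule)] == rule


print(prefix_match("/", "/any/path"))     # True
print(prefix_match("/foo", "/foo/bar"))   # True
print(prefix_match("/foo", "/foobar"))    # False
```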
Service Mesh
For advanced traffic management, such as weighted canary releases, a service mesh like Istio can split traffic between versions:
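```yaml
# Istio VirtualService: send 90% of requests to v1 and 10% to v2
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: myapp
spec:
  hosts:
    - myapp
  http:
    - route:
        - destination:
            host: myapp
            subset: v1
          weight: 90
        - destination:
            host: myapp
            subset: v2
          weight: 10
---
# DestinationRule defining the v1/v2 subsets referenced above
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: myapp
spec:
  host: myapp
  subsets:
    - name: v1
      labels:
        version: v1
    - name: v2
      labels:
        version: v2
```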
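Weighted routing is applied per request, so over many requests roughly 10% land on v2, while any single user may see both versions. The sketch below only simulates that proportion with weighted random choice; it does not model how Envoy actually selects upstreams:

```python
import random
from collections import Counter


def route(weights: dict[str, int], rng: random.Random) -> str:
    """Pick a subset with probability proportional to its weight."""
    subsets = list(weights)
    return rng.choices(subsets, weights=[weights[s] for s in subsets], k=1)[0]


rng = random.Random(42)  # fixed seed so the run is reproducible
counts = Counter(route({"v1": 90, "v2": 10}, rng) for _ in range(10_000))
print(counts["v2"] / 10_000)  # close to 0.10
```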
Backup and Disaster Recovery
Etcd Backup
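```bash
# Backup (run on a control-plane node; ETCDCTL_API=3 is the default since etcd 3.4)
ETCDCTL_API=3 etcdctl snapshot save backup.db \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key
# Restore into a new data directory (restore will not overwrite an existing one),
# then point etcd's static pod manifest at it. On etcd 3.5+, `etcdutl snapshot restore`
# is the preferred equivalent.
ETCDCTL_API=3 etcdctl snapshot restore backup.db \
  --data-dir=/var/lib/etcd-restore
```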
Application Backups
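```yaml
# Velero Schedule: back up the production namespace daily
# (a plain Backup object runs only once)
apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: daily-backup
  namespace: velero
spec:
  schedule: "0 2 * * *"    # every day at 02:00
  template:
    includedNamespaces:
      - production
    storageLocation: default
    ttl: 720h0m0s           # keep each backup for 30 days
```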
CI/CD Integration
GitOps with ArgoCD
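```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: myapp
  namespace: argocd          # the namespace where Argo CD is installed
spec:
  project: default
  source:
    repoURL: https://github.com/myorg/myapp
    targetRevision: HEAD
    path: k8s                # directory of manifests in the repository
  destination:
    server: https://kubernetes.default.svc   # the cluster Argo CD itself runs in
    namespace: production
```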
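Conceptually, a GitOps controller such as Argo CD compares the desired state rendered from Git with the live objects in the cluster, reports any difference as OutOfSync, and reconciles it when syncing. The sketch below shows only that comparison with plain dictionaries; real Argo CD diffs normalized manifests and ignores server-populated fields, and it deletes extra resources only when pruning is enabled. The function name and keys are illustrative:

```python
def plan_sync(desired: dict[str, dict], live: dict[str, dict]) -> dict[str, list[str]]:
    """Compare desired objects (rendered from Git) with live cluster objects.

    Both maps are keyed by a "Kind/name" string; values are simplified specs.
    """
    return {
        "create": sorted(k for k in desired if k not in live),
        "update": sorted(k for k in desired if k in live and desired[k] != live[k]),
        "prune": sorted(k for k in live if k not in desired),  # deleted only if pruning is enabled
    }


desired = {"Deployment/myapp": {"replicas": 3}, "Service/myapp": {"port": 80}}
live = {"Deployment/myapp": {"replicas": 2}, "ConfigMap/old-config": {}}
print(plan_sync(desired, live))
# {'create': ['Service/myapp'], 'update': ['Deployment/myapp'], 'prune': ['ConfigMap/old-config']}
```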
Troubleshooting
Common Issues
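- Pod stuck in Pending: check for insufficient node resources, unbound PVCs, and unsatisfiable node selectors or taints (the pod's events name the reason)
- Pod in CrashLoopBackOff: check the container logs, including those of the previous crashed instance, and the pod's events
- Service unreachable: verify that the Service has endpoints (its selector matches ready pods) and that network policies allow the traffic
- OOMKilled: the container exceeded its memory limit; raise the limit or reduce the application's memory use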
Debug Commands
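```bash
# Pod logs (follow)
kubectl logs myapp-pod -f
# Logs from the previous container instance (useful for CrashLoopBackOff)
kubectl logs myapp-pod --previous
# Pod events and status
kubectl describe pod myapp-pod
# Exec into container
kubectl exec -it myapp-pod -- /bin/sh
# View resource usage (requires metrics-server)
kubectl top pods
kubectl top nodes
# Network debugging from a throwaway pod
kubectl run netshoot --image=nicolaka/netshoot --rm -it --restart=Never
```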
Conclusion
Production Kubernetes requires attention to security, monitoring, and operations. Start with managed services, implement proper RBAC and network policies, and establish robust monitoring before going to production.