Introduction
Kubernetes has become the de facto standard for container orchestration in production environments. However, deploying Kubernetes at scale requires a deep understanding of cluster architecture, resource management, networking, and operational patterns. Many organizations struggle with Kubernetes complexity, leading to inefficient resource utilization, high costs, and operational headaches.
This comprehensive guide covers production-grade Kubernetes deployment patterns, scaling strategies, and best practices for enterprise systems handling millions of requests daily.
Core Concepts & Terminology
Cluster
A set of nodes (machines) running containerized applications managed by Kubernetes control plane.
Node
A physical or virtual machine running the kubelet (node agent) and a container runtime such as containerd or CRI-O. (Docker Engine support via dockershim was removed in Kubernetes 1.24.)
Pod
The smallest deployable unit in Kubernetes. Usually contains one container, but can contain multiple tightly coupled containers.
Deployment
A declarative way to manage Pods and ReplicaSets. Handles rolling updates, rollbacks, and scaling.
Service
An abstraction that exposes Pods as a network service. Provides stable IP and DNS name for accessing Pods.
Ingress
Manages external HTTP/HTTPS access to services. Provides routing, SSL termination, and load balancing.
StatefulSet
Manages stateful applications with persistent identity and storage. Used for databases, message queues.
DaemonSet
Ensures a Pod runs on every node. Used for logging, monitoring, and system utilities.
ConfigMap
Stores non-sensitive configuration data as key-value pairs. Mounted as files or environment variables.
Secret
Stores sensitive data (passwords, API keys, certificates). Values are only base64-encoded by default, not encrypted; enable encryption at rest and restrict access with RBAC.
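ConfigMaps and Secrets are typically consumed as environment variables or mounted files. A minimal sketch (the `app-config` ConfigMap, `db-secret` Secret, and image name are hypothetical):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: demo
spec:
  containers:
    - name: app
      image: myapp:1.0                 # hypothetical image
      env:
        - name: LOG_LEVEL
          valueFrom:
            configMapKeyRef:
              name: app-config         # hypothetical ConfigMap
              key: log_level
        - name: DB_PASSWORD
          valueFrom:
            secretKeyRef:
              name: db-secret          # hypothetical Secret
              key: password
```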
PersistentVolume (PV)
Storage resource provisioned by administrator. Independent of Pod lifecycle.
PersistentVolumeClaim (PVC)
Request for storage by a Pod. Binds to a PersistentVolume.
Namespace
Virtual cluster within a physical cluster. Provides isolation and resource quotas.
RBAC (Role-Based Access Control)
Authorization mechanism controlling who can perform what actions on resources.
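A minimal RBAC example tying these concepts together: a Role granting read-only access to Pods in one namespace, bound to a service account (the `ci-deployer` account name is a placeholder):

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pod-reader
  namespace: production
rules:
  - apiGroups: [""]                    # "" = core API group
    resources: ["pods", "pods/log"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: read-pods
  namespace: production
subjects:
  - kind: ServiceAccount
    name: ci-deployer                  # hypothetical service account
    namespace: production
roleRef:
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io
```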
Kubernetes Architecture Overview
Control Plane Components
+---------------------------------------------------------+
|                      Control Plane                      |
|  +------------+  +-----------+  +--------------------+  |
|  | API Server |  | Scheduler |  | Controller Manager |  |
|  +------------+  +-----------+  +--------------------+  |
|  +------------+  +------------------+                   |
|  | etcd       |  | Cloud Controller |                   |
|  | (Database) |  | Manager          |                   |
|  +------------+  +------------------+                   |
+---------------------------------------------------------+
                             |
              +--------------+--------------+
              |              |              |
         +---------+    +---------+    +---------+
         |  Node 1 |    |  Node 2 |    |  Node N |
         |  [Pod]  |    |  [Pod]  |    |  [Pod]  |
         +---------+    +---------+    +---------+
Node Components
+---------------------------------------+
|                 Node                  |
|  +---------------------------------+  |
|  |             kubelet             |  |
|  |          (Node Agent)           |  |
|  +---------------------------------+  |
|  +---------------------------------+  |
|  |        Container Runtime        |  |
|  |      (containerd / CRI-O)       |  |
|  +---------------------------------+  |
|  +---------------------------------+  |
|  |           kube-proxy            |  |
|  |         (Network Proxy)         |  |
|  +---------------------------------+  |
+---------------------------------------+
Production Cluster Architecture
Multi-Zone High Availability
# Cluster spanning 3 availability zones
# Note: "Cluster" is not a core Kubernetes kind; this is an illustrative,
# Cluster-API-style spec. The exact schema varies by provider, and the
# zone/machine-type names below are examples only.
apiVersion: v1
kind: Cluster
metadata:
  name: production-cluster
spec:
  zones:
    - us-east-1a
    - us-east-1b
    - us-east-1c
  controlPlane:
    replicas: 3                  # One per zone
    machineType: n2-standard-4
  nodePools:
    - name: general
      zones: [us-east-1a, us-east-1b, us-east-1c]
      machineType: n2-standard-8
      minNodes: 3
      maxNodes: 100
    - name: compute-intensive
      zones: [us-east-1a, us-east-1b, us-east-1c]
      machineType: n2-highmem-16
      minNodes: 1
      maxNodes: 50
    - name: gpu
      zones: [us-east-1a, us-east-1b]
      machineType: n1-standard-4
      accelerators:
        - type: nvidia-tesla-v100
          count: 1
      minNodes: 0
      maxNodes: 20
Deployment Patterns
1. Rolling Update Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-server
spec:
  replicas: 10
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 2        # 2 extra pods during update
      maxUnavailable: 1  # 1 pod can be unavailable
  selector:
    matchLabels:
      app: api-server
  template:
    metadata:
      labels:
        app: api-server
        version: v2.0.0
    spec:
      containers:
        - name: api
          image: myregistry.azurecr.io/api-server:v2.0.0
          resources:
            requests:
              cpu: 500m
              memory: 512Mi
            limits:
              cpu: 1000m
              memory: 1Gi
          livenessProbe:
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 30
            periodSeconds: 10
            timeoutSeconds: 5
            failureThreshold: 3
          readinessProbe:
            httpGet:
              path: /ready
              port: 8080
            initialDelaySeconds: 10
            periodSeconds: 5
            timeoutSeconds: 3
            failureThreshold: 2
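The surge/unavailable settings bound how many pods exist during a rollout: with `replicas: 10`, `maxSurge: 2`, and `maxUnavailable: 1`, the Deployment keeps between 9 ready pods and 12 total pods at all times. A small sketch of that arithmetic, including the percentage form these fields also accept (surge percentages round up, unavailable percentages round down):

```python
import math

def rollout_bounds(replicas, max_surge, max_unavailable):
    """Compute the pod-count window kept during a rolling update.

    max_surge / max_unavailable may be absolute ints or percent
    strings like "25%". Percentages round up for surge and down
    for unavailable, matching Deployment semantics.
    """
    def resolve(value, round_up):
        if isinstance(value, str) and value.endswith("%"):
            frac = int(value[:-1]) / 100 * replicas
            return math.ceil(frac) if round_up else math.floor(frac)
        return int(value)

    surge = resolve(max_surge, round_up=True)
    unavailable = resolve(max_unavailable, round_up=False)
    # (minimum ready pods, maximum total pods)
    return replicas - unavailable, replicas + surge

# The Deployment above: replicas=10, maxSurge=2, maxUnavailable=1
print(rollout_bounds(10, 2, 1))          # (9, 12)
print(rollout_bounds(10, "25%", "25%"))  # (8, 13)
```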
2. Blue-Green Deployment
# Blue environment (current production)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-server-blue
spec:
  replicas: 10
  selector:
    matchLabels:
      app: api-server
      version: blue
  template:
    metadata:
      labels:
        app: api-server
        version: blue
    spec:
      containers:
        - name: api
          image: myregistry.azurecr.io/api-server:v1.0.0
---
# Green environment (new version)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-server-green
spec:
  replicas: 10
  selector:
    matchLabels:
      app: api-server
      version: green
  template:
    metadata:
      labels:
        app: api-server
        version: green
    spec:
      containers:
        - name: api
          image: myregistry.azurecr.io/api-server:v2.0.0
---
# Service routes to blue (can switch to green)
apiVersion: v1
kind: Service
metadata:
  name: api-server
spec:
  selector:
    app: api-server
    version: blue  # Switch to 'green' for cutover
  ports:
    - port: 80
      targetPort: 8080
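Because the cutover is a single selector change, it can be expressed as a strategic-merge patch against the Service (the file name is arbitrary):

```yaml
# switch-to-green.yaml -- patch for the api-server Service
spec:
  selector:
    app: api-server
    version: green
```

Apply it with `kubectl patch service api-server --patch-file switch-to-green.yaml`; rolling back is the same patch with `version: blue`. Since both Deployments stay running, the switch (and the rollback) is near-instant and involves no pod restarts.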
3. Canary Deployment
# Canary deployment with traffic splitting (Istio)
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: api-server
spec:
  hosts:
    - api-server
  http:
    - match:
        - uri:
            prefix: /
      route:
        - destination:
            host: api-server
            subset: v1
          weight: 95  # 95% of traffic to v1
        - destination:
            host: api-server
            subset: v2
          weight: 5   # 5% of traffic to v2 (canary)
---
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: api-server
spec:
  host: api-server
  trafficPolicy:
    connectionPool:
      tcp:
        maxConnections: 100
      http:
        http1MaxPendingRequests: 100
        http2MaxRequests: 1000
  subsets:
    - name: v1
      labels:
        version: v1
    - name: v2
      labels:
        version: v2
Auto-Scaling Strategies
1. Horizontal Pod Autoscaler (HPA)
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-server-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server
  minReplicas: 10
  maxReplicas: 100
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
    - type: Pods
      pods:
        metric:
          name: http_requests_per_second
        target:
          type: AverageValue
          averageValue: "1000"
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
        - type: Percent
          value: 50
          periodSeconds: 60
    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
        - type: Percent
          value: 100
          periodSeconds: 30
        - type: Pods
          value: 10
          periodSeconds: 30
      selectPolicy: Max
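The core HPA scaling rule is `desiredReplicas = ceil(currentReplicas * currentMetric / targetMetric)`, with no action taken while the ratio is inside a tolerance band (10% by default). A minimal sketch of that rule (a simplification: the real controller also averages per-pod metrics, handles missing samples, and applies the `behavior` policies above):

```python
import math

def hpa_desired_replicas(current_replicas, current_value, target_value,
                         tolerance=0.1):
    """Sketch of the HPA rule:
    desired = ceil(current * currentMetric / targetMetric),
    skipped when the ratio is within the tolerance band."""
    ratio = current_value / target_value
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas      # close enough to target: no change
    return math.ceil(current_replicas * ratio)

# CPU target 70%, currently averaging 91% across 10 replicas:
print(hpa_desired_replicas(10, 91, 70))  # 13
# 73% vs a 70% target is within the 10% tolerance band:
print(hpa_desired_replicas(10, 73, 70))  # 10
```

With multiple metrics configured, the HPA computes a desired count per metric and takes the maximum, so the busiest dimension drives scaling.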
2. Vertical Pod Autoscaler (VPA)
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: api-server-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server
  updatePolicy:
    updateMode: "Auto"  # Can be "Off", "Initial", "Recreate", "Auto"
  resourcePolicy:
    containerPolicies:
      - containerName: api
        minAllowed:
          cpu: 100m
          memory: 128Mi
        maxAllowed:
          cpu: 2
          memory: 2Gi
        controlledResources: ["cpu", "memory"]
3. Cluster Autoscaler
# Cluster Autoscaler settings
# Note: these options are normally passed as command-line flags on the
# cluster-autoscaler Deployment (e.g. --scale-down-unneeded-time=10m);
# they are collected here as key/value pairs for readability. The real
# "cluster-autoscaler-status" ConfigMap is written by the autoscaler
# itself, not by operators.
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-autoscaler-config
  namespace: kube-system
data:
  nodes.max: "1000"
  nodes.min: "10"
  scale-down-enabled: "true"
  scale-down-delay-after-add: "10m"
  scale-down-delay-after-failure: "3m"
  scale-down-delay-after-delete: "10s"
  scale-down-unneeded-time: "10m"
  skip-nodes-with-local-storage: "false"
  skip-nodes-with-system-pods: "false"
Resource Management
1. Resource Requests and Limits
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-server
spec:
  template:
    spec:
      containers:
        - name: api
          image: api-server:v1.0.0
          resources:
            # Minimum guaranteed resources
            requests:
              cpu: 500m              # 0.5 CPU cores
              memory: 512Mi          # 512 MB
              ephemeral-storage: 1Gi
            # Maximum allowed resources
            limits:
              cpu: 2000m             # 2 CPU cores
              memory: 2Gi            # 2 GB
              ephemeral-storage: 5Gi
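To keep containers that omit requests/limits from slipping through, a LimitRange can inject namespace-wide defaults and caps. A minimal sketch (the values shown are illustrative, not recommendations):

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits
  namespace: production
spec:
  limits:
    - type: Container
      defaultRequest:      # applied when requests are omitted
        cpu: 250m
        memory: 256Mi
      default:             # applied when limits are omitted
        cpu: 500m
        memory: 512Mi
      max:                 # hard per-container ceiling
        cpu: "4"
        memory: 8Gi
```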
2. Resource Quotas
apiVersion: v1
kind: Namespace
metadata:
  name: production
---
apiVersion: v1
kind: ResourceQuota
metadata:
  name: production-quota
  namespace: production
spec:
  hard:
    requests.cpu: "100"
    requests.memory: "200Gi"
    limits.cpu: "200"
    limits.memory: "400Gi"
    pods: "1000"
    services: "100"
    persistentvolumeclaims: "50"
  scopeSelector:
    matchExpressions:
      - operator: In
        scopeName: PriorityClass
        values: ["high", "medium"]
3. Pod Disruption Budgets
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: api-server-pdb
spec:
  minAvailable: 5  # At least 5 pods must remain available
  selector:
    matchLabels:
      app: api-server
  unhealthyPodEvictionPolicy: AlwaysAllow  # Requires Kubernetes 1.27+
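The budget translates directly into how many voluntary evictions (node drains, upgrades) are allowed at any moment: healthy pods minus `minAvailable`. A small sketch of that arithmetic, with percent values resolved against the desired replica count:

```python
import math

def allowed_disruptions(healthy_pods, min_available, desired_pods=None):
    """How many pods a PDB with minAvailable lets voluntary evictions
    remove right now. Percent strings ("50%") resolve against the
    desired pod count, rounding up."""
    if isinstance(min_available, str) and min_available.endswith("%"):
        min_available = math.ceil(int(min_available[:-1]) / 100 * desired_pods)
    return max(0, healthy_pods - min_available)

# The PDB above (minAvailable: 5) with all 10 replicas healthy:
print(allowed_disruptions(10, 5))                       # 5
# During an incident with only 5 healthy pods, evictions are blocked:
print(allowed_disruptions(5, 5))                        # 0
# Percent form: minAvailable "50%" of 10 desired pods:
print(allowed_disruptions(8, "50%", desired_pods=10))   # 3
```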
Networking & Service Mesh
1. Service Configuration
apiVersion: v1
kind: Service
metadata:
  name: api-server
  namespace: production
spec:
  type: ClusterIP
  selector:
    app: api-server
  ports:
    - name: http
      port: 80
      targetPort: 8080
      protocol: TCP
    - name: metrics
      port: 9090
      targetPort: 9090
      protocol: TCP
  sessionAffinity: ClientIP
  sessionAffinityConfig:
    clientIP:
      timeoutSeconds: 10800
2. Ingress Configuration
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: api-ingress
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
    nginx.ingress.kubernetes.io/limit-rps: "100"  # requests/second per client IP
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
spec:
  ingressClassName: nginx
  tls:
    - hosts:
        - api.example.com
      secretName: api-tls-cert
  rules:
    - host: api.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: api-server
                port:
                  number: 80
3. Network Policies
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: api-server-netpol
spec:
  podSelector:
    matchLabels:
      app: api-server
  policyTypes:
    - Ingress
    - Egress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              name: production
        - podSelector:
            matchLabels:
              app: nginx-ingress
      ports:
        - protocol: TCP
          port: 8080
  egress:
    - to:
        - namespaceSelector: {}
      ports:
        - protocol: TCP
          port: 443
        - protocol: TCP
          port: 5432  # Database
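Allow-list policies like the one above are most effective on top of a default-deny baseline, so any pod without an explicit policy gets no traffic at all. A minimal sketch:

```yaml
# Deny all ingress and egress by default; per-workload policies
# (like api-server-netpol) then allow exactly the traffic needed.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: production
spec:
  podSelector: {}          # empty selector = every pod in the namespace
  policyTypes:
    - Ingress
    - Egress
```

Note that this also blocks DNS: in practice you add one more policy allowing egress to kube-dns on TCP/UDP port 53, or pods will fail name resolution.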
Storage Management
1. StatefulSet with Persistent Storage
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres
spec:
  serviceName: postgres
  replicas: 3
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
      containers:
        - name: postgres
          image: postgres:14
          ports:
            - containerPort: 5432
          volumeMounts:
            - name: data
              mountPath: /var/lib/postgresql/data
          env:
            - name: POSTGRES_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: postgres-secret
                  key: password
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: fast-ssd
        resources:
          requests:
            storage: 100Gi
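The `serviceName: postgres` field requires a matching headless Service, which gives each replica a stable DNS identity (`postgres-0.postgres`, `postgres-1.postgres`, ...) instead of a load-balanced virtual IP:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: postgres
spec:
  clusterIP: None          # headless: per-pod DNS records, no VIP
  selector:
    app: postgres
  ports:
    - name: postgres
      port: 5432
```

Those per-pod names are what lets clients address the primary and replicas individually, e.g. for leader-aware database routing.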
2. Storage Classes
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-ssd
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
  iops: "3000"
  throughput: "125"
  encrypted: "true"
allowVolumeExpansion: true
volumeBindingMode: WaitForFirstConsumer
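A PersistentVolumeClaim then requests storage from this class (the claim name here is hypothetical):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: reports-data        # hypothetical claim
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: fast-ssd
  resources:
    requests:
      storage: 50Gi
```

With `WaitForFirstConsumer`, the volume is not provisioned until a pod using this claim is scheduled, so the EBS volume is created in the same availability zone as the pod.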
Monitoring & Observability
1. Prometheus Monitoring
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: api-server-monitor
spec:
  selector:
    matchLabels:
      app: api-server
  endpoints:
    - port: metrics
      interval: 30s
      path: /metrics
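Scraping is only half the story; alerts on the collected metrics are defined with a PrometheusRule. A sketch (the alert name, metric name, and threshold are illustrative assumptions, not part of any standard):

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: api-server-alerts
spec:
  groups:
    - name: api-server
      rules:
        - alert: HighErrorRate           # hypothetical alert
          expr: |
            sum(rate(http_requests_total{app="api-server",code=~"5.."}[5m]))
              / sum(rate(http_requests_total{app="api-server"}[5m])) > 0.05
          for: 10m
          labels:
            severity: critical
          annotations:
            summary: "api-server 5xx rate above 5% for 10 minutes"
```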
2. Logging with ELK Stack
apiVersion: v1
kind: ConfigMap
metadata:
  name: fluent-bit-config
data:
  fluent-bit.conf: |
    [SERVICE]
        Flush            5
        Daemon           Off
        Log_Level        info

    [INPUT]
        Name             tail
        Path             /var/log/containers/*.log
        Parser           docker
        Tag              kube.*
        Refresh_Interval 5

    [OUTPUT]
        Name             es
        Match            kube.*
        Host             elasticsearch
        Port             9200
        Logstash_Format  On
Best Practices & Common Pitfalls
Best Practices
- Resource Requests/Limits: Always set appropriate requests and limits
- Health Checks: Implement liveness and readiness probes
- Pod Disruption Budgets: Protect critical workloads during maintenance
- Network Policies: Restrict traffic between pods
- RBAC: Use least privilege access control
- Monitoring: Implement comprehensive monitoring and alerting
- Backup Strategy: Regular backups of etcd and persistent data
- Multi-Zone: Deploy across availability zones for HA
- Resource Quotas: Prevent resource exhaustion
- GitOps: Use declarative configuration management
Common Pitfalls
- No Resource Limits: Pods consuming unlimited resources
- Insufficient Replicas: Single replica deployments
- Poor Health Checks: Incorrect probe configuration
- Tight Coupling: Pods with hard dependencies
- No Monitoring: Flying blind in production
- Inadequate Storage: Running out of disk space
- Network Misconfiguration: Connectivity issues
- Security Gaps: Exposed APIs, weak RBAC
- Inefficient Scaling: Slow or incorrect auto-scaling
- No Disaster Recovery: No backup or recovery plan
Real-World Scaling Example
Scenario: E-commerce Platform
# Namespace for production
apiVersion: v1
kind: Namespace
metadata:
  name: ecommerce-prod
---
# API Server Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-server
  namespace: ecommerce-prod
spec:
  replicas: 20
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 5
      maxUnavailable: 2
  selector:
    matchLabels:
      app: api-server
  template:
    metadata:
      labels:
        app: api-server
    spec:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                labelSelector:
                  matchExpressions:
                    - key: app
                      operator: In
                      values:
                        - api-server
                topologyKey: kubernetes.io/hostname
      containers:
        - name: api
          image: myregistry.azurecr.io/api-server:v2.0.0
          resources:
            requests:
              cpu: 1000m
              memory: 1Gi
            limits:
              cpu: 2000m
              memory: 2Gi
          livenessProbe:
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 30
            periodSeconds: 10
          readinessProbe:
            httpGet:
              path: /ready
              port: 8080
            initialDelaySeconds: 10
            periodSeconds: 5
---
# HPA for API Server
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-server-hpa
  namespace: ecommerce-prod
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server
  minReplicas: 20
  maxReplicas: 200
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
Conclusion
Kubernetes at scale requires careful planning, proper architecture, and operational discipline. By implementing multi-zone deployments, auto-scaling strategies, resource management, and comprehensive monitoring, organizations can build reliable, efficient production systems.
The key to success is starting with solid fundamentals: proper resource requests/limits, health checks, and monitoring. Then gradually add advanced features like service meshes, advanced scheduling, and sophisticated auto-scaling.
Start with a well-architected cluster, implement best practices from day one, and continuously optimize based on real-world metrics and feedback.