Introduction
In a microservices architecture, a single user request might traverse 10-20 services. Traditional application-level instrumentation requires every team to add observability code to every service. Service meshes solve this by injecting a sidecar proxy (Envoy) alongside each service, capturing all network traffic automatically and giving you metrics, traces, and security without touching application code.
What you get for free with a service mesh:
- Request rate, error rate, and latency for every service-to-service call
- Distributed traces across all services
- mTLS encryption between all services
- Traffic management (canary deployments, circuit breaking, retries)
How Service Meshes Work
Without service mesh:

```
Service A ──────────────────────────────── Service B
            (no visibility into what happens here)
```

With service mesh (Istio):

```
Service A → [Envoy sidecar] ──network── [Envoy sidecar] → Service B
                  │                           │
            Metrics/Traces              Metrics/Traces
                  │                           │
           Prometheus/Jaeger           Prometheus/Jaeger
```
The Envoy sidecar proxy intercepts all inbound and outbound traffic. The control plane (Istiod) configures all sidecars centrally.
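Concretely, injection works by mutating the pod spec at admission time. A simplified sketch of what an injected pod looks like (the container names `istio-init` and `istio-proxy` are what Istio actually uses; images and most fields are heavily abbreviated here):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: service-a
  labels:
    app: service-a
spec:
  initContainers:
    - name: istio-init        # sets up iptables rules that redirect all
      image: istio/proxyv2    # pod traffic through the sidecar proxy
  containers:
    - name: app               # your unchanged application container
      image: service-a:latest
    - name: istio-proxy       # the Envoy sidecar, injected by Istio
      image: istio/proxyv2
```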
Istio Architecture
```
Control Plane (Istiod):
├── Pilot   → service discovery, traffic routing config
├── Citadel → certificate management for mTLS
└── Galley  → config validation and distribution

Data Plane (Envoy sidecars):
├── Service A pod: [app container] + [envoy sidecar]
├── Service B pod: [app container] + [envoy sidecar]
└── Service C pod: [app container] + [envoy sidecar]
```
Installing Istio
```bash
# Download istioctl
curl -L https://istio.io/downloadIstio | sh -
export PATH=$PWD/istio-1.21.0/bin:$PATH

# Install with the default profile
istioctl install --set profile=default -y

# Enable automatic sidecar injection for a namespace
kubectl label namespace default istio-injection=enabled

# Verify installation
istioctl verify-install
kubectl get pods -n istio-system
```
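A quick way to confirm injection is actually working (assuming the `default` namespace was labeled as above): start a throwaway pod and check that it comes up with two containers.

```bash
# Start a throwaway pod in the labeled namespace
kubectl run injection-test --image=nginx

# READY should show 2/2: the app container plus istio-proxy
kubectl get pod injection-test

# List container names; expect "injection-test" and "istio-proxy"
kubectl get pod injection-test -o jsonpath='{.spec.containers[*].name}'
```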
Automatic Telemetry
Once Istio is installed and sidecar injection is enabled, you get metrics automatically:
```bash
# Deploy a sample app
kubectl apply -f samples/bookinfo/platform/kube/bookinfo.yaml

# Generate some traffic
kubectl exec -it $(kubectl get pod -l app=ratings -o jsonpath='{.items[0].metadata.name}') \
  -- curl -s productpage:9080/productpage | grep -o "<title>.*</title>"
```
Istio automatically exports these metrics to Prometheus:
```promql
# Request rate per destination service
sum by (destination_service) (rate(istio_requests_total[5m]))

# Error rate (fraction of 5xx responses) per destination service;
# the sums drop the response_code label so the division matches up
  sum by (destination_service) (rate(istio_requests_total{response_code=~"5.."}[5m]))
/ sum by (destination_service) (rate(istio_requests_total[5m]))

# P99 latency, aggregated across pods by histogram bucket
histogram_quantile(0.99,
  sum by (le) (rate(istio_request_duration_milliseconds_bucket[5m]))
)

# Traffic between specific services
rate(istio_requests_total{
  source_app="frontend",
  destination_app="backend"
}[5m])
```
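The error-rate query drops straight into an alerting rule. A sketch of a Prometheus rule file built on it (the 5% threshold, group name, and severity label are illustrative choices, not Istio defaults):

```yaml
groups:
  - name: istio-slo
    rules:
      - alert: HighServiceErrorRate
        # Fire when more than 5% of requests to a service return 5xx
        expr: |
          sum by (destination_service) (rate(istio_requests_total{response_code=~"5.."}[5m]))
          /
          sum by (destination_service) (rate(istio_requests_total[5m]))
          > 0.05
        for: 10m
        labels:
          severity: page
        annotations:
          summary: "{{ $labels.destination_service }} error rate above 5%"
```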
Traffic Management
VirtualService: Routing Rules
```yaml
# Route 90% to v1, 10% to v2 (canary deployment)
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: reviews
spec:
  hosts:
    - reviews
  http:
    - route:
        - destination:
            host: reviews
            subset: v1
          weight: 90
        - destination:
            host: reviews
            subset: v2
          weight: 10
```
```yaml
# Route specific users to v2 (header-based routing)
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: reviews
spec:
  hosts:
    - reviews
  http:
    - match:
        - headers:
            x-user-group:
              exact: beta-testers
      route:
        - destination:
            host: reviews
            subset: v2
    - route:
        - destination:
            host: reviews
            subset: v1
```
DestinationRule: Traffic Policies
```yaml
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: reviews
spec:
  host: reviews
  trafficPolicy:
    # Connection pool limits
    connectionPool:
      http:
        http1MaxPendingRequests: 100
        http2MaxRequests: 1000
    # Load balancing
    loadBalancer:
      simple: LEAST_REQUEST
    # Circuit breaker
    outlierDetection:
      consecutive5xxErrors: 5   # eject after 5 consecutive errors
      interval: 30s             # check every 30s
      baseEjectionTime: 30s     # eject for at least 30s
      maxEjectionPercent: 50    # eject at most 50% of hosts
  subsets:
    - name: v1
      labels:
        version: v1
    - name: v2
      labels:
        version: v2
```
Retry and Timeout
```yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: ratings
spec:
  hosts:
    - ratings
  http:
    - route:
        - destination:
            host: ratings
      timeout: 10s          # overall request timeout
      retries:
        attempts: 3         # retry up to 3 times
        perTryTimeout: 3s   # each attempt times out after 3s
        retryOn: 5xx,reset,connect-failure
```
Distributed Tracing
Istio integrates with Jaeger, Zipkin, and other tracing backends. The only requirement: your services must propagate trace headers from incoming to outgoing requests.
Required Headers to Propagate
```
x-request-id
x-b3-traceid
x-b3-spanid
x-b3-parentspanid
x-b3-sampled
x-b3-flags
b3
```
```javascript
// Node.js: propagate trace headers
app.use((req, res, next) => {
  const traceHeaders = [
    'x-request-id', 'x-b3-traceid', 'x-b3-spanid',
    'x-b3-parentspanid', 'x-b3-sampled', 'x-b3-flags', 'b3'
  ];
  req.traceHeaders = {};
  traceHeaders.forEach(header => {
    if (req.headers[header]) {
      req.traceHeaders[header] = req.headers[header];
    }
  });
  next();
});

// When calling another service, pass the headers along
async function callService(url, traceHeaders) {
  return fetch(url, { headers: traceHeaders });
}
```
```go
// Go: propagate trace headers
func callDownstream(ctx context.Context, r *http.Request, url string) (*http.Response, error) {
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
	if err != nil {
		return nil, err
	}
	// Copy the full set of trace headers from the incoming request
	for _, h := range []string{
		"x-request-id", "x-b3-traceid", "x-b3-spanid",
		"x-b3-parentspanid", "x-b3-sampled", "x-b3-flags", "b3",
	} {
		if v := r.Header.Get(h); v != "" {
			req.Header.Set(h, v)
		}
	}
	return http.DefaultClient.Do(req)
}
```
Configure Tracing Backend
```yaml
# Telemetry API (Istio 1.12+)
apiVersion: telemetry.istio.io/v1alpha1
kind: Telemetry
metadata:
  name: mesh-default
  namespace: istio-system
spec:
  tracing:
    - providers:
        - name: jaeger
      randomSamplingPercentage: 10.0   # sample 10% of requests
  metrics:
    - providers:
        - name: prometheus
  accessLogging:
    - providers:
        - name: envoy
```
mTLS: Automatic Encryption
Istio automatically encrypts all service-to-service communication with mutual TLS:
```yaml
# Enforce strict mTLS for a namespace
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: production
spec:
  mtls:
    mode: STRICT   # reject non-mTLS connections
```

```yaml
# Allow specific services to accept plain text (migration period)
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: legacy-service
  namespace: production
spec:
  selector:
    matchLabels:
      app: legacy-service
  mtls:
    mode: PERMISSIVE   # accept both mTLS and plain text
```
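To see whether mTLS is actually in effect for a workload, one option is to inspect the certificates Istio has issued to its sidecar (`deploy/productpage-v1` here refers to the Bookinfo sample used earlier):

```bash
# List the workload and root certificates loaded into a sidecar
istioctl proxy-config secret deploy/productpage-v1
```

Under STRICT mode, a plain-text request from a pod without a sidecar to a meshed service is refused; under PERMISSIVE it still succeeds, which is exactly what makes PERMISSIVE useful during migration.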
Linkerd: The Lightweight Alternative
Linkerd is simpler than Istio: easier to install and operate, with lower resource overhead. It uses a Rust-based micro-proxy instead of Envoy.
```bash
# Install the Linkerd CLI
curl --proto '=https' --tlsv1.2 -sSfL https://run.linkerd.io/install | sh

# Pre-flight check
linkerd check --pre

# Install the CRDs, then the control plane
linkerd install --crds | kubectl apply -f -
linkerd install | kubectl apply -f -

# Verify
linkerd check

# Inject the sidecar into an existing deployment
kubectl get deploy -n myapp -o yaml | linkerd inject - | kubectl apply -f -
```
Linkerd automatically provides:
- Golden metrics (success rate, RPS, latency) for every service
- mTLS between all meshed services
- Distributed tracing (with Jaeger integration)
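These metrics are surfaced through the viz extension; a typical workflow looks like this (commands as of Linkerd 2.x, with `myapp` standing in for your own namespace):

```bash
# Install the viz extension (on-cluster Prometheus + dashboard)
linkerd viz install | kubectl apply -f -

# Live golden metrics per deployment: success rate, RPS, latency percentiles
linkerd viz stat deploy -n myapp

# Open the web dashboard
linkerd viz dashboard
```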
Istio vs Linkerd
| Feature | Istio | Linkerd |
|---|---|---|
| Proxy | Envoy (C++) | Linkerd2-proxy (Rust) |
| Resource usage | Higher | Lower |
| Feature set | Comprehensive | Focused |
| Learning curve | Steep | Gentle |
| Traffic management | Advanced | Basic |
| mTLS | Yes | Yes |
| Tracing | Yes | Yes |
| Best for | Complex traffic management | Simplicity, low overhead |
Observability Dashboard
Install Kiali for a visual service mesh dashboard:
```bash
kubectl apply -f samples/addons/kiali.yaml
kubectl apply -f samples/addons/prometheus.yaml
kubectl apply -f samples/addons/grafana.yaml
kubectl apply -f samples/addons/jaeger.yaml

# Open the Kiali dashboard
istioctl dashboard kiali
```
Kiali shows:
- Service topology graph with traffic flow
- Error rates and latency per service
- Traffic distribution across versions
- mTLS status
Resources
- Istio Documentation
- Linkerd Documentation
- Envoy Proxy
- Kiali Service Mesh Observability
- Istio in Action (book)