Introduction
Ambient Mesh eliminates per-pod Envoy sidecars by moving mesh functionality to two new components: ztunnel (a node-level Rust-based DaemonSet handling L4 security and mTLS) and waypoint proxies (optional namespace-level Envoy instances for L7 traffic management). The result is 50-90% lower proxy memory overhead, no sidecar injection delays during pod startup, and mesh upgrades that don’t require rolling all application pods.
This guide covers the architecture, installation with Istio 1.29, performance benchmarks showing sub-millisecond added latency and low single-digit-percent overhead in an application-level test, real-world cost savings ($2M+/year for large multi-cluster deployments), migrating existing sidecar deployments to ambient mode, and multi-cluster ambient setup.
Architecture: Two-Layer Split Proxy
Unlike sidecar mode where every pod carries a full Envoy proxy handling both L4 and L7, ambient mode splits these responsibilities across two distinct layers:
```mermaid
flowchart LR
  subgraph NodeA[Kubernetes Node A]
    Pod1[Pod A-1<br/>app only]
    Pod2[Pod A-2<br/>app only]
    ZTA[ztunnel<br/>L4: mTLS, auth, telemetry]
  end
  subgraph NodeB[Kubernetes Node B]
    Pod3[Pod B-1<br/>app only]
    Pod4[Pod B-2<br/>app only]
    ZTB[ztunnel<br/>L4: mTLS, auth, telemetry]
  end
  subgraph NS[Namespace]
    WP[waypoint proxy<br/>L7: routing, splitting, rate limit]
  end
  Pod1 --> ZTA
  ZTA <-->|HBONE tunnel| ZTB
  ZTB --> Pod3
  ZTA <--> WP
  WP <--> ZTB
```
Layer 4 — Secure Overlay (ztunnel) runs as a DaemonSet on every node. It handles:
- Mutual TLS encryption between all ambient-enrolled pods
- SPIFFE-based workload identity
- L4 authorization policies
- TCP metrics and logging
- Connection-level load balancing
Layer 7 — Waypoint Proxy runs as a Kubernetes deployment (one per namespace or service). It handles:
- HTTP routing (header-based, path-based)
- Traffic splitting and canary deployments
- L7 authorization (JWT validation, OIDC)
- Circuit breaking, retries, fault injection
- L7 observability (RED metrics, distributed tracing)
You can enable L4 alone for zero-trust security and only deploy waypoints for the namespaces that need L7 features — this is ambient’s incremental adoption model.
Sidecar vs Ambient
```text
Sidecar mesh (every Istio release; still supported):
  Pod = [app container] + [envoy sidecar]
  1000 pods = 1000 sidecars × ~60MB = ~60GB proxy RAM
  Upgrade = rolling restart of ALL 1000 pods

Ambient mesh (GA since Istio 1.24):
  Pod = [app container] ← no sidecar injected
  Node-level: ztunnel DaemonSet (1 per node, ~50MB/node)
  Namespace-level: waypoint proxy (optional, for L7 policies)
  Upgrade = update ztunnel DaemonSet → no app restarts
```
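The memory arithmetic above can be sketched as two shell helpers so you can plug in your own pod and node counts. The per-proxy sizes are the rough figures quoted here (~60 MB per sidecar, ~50 MB per ztunnel); the 100 MB per waypoint replica in the example call is a hypothetical placeholder, not a measured value.

```shell
# Rough proxy-memory estimate in MB, using integer shell arithmetic.
# Inputs are assumptions taken from the comparison above, not measurements.

# Sidecar mode: one Envoy per pod.
# $1 = number of pods, $2 = MB per sidecar
sidecar_ram_mb() {
  echo $(( $1 * $2 ))
}

# Ambient mode: one ztunnel per node plus any waypoint replicas.
# $1 = nodes, $2 = MB per ztunnel, $3 = waypoint replicas, $4 = MB per waypoint
ambient_ram_mb() {
  echo $(( $1 * $2 + $3 * $4 ))
}

sidecar_ram_mb 1000 60        # prints 60000 (1000 pods x 60 MB)
ambient_ram_mb 10 50 10 100   # prints 1500 (10 x 50 MB + 10 x 100 MB, hypothetical)
```

With 10 nodes and 10 waypoints the estimate lands at about 1.5 GB, inside the ~1-2 GB range in the comparison table; replace the inputs with measured usage from your own cluster before drawing conclusions.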
Resource Comparison
| Metric | Sidecar (1000 pods) | Ambient (1000 pods, 10 nodes) | Savings |
|---|---|---|---|
| Proxy instances | 1000 (one per pod) | 10 (ztunnel) + ~10 waypoints | ~98% fewer |
| Total proxy RAM | ~50-60 GB | ~1-2 GB | ~96-98% less |
| Pod startup latency | +2-5s (sidecar injection) | 0 (no injection) | Eliminated |
| Mesh upgrade impact | Rolling restart all pods | Rolling restart ztunnel only | No app restarts |
| Proxy CPU per 1000 RPS | ~0.20 vCPU (Envoy) | ~0.06 vCPU (ztunnel) | ~70% less |
| mTLS CPU overhead | ~24.3% | ~4.8% | ~80% less |
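Istio publishes its proxy CPU figures per 1000 requests per second passing through a proxy (roughly 0.20 vCPU for a sidecar Envoy and 0.06 vCPU for ztunnel). A sketch of turning that into a data-plane estimate, assuming each request crosses two proxies (client side and server side) and ignoring waypoints:

```shell
# Estimate data-plane vCPU for a given request rate from per-1000-RPS proxy costs.
# Assumption: every request traverses two proxies (client side + server side).
# awk is used because POSIX shell arithmetic is integer-only.

# $1 = requests per second, $2 = vCPU per 1000 RPS per proxy, $3 = proxies per request
dataplane_vcpu() {
  awk -v rps="$1" -v per_k="$2" -v hops="$3" \
    'BEGIN { printf "%.2f\n", rps / 1000 * per_k * hops }'
}

dataplane_vcpu 10000 0.20 2   # sidecar mode: prints 4.00
dataplane_vcpu 10000 0.06 2   # ambient L4 (ztunnel only): prints 1.20
```

At 10,000 RPS this gives about 4 vCPU for sidecars versus about 1.2 vCPU for ztunnel, the ~70% reduction in the table. Traffic that also passes through a waypoint adds that proxy's cost on top.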
Ztunnel Deep Dive
The ztunnel is a purpose-built Rust proxy designed specifically for L4 mesh traffic. Unlike Envoy — a general-purpose proxy with hundreds of extensions — ztunnel does exactly one thing very well: secure pod-to-pod communication.
HBONE Protocol
Traffic between ambient-enrolled workloads travels via HBONE (HTTP-Based Overlay Network Environment). HBONE encapsulates pod traffic inside HTTP/2 CONNECT tunnels, secured with mTLS, between ztunnel instances:
```mermaid
sequenceDiagram
  participant Client as Client Pod
  participant ZTA as ztunnel (Node A)
  participant ZTB as ztunnel (Node B)
  participant Server as Server Pod
  Client->>ZTA: Plain TCP (redirected inside the pod network namespace)
  ZTA->>ZTB: HBONE tunnel (mTLS + HTTP/2 CONNECT, port 15008)
  ZTB->>Server: Plain TCP (delivered into the pod network namespace)
  Note over ZTA,ZTB: Encrypted, authenticated, multiplexed
```
The key properties of HBONE:
- Multiplexed: Multiple application connections can share pooled HTTP/2 connections between ztunnel instances, reducing connection and handshake overhead
- Zero-knowledge transport: Ztunnel does not inspect L7 payload — it only sees TCP streams
- TCP_NODELAY enabled by default: Eliminates Nagle’s algorithm delays, reducing latency by up to 40ms for chatty protocols
- Connection pooling: Ztunnel reuses connections aggressively, reducing syscalls
Why Ztunnel Outperforms Kernel Solutions
In Istio’s official iperf benchmarks (March 2025), ztunnel delivered higher encrypted throughput than Cilium with WireGuard, Calico with WireGuard, and plain kernel IPsec:
| Implementation | Same-node throughput | Cross-node throughput |
|---|---|---|
| ztunnel (Istio ambient) | ~35 Gbps | ~28 Gbps |
| Cilium WireGuard | ~25 Gbps | ~20 Gbps |
| Calico WireGuard | ~22 Gbps | ~18 Gbps |
| Kernel IPsec | ~15 Gbps | ~12 Gbps |
The advantage comes from rapid iteration in user space. While kernel networking (eBPF, WireGuard, IPsec) must evolve deliberately across kernel versions, ztunnel ships optimizations quarterly:
- rustls + AWS-LC: Uses the `rustls` TLS library backed by AWS-LC’s optimized cryptographic primitives
- ChaCha20-Poly1305 + AES-GCM hardware acceleration: Modern ciphers with hardware support
- Zero-copy buffer management: Minimizing data copying between kernel and user space
- 75% throughput improvement over 4 releases: Each quarterly release brings substantial gains
In these benchmarks, Istio ambient mode delivered the highest encrypted throughput of the options tested, ahead of both WireGuard-based CNIs (including Cilium’s eBPF datapath) and kernel IPsec.
Syscall Reduction
CNCF benchmarks by Lin Sun (Solo.io) revealed a surprising finding: ztunnel can reduce total system calls by up to 60% compared to no-mesh operation. The ztunnel coalesces multiple application writes into single network writes via HTTP/2 multiplexing, which means:
- Fortio load tester: ~60% fewer syscalls with ambient vs no mesh
- P90 latency: ambient matches or beats no-mesh in some scenarios
- ~25% CPU reduction on the client pod at peak loads
This explains the counter-intuitive result where ambient sometimes outperforms no-mesh: the connection management and buffering in ztunnel compensate for the added encryption hops.
Waypoint Proxy Architecture
Waypoint proxies provide L7 capabilities on demand. They are standard Envoy deployments that the control plane configures for a namespace, a specific service, or a workload; ztunnel forwards enrolled traffic to them over HBONE.
Waypoint Lifecycle
```mermaid
flowchart TD
  A[Create waypoint] --> B[istioctl waypoint apply]
  B --> C{Waypoint type}
  C -->|namespace| D[One waypoint handles<br/>all services in namespace]
  C -->|service| E[One waypoint per<br/>specific service]
  D --> F["--enroll-namespace labels<br/>the namespace"]
  E --> G["User labels the service<br/>istio.io/use-waypoint"]
  F --> H[Traffic routed through waypoint]
  G --> H
  H --> I[Optional: scale with HPA]
```
Waypoints are standard Kubernetes Deployments and can be auto-scaled with HPA. A single waypoint can handle L7 processing for an entire namespace, whereas sidecar mode requires one Envoy per pod.
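A minimal HorizontalPodAutoscaler for a namespace waypoint might look like the following, rendered from a shell function in the same heredoc style this guide uses for `kubectl apply`. It assumes the waypoint Deployment is named `waypoint` (the default name `istioctl waypoint apply` uses), that the waypoint pods have CPU requests, and that metrics-server is installed; the 2-10 replica range and 70% CPU target are illustrative values, and the function name is hypothetical.

```shell
# Render an autoscaling/v2 HPA for a waypoint Deployment.
# $1 = namespace, $2 = min replicas (default 2), $3 = max replicas (default 10)
# Assumes the Deployment is named "waypoint" and metrics-server is running.
render_waypoint_hpa() {
  local ns="$1" min="${2:-2}" max="${3:-10}"
  cat <<EOF
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: waypoint
  namespace: ${ns}
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: waypoint
  minReplicas: ${min}
  maxReplicas: ${max}
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
EOF
}

# Usage: render_waypoint_hpa default 2 10 | kubectl apply -f -
```

Keeping `minReplicas` at 2 or more also avoids a single waypoint replica becoming a bottleneck for the whole namespace.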
L4 vs L7 Feature Breakdown
| Feature | L4 (ztunnel only) | L7 (+ waypoint) |
|---|---|---|
| mTLS encryption | Yes | Yes |
| Service identity (SPIFFE) | Yes | Yes |
| Network-based authorization | Yes | Yes |
| TCP metrics | Yes | Yes |
| HTTP routing | No | Yes |
| Traffic splitting / canary | No | Yes |
| JWT / OIDC auth | No | Yes |
| Circuit breaking (HTTP) | No | Yes |
| Retries / timeouts | No | Yes |
| Fault injection | No | Yes |
| Rate limiting | No | Yes |
| Distributed tracing | No | Yes |
| L7 RED metrics | No | Yes |
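To make the split concrete: an AuthorizationPolicy that only matches on source identity and destination port, attached with a workload `selector`, can be enforced by ztunnel without a waypoint. The sketch assumes Bookinfo-style names (a `reviews` workload on port 9080 called by the `bookinfo-productpage` service account) and the default `cluster.local` trust domain; the helper name is hypothetical.

```shell
# Render an L4-only AuthorizationPolicy: it matches on source identity and
# destination port only, so ztunnel can enforce it without a waypoint.
# $1 = namespace, $2 = app label of the server, $3 = client service account, $4 = port
render_l4_authz() {
  local ns="$1" app="$2" client_sa="$3" port="$4"
  cat <<EOF
apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
  name: ${app}-allow-${client_sa}
  namespace: ${ns}
spec:
  selector:
    matchLabels:
      app: ${app}
  action: ALLOW
  rules:
  - from:
    - source:
        principals:
        - cluster.local/ns/${ns}/sa/${client_sa}
    to:
    - operation:
        ports:
        - "${port}"
EOF
}

# Usage: render_l4_authz default reviews bookinfo-productpage 9080 | kubectl apply -f -
```

Adding HTTP attributes such as `methods` or `paths` turns this into an L7 rule, which needs a waypoint to enforce.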
Performance Benchmarks
Istio Official (Bare Metal, 1KB HTTP/1.1)
| Mode | P90 Latency | P99 Latency |
|---|---|---|
| No mesh | ~0.10ms | ~0.15ms |
| Ambient L4 | ~0.16ms | ~0.20ms |
| Ambient L4+L7 (waypoint) | ~0.40ms | ~0.50ms |
| Sidecar | ~0.63ms | ~0.88ms |
CNCF Bookinfo (GKE, 4000 RPS)
| Mode | Average | P90 | Difference from no mesh |
|---|---|---|---|
| No mesh | 1.54ms | 2.25ms | — |
| Ambient | 1.58ms | 2.33ms | +2.6% average, +3.6% P90 (with mTLS + L4 observability) |
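The overhead figures can be recomputed from the two latency columns. A minimal helper, using awk because shell arithmetic is integer-only:

```shell
# Relative latency overhead of a mesh mode versus the no-mesh baseline, in percent.
# $1 = baseline latency, $2 = latency with the mesh (same units)
pct_overhead() {
  awk -v base="$1" -v mesh="$2" 'BEGIN { printf "%.1f\n", (mesh - base) / base * 100 }'
}

pct_overhead 1.54 1.58   # average: prints 2.6
pct_overhead 2.25 2.33   # P90:     prints 3.6
```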
Linkerd vs Ambient (GKE, 2000 RPS, 100 connections)
| Service Mesh | P99 Latency |
|---|---|
| Baseline (no mesh) | ~8ms |
| Linkerd | ~12ms |
| Istio Ambient L7 | ~23ms |
| Istio Sidecar | ~175ms |
Ambient significantly closes the gap with lightweight meshes like Linkerd at high loads, while sidecar mode lags substantially. At 200 RPS, ambient was only ~2ms behind Linkerd at P99.
Real-World Cost Savings
Solo.io Cost Analysis
For a typical large deployment (3 clusters, 200 nodes, 15,000 pods, 1,000 namespaces):
| Cost Factor | Sidecar | Ambient (1 waypoint replica) | Ambient (3 waypoint replicas) |
|---|---|---|---|
| Mesh vCPUs | 9,000 | 660 | 1,860 |
| Annual cost | $2,376,000 | $174,240 | $491,040 |
| Annual savings | - | $2,201,760 (~93%) | $1,884,960 (79%) |
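The table's arithmetic can be reproduced from its vCPU rows; the only derived input is the per-vCPU price implied by the sidecar column ($2,376,000 / 9,000 = $264 per vCPU-year). This is a consistency check on the published figures, not a pricing model for your cloud, and the helper names are hypothetical.

```shell
# Per-vCPU annual price implied by the sidecar column: 2,376,000 / 9,000 = 264.
PRICE_PER_VCPU_YEAR=$(( 2376000 / 9000 ))

# $1 = mesh vCPUs
annual_cost() {
  echo $(( $1 * PRICE_PER_VCPU_YEAR ))
}

# Savings versus sidecar, printed as "<dollars> <percent>%".
# $1 = sidecar vCPUs, $2 = ambient vCPUs
savings() {
  local s a
  s=$(annual_cost "$1")
  a=$(annual_cost "$2")
  awk -v s="$s" -v a="$a" 'BEGIN { printf "%d %.1f%%\n", s - a, (s - a) / s * 100 }'
}

annual_cost 660      # prints 174240
savings 9000 660     # prints 2201760 92.7%
savings 9000 1860    # prints 1884960 79.3%
```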
User-Reported Projected Savings
| Industry | Clusters | Mesh Pods | Annual Savings |
|---|---|---|---|
| Technology Services | 36 | 21,000 | $2.0M-$2.8M |
| Financial Services | 71 | 28,800 | $1.9M |
| Healthcare | 4 | 3,855 | $400K |
| Federal Government | 3 | 16,316 | $2.0M-$3.1M |
An ambient mesh cost savings estimator is available to model your specific infrastructure.
Installing Istio Ambient Mesh (Istio 1.29)
As of Istio 1.29 (February 2026), ambient mode is stable for single-cluster production. Multi-network multi-cluster support is promoted to Beta with significant telemetry and reliability improvements.
```shell
# Download Istio 1.29
curl -L https://istio.io/downloadIstio | ISTIO_VERSION=1.29.0 sh -
export PATH="$PWD/istio-1.29.0/bin:$PATH"

# Install with the ambient profile (istiod, istio-cni, ztunnel) plus the classic ingress gateway
istioctl install --set profile=ambient \
  --set "components.ingressGateways[0].enabled=true" \
  --set "components.ingressGateways[0].name=istio-ingressgateway" -y

# Verify components
kubectl get pods -n istio-system
# NAME                                READY   STATUS
# istio-cni-node-xxx (one per node)   1/1     Running
# istiod-xxx                          1/1     Running
# ztunnel-xxx (one per node)          1/1     Running
# istio-ingressgateway-xxx            1/1     Running

istioctl verify-install
```
What Gets Installed
- istiod: Control plane (unchanged from sidecar mode)
- ztunnel: DaemonSet on every node. Handles L4 mTLS, authentication, authorization, and telemetry. Written in Rust for performance
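- istio-cni: Node agent DaemonSet that configures traffic redirection from ambient pods to the node’s ztunnel
- istio-ingressgateway: Optional, for external traffic entry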
What’s New in Istio 1.29 (Feb 2026)
| Feature | Status | Impact |
|---|---|---|
| DNS capture for ambient workloads | GA (default on) | Improved security, service discovery, traffic management |
| iptables reconciliation | GA (default on) | Automatic network rule updates on CNI upgrade — eliminates manual intervention |
| Multi-network multi-cluster ambient | Beta | Cross-cluster telemetry, E/W gateway improvements, peer metadata exchange |
| Certificate Revocation List (CRL) in ztunnel | GA | Validate/reject revoked certs with external CAs |
| Debug endpoint authorization | GA | Namespace-scoped access control for ztunnel debug endpoints |
| Default NetworkPolicies for istiod/ztunnel | GA | global.networkPolicy.enabled=true |
| Wildcard ServiceEntry with DYNAMIC_DNS (TLS) | Alpha | SNI-based routing without TLS termination |
| HTTP compression for Envoy metrics | GA (default on) | brotli/gzip/zstd for Prometheus stats endpoint |
| Baggage-based telemetry for ambient | Alpha | Cross-network traffic source/destination attribution |
| Inference Extension | Beta | Gateway API InferenceExtension for self-hosted AI models |
| Pilot resource filtering | GA | Run Istio as Gateway API-only controller |
| GOMEMLIMIT auto-configuration | GA | istiod auto-sets to 90% of memory limit, reduces OOM risk |
Enabling Ambient Mode for a Namespace
```shell
# Enable ambient for the default namespace (running pods are captured; no restart needed)
kubectl label namespace default istio.io/dataplane-mode=ambient

# Verify ztunnel is handling traffic: ztunnel logs an access line per completed connection
# (kubectl logs on a DaemonSet reads from one ztunnel pod; check the pod on your app's node)
kubectl logs -n istio-system daemonset/ztunnel | grep "connection complete"
```
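Once the namespace is labeled, ztunnel captures traffic to and from its pods, including pods that were already running. No sidecar is injected and no pod restart is required; restarts are only needed when removing sidecars from previously injected workloads, as in the migration steps later in this guide.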
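For GitOps workflows, the same enrollment can be expressed declaratively. The sketch below renders a Namespace carrying the `istio.io/dataplane-mode: ambient` label; the pod-level `istio.io/dataplane-mode=none` label, which keeps an individual pod out of ambient capture, is shown as a comment. The function name is hypothetical.

```shell
# Render a Namespace manifest enrolled in ambient mode.
# $1 = namespace name
render_ambient_namespace() {
  cat <<EOF
apiVersion: v1
kind: Namespace
metadata:
  name: $1
  labels:
    istio.io/dataplane-mode: ambient
EOF
}

# Usage: render_ambient_namespace default | kubectl apply -f -
# To keep a single pod out of ambient capture, label that pod instead:
#   kubectl label pod <pod> istio.io/dataplane-mode=none
```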
Deploying Waypoint Proxies (L7 Traffic Management)
```shell
# Generate and deploy a waypoint for the default namespace, and enroll the namespace to use it
istioctl waypoint apply --namespace default --enroll-namespace

# Verify waypoint is running
kubectl get pods -n default
# NAME            READY   STATUS
# waypoint-xxx    1/1     Running
# (your app pods, without sidecars)
```
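`istioctl waypoint apply` is a convenience wrapper: it creates a Kubernetes Gateway API `Gateway` with the `istio-waypoint` class. A sketch of an equivalent manifest follows, assuming the default waypoint name `waypoint`; `istioctl waypoint generate` prints the exact manifest for your Istio version and is the safer source of truth. The helper name is hypothetical.

```shell
# Render a namespace waypoint as a Gateway API Gateway.
# $1 = namespace, $2 = waypoint name (default "waypoint")
render_waypoint() {
  local ns="$1" name="${2:-waypoint}"
  cat <<EOF
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: ${name}
  namespace: ${ns}
  labels:
    istio.io/waypoint-for: service
spec:
  gatewayClassName: istio-waypoint
  listeners:
  - name: mesh
    port: 15008
    protocol: HBONE
EOF
}

# Usage:
#   render_waypoint default | kubectl apply -f -
#   kubectl label namespace default istio.io/use-waypoint=waypoint            # whole namespace
#   kubectl label service reviews -n default istio.io/use-waypoint=waypoint   # or one service
```

The `istio.io/use-waypoint` label is what `--enroll-namespace` applies for you; labeling a single Service instead gives the per-service scope from the lifecycle diagram.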
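Apply L7 traffic policies for the `reviews` Service; because the namespace is enrolled with a waypoint, the waypoint enforces them. Istio documents VirtualService as an Alpha API in ambient mode, with Gateway API `HTTPRoute` as the recommended alternative: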
```yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: reviews-route
  namespace: default
spec:
  hosts:
  - reviews
  http:
  - match:
    - headers:
        end-user:
          exact: admin
    route:
    - destination:
        host: reviews
        subset: v2
  - route:
    - destination:
        host: reviews
        subset: v1
---
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: reviews
  namespace: default
spec:
  host: reviews
  trafficPolicy:
    loadBalancer:
      simple: RANDOM
  subsets:
  - name: v1
    labels:
      version: v1
  - name: v2
    labels:
      version: v2
```
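For comparison, here is a sketch of the same header-based routing as a Gateway API `HTTPRoute`, attached to the `reviews` Service so the waypoint applies it. Gateway API has no subsets, so this assumes per-version Services named `reviews-v1` and `reviews-v2` on port 9080 (as in Bookinfo's versioned-services manifest); those names and the helper name are assumptions.

```shell
# Render an HTTPRoute applied by the waypoint to traffic for the reviews Service:
# requests with header "end-user: admin" go to reviews-v2, everything else to reviews-v1.
# Assumes per-version Services reviews-v1 / reviews-v2 exist on port 9080.
render_reviews_route() {
  local ns="${1:-default}"
  cat <<EOF
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: reviews-route
  namespace: ${ns}
spec:
  parentRefs:
  - group: ""
    kind: Service
    name: reviews
    port: 9080
  rules:
  - matches:
    - headers:
      - name: end-user
        value: admin
    backendRefs:
    - name: reviews-v2
      port: 9080
  - backendRefs:
    - name: reviews-v1
      port: 9080
EOF
}

# Usage: render_reviews_route default | kubectl apply -f -
```

Header matches default to exact matching in Gateway API, mirroring the `exact: admin` rule above.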
Verifying mTLS
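```shell
# Check that ztunnel is handling traffic
istioctl x ztunnel-config workload -n default

# Enforce strict mTLS
kubectl apply -f - <<EOF
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: default
spec:
  mtls:
    mode: STRICT
EOF

# Test that plaintext connections are rejected. Run the client from a namespace that is
# NOT in the mesh ("no-mesh" is an arbitrary name): a pod in the ambient-labeled default
# namespace would itself be enrolled and would connect over mTLS.
kubectl create namespace no-mesh
kubectl run test -n no-mesh --image=curlimages/curl --rm -it --restart=Never -- \
  curl -sS http://reviews.default:9080/
# Should fail (typically "Connection reset by peer") because STRICT mode requires mTLS
```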
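With identities enforced, L7 authorization is attached to the waypoint rather than to pods: in ambient mode such a policy uses `targetRefs` (here pointing at the `reviews` Service) instead of a workload `selector`. The sketch allows only GET requests from one client service account; the Bookinfo-style names, the `cluster.local` trust domain, and the helper name are assumptions, and the Service must be served by a waypoint (as `--enroll-namespace` arranges earlier) for the rule to be enforced.

```shell
# Render an L7 AuthorizationPolicy enforced by the waypoint serving a Service.
# $1 = namespace, $2 = Service name, $3 = client service account
render_l7_authz() {
  local ns="$1" svc="$2" client_sa="$3"
  cat <<EOF
apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
  name: ${svc}-get-only
  namespace: ${ns}
spec:
  targetRefs:
  - kind: Service
    group: ""
    name: ${svc}
  action: ALLOW
  rules:
  - from:
    - source:
        principals:
        - cluster.local/ns/${ns}/sa/${client_sa}
    to:
    - operation:
        methods:
        - GET
EOF
}

# Usage: render_l7_authz default reviews bookinfo-productpage | kubectl apply -f -
```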
Multi-Cluster Ambient Mesh
Multi-cluster traffic management became Alpha in Istio 1.27 (August 2025) and was promoted to Beta in Istio 1.29 (February 2026). This enables active-active high-availability across regions or clouds.
```mermaid
flowchart LR
  subgraph Cluster1[Cluster 1 - us-east]
    IS1[istiod]
    ZT1[ztunnel]
    WP1[waypoint]
    App1[Application]
  end
  subgraph Cluster2[Cluster 2 - us-west]
    IS2[istiod]
    ZT2[ztunnel]
    WP2[waypoint]
    App2[Application]
  end
  ZT1 <-->|HBONE across E/W gateway| ZT2
  IS1 -.->|Configures| ZT1
  IS2 -.->|Configures| ZT2
```
Key improvements in the 1.29 Beta:
- Advanced peer metadata exchange: Ensures proper source/destination attribution for cross-network traffic
- L4 metrics now report waypoint info: Previously waypoints were missing from multi-network telemetry
- E/W gateway improvements: Better handling of requests traversing different networks
- Dedicated observability guide: Prometheus and Kiali deployment documentation for multi-cluster ambient
Enable multi-cluster by following the multi-primary multi-network guide, and set the AMBIENT_ENABLE_BAGGAGE environment variable on istiod (pilot) to turn on baggage-based telemetry for cross-network traffic.
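As a rough sketch of a per-cluster install overlay: each primary in a multi-primary multi-network setup shares a mesh ID but gets its own cluster name and network, and the baggage flag goes under `values.pilot.env`. The mesh, cluster, and network names are placeholders, the helper name is hypothetical, your Istio version's guide may require additional settings, and the east-west gateway and remote-secret steps are omitted.

```shell
# Render an IstioOperator overlay for one primary cluster in ambient multi-network mode.
# $1 = mesh ID, $2 = cluster name, $3 = network name
render_cluster_overlay() {
  cat <<EOF
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
  profile: ambient
  values:
    global:
      meshID: $1
      multiCluster:
        clusterName: $2
      network: $3
    pilot:
      env:
        AMBIENT_ENABLE_BAGGAGE: "true"
EOF
}

# Usage (placeholder names):
#   render_cluster_overlay mesh1 cluster1 network1 > cluster1.yaml
#   istioctl install -f cluster1.yaml -y
```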
Migrating from Sidecar to Ambient (Zero Downtime)
Istio 1.29 supports mixed mode where sidecar-injected and ambient workloads coexist, enabling incremental migration without downtime. The Istio team is investing in dedicated migration tooling to assess readiness and provide rollback-safe transitions.
Step 1: Install Ambient Components
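```shell
# Add ambient components (istio-cni, ztunnel) to the existing sidecar mesh.
# Re-apply any customizations from the original install (ingress gateway flags, -f overlays);
# components missing from the new spec may otherwise be removed.
istioctl install --set profile=ambient -y
# The ztunnel DaemonSet now runs alongside existing sidecars
```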
Step 2: Enable Waypoint (for L7 namespaces)
```shell
istioctl waypoint apply --namespace production --enroll-namespace
```
Step 3: Label Namespace for Ambient
```shell
kubectl label namespace production istio.io/dataplane-mode=ambient
```
Step 4: Remove Sidecars (Rolling Deployment)
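```shell
# Stop sidecar injection (if you use revision-based injection, remove istio.io/rev instead)
kubectl label namespace production istio-injection-
kubectl rollout restart deployment -n production
```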
Pods restart without sidecars and are captured by ztunnel. During the transition, sidecar-to-ambient and ambient-to-sidecar traffic both work via the HBONE protocol.
Step 5: Verify Migration
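```shell
# Confirm no sidecars remain: print every container and init container name,
# one per line, and count exact matches (grep -c counts lines, not words)
kubectl get pods -n production \
  -o jsonpath='{.items[*].spec.containers[*].name} {.items[*].spec.initContainers[*].name}' \
  | tr ' ' '\n' | grep -cx istio-proxy
# Should output 0

# Check that mesh proxies (including waypoints) are connected to istiod and in sync
istioctl proxy-status
```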
Comparison with Alternatives
Ambient vs Sidecar Mode
| Dimension | Sidecar | Ambient |
|---|---|---|
| Resource cost | High (one Envoy per pod; ~0.20 vCPU, 60MB per 1000 RPS) | Low (one ztunnel per node; ~0.06 vCPU, 12MB per 1000 RPS) |
| Latency (P90) | ~0.63ms | ~0.16ms (L4), ~0.40ms (L7) |
| Deployment | Label + restart all pods | Label namespace, no restart |
| Upgrade | Rolling pod restart | ztunnel rolling update only |
| Security granularity | Keys held in each pod’s sidecar | Keys held by the node’s ztunnel, outside app pods (reduced blast radius for a pod compromise) |
| Extensibility | EnvoyFilter, Wasm | Wasm (via waypoint), TrafficExtension API |
| Maturity | Stable, multi-cluster | Stable single-cluster, Beta multi-cluster |
Ambient vs Linkerd
Linkerd remains a strong alternative with lower baseline overhead. Linkerd’s 2025 benchmarks show it leading ambient by ~11ms at P99 at 2000 RPS. However, ambient offers:
- Richer L7 policy model (VirtualService, DestinationRule)
- Gateway API integration
- Larger ecosystem and community (Istio is the most widely adopted service mesh)
- Built-in inference extension for AI workloads
Ambient vs Cilium Service Mesh
Cilium uses eBPF for L3/L4 operations and Envoy for L7. While Cilium’s kernel-based approach is compelling, Istio’s iperf benchmarks show ztunnel outperforming Cilium WireGuard by ~40% in encrypted throughput. Cilium’s strength is in unified networking (CNI + mesh), while Istio’s strength is deep L7 traffic management and multi-cluster support.
AI Inference Extension (Beta in Istio 1.29)
Istio 1.29 promotes the Gateway API Inference Extension to Beta. This allows Kubernetes Gateway API objects to optimize routing for self-hosted AI inference workloads:
- Uses a new `InferencePool` CRD
- Integrates with existing `Gateway` and `HTTPRoute` objects
- Enables intelligent request routing across inference replicas
- Conformant with Gateway API Inference Extension v1.0.1
Enable it by setting the ENABLE_GATEWAY_API_INFERENCE_EXTENSION=true environment variable on istiod (pilot).
Security Model and NIST Guidance
NIST Special Publication 800-233 provides official guidance on when to use sidecar vs ambient mode. Key security differences:
- Reduced blast radius: In sidecar mode, a compromised pod gives the attacker access to mesh keys stored in that pod. In ambient mode, keys are stored per-node in ztunnel — compromising one pod does not expose mesh credentials
- Compromised application pod gives access to mesh keys: Sidecar — Yes; Ambient — No
- Stronger isolation: Applications cannot bypass the proxy (in sidecar mode, a compromised container can disable the sidecar; in ambient mode, traffic is captured at the node level)
- CRL support in 1.29: ztunnel can now validate and reject revoked certificates when using external certificate authorities
Troubleshooting Migration
| Symptom | Cause | Fix |
|---|---|---|
| Sidecar→ambient connections rejected with STRICT mTLS | Identity/policy mismatch | Apply PERMISSIVE PeerAuthentication during migration, then switch to STRICT |
| Ambient service returns 503s | No waypoint for L7 traffic, or waypoint scaled to zero | Deploy waypoint: istioctl waypoint apply --namespace <ns> |
| Missing HTTP metrics after migration | ztunnel only exports L4 metrics | Deploy waypoint for L7 observability |
| Waypoint autoscaling delays | Waypoint scaled to zero, cold start | Set minimum replicas: kubectl scale deployment waypoint -n <ns> --replicas=2 |
| Session affinity not working | DestinationRule ConsistentHash not fully implemented in waypoint | Use Gateway API for session persistence, or keep sidecar for affected services |
| Multi-cluster telemetry incomplete | Baggage-based telemetry not enabled | Set AMBIENT_ENABLE_BAGGAGE=true in istiod env |
| Ambient workloads missing after CNI upgrade | iptables not reconciled automatically | Upgrade to Istio 1.29+ (iptables reconciliation is now default) |
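The PERMISSIVE-then-STRICT sequence from the first row of the table can be scripted with one parameterized manifest, the same PeerAuthentication shape used in the mTLS verification step. A minimal sketch; the helper name is hypothetical.

```shell
# Render a namespace-wide PeerAuthentication with the given mTLS mode.
# $1 = namespace, $2 = PERMISSIVE or STRICT (anything else is rejected)
render_peer_authn() {
  case "$2" in
    PERMISSIVE|STRICT) ;;
    *) echo "mode must be PERMISSIVE or STRICT" >&2; return 1 ;;
  esac
  cat <<EOF
apiVersion: security.istio.io/v1
kind: PeerAuthentication
metadata:
  name: default
  namespace: $1
spec:
  mtls:
    mode: $2
EOF
}

# During migration:  render_peer_authn production PERMISSIVE | kubectl apply -f -
# After cutover:     render_peer_authn production STRICT     | kubectl apply -f -
```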
Resources
- Istio Ambient Mesh Documentation — Official setup and migration guides
- Istio 1.29 Release Notes — Multicluster beta, DNS capture, CRL support
- Sidecar or Ambient? — Official feature comparison table
- Istio Performance and Scalability — Official latency benchmarks
- Istio: The Highest-Performance Solution for Network Security — March 2025 iperf benchmarks
- Istio Roadmap 2025-2026 — Migration tooling, multi-cluster, extensibility plans
- How Ambient Mesh Delivers Cost Savings — Solo.io cost analysis
- Ambient Mesh Cost Savings Estimator — Model your infrastructure costs
- Linkerd vs Ambient Mesh 2025 Benchmarks — Independent performance comparison
- NIST SP 800-233: Service Mesh Proxy Models — Security guidance for sidecar vs ambient