Introduction
API gateways have become the cornerstone of modern microservices architectures. As organizations move from monolithic applications to distributed systems, they need a unified entry point that handles cross-cutting concerns. In 2026 the ecosystem is undergoing major shifts: the community Ingress-NGINX controller was retired in March 2026, accelerating the move from the Ingress API to the Gateway API standard; the Model Context Protocol (MCP) is reshaping how AI agents connect to tools; and open-source gateways are evolving from traffic proxies into full API management platforms.
This guide explores API gateway patterns, covering core concepts, architectural considerations, implementation strategies, the latest standards, and the open-source landscape.
Understanding API Gateways
What is an API Gateway?
An API gateway is a server that acts as the single entry point for a defined group of microservices. It handles requests by routing them to the appropriate backend service, aggregating results, and handling cross-cutting concerns like authentication, rate limiting, and logging.
flowchart TB
Client["Client"] -->|HTTPS| Gateway["API Gateway"]
Gateway -->|Authentication| Auth["Auth Provider"]
Gateway -->|Rate Limiting| Redis["Redis"]
Gateway -->|Routing| US["User Service"]
Gateway -->|Routing| OS["Order Service"]
Gateway -->|Routing| PS["Product Service"]
Gateway -->|Routing| AD["Admin Service"]
Why Use an API Gateway?
The benefits of implementing an API gateway include:
- Unified Entry Point - Single URL for all microservices
- Cross-Cutting Concerns - Centralized handling of authentication, rate limiting, and logging
- Protocol Translation - Convert between protocols (REST to gRPC)
- Client Simplification - Clients interact with one endpoint
- Analytics - Centralized request/response monitoring
- Security - Protection against attacks and abuse
Core Gateway Patterns
Request Routing
API gateways route requests based on URL paths, methods, headers, or other attributes. This is the most fundamental pattern — every gateway implements it.
Route requests to different backend services based on URL path prefixes:
services:
  - name: user-service
    url: http://user-service:8080
    routes:
      - name: user-route
        paths:
          - /api/v1/users
        methods:
          - GET
          - POST
        strip_path: true
  - name: order-service
    url: http://order-service:8080
    routes:
      - name: order-route
        paths:
          - /api/v1/orders
        methods:
          - GET
          - POST
          - PUT
        strip_path: true
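The matching logic behind a config like this can be sketched in a few lines of Python. This is a simplified illustration, not any gateway's actual router: it assumes longest-prefix-wins matching, an exact method check, and that strip_path removes the matched prefix before forwarding. The Route class and ROUTES table are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Route:
    prefix: str          # e.g. "/api/v1/users"
    upstream: str        # e.g. "http://user-service:8080"
    methods: frozenset   # allowed HTTP methods
    strip_path: bool = True


# Hypothetical route table mirroring the config above.
ROUTES = [
    Route("/api/v1/users", "http://user-service:8080", frozenset({"GET", "POST"})),
    Route("/api/v1/orders", "http://order-service:8080", frozenset({"GET", "POST", "PUT"})),
]


def resolve(routes: List[Route], method: str, path: str) -> Optional[str]:
    """Return the upstream URL for a request, or None if no route matches."""
    candidates = []
    for route in routes:
        base = route.prefix.rstrip("/")
        # Match whole path segments only: /api/v1/users must not match /api/v1/usersX.
        if method in route.methods and (path == base or path.startswith(base + "/")):
            candidates.append(route)
    if not candidates:
        return None
    # Longest matching prefix wins, so a specific route beats a generic one.
    route = max(candidates, key=lambda r: len(r.prefix.rstrip("/")))
    base = route.prefix.rstrip("/")
    forwarded = path[len(base):] if route.strip_path else path
    return route.upstream + (forwarded or "/")
```

With strip_path enabled, GET /api/v1/users/42 is forwarded to the user service as /42, which is why backends behind a stripping route should serve paths relative to their own root.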
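Route traffic based on request headers such as a tenant ID or API version. As in the path example, the route attaches to a service, which supplies the upstream URL:
services:
  - name: tenant-service-1
    url: http://tenant-service-1:8080
    routes:
      - name: tenant-routing
        paths:
          - /api
        headers:
          X-Tenant-Id:
            - tenant-1
            - tenant-2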
Traffic Splitting (Canary Deployments)
Traffic splitting routes a percentage of requests to different upstream versions, enabling canary deployments and blue-green switches. The Kubernetes Gateway API makes this a first-class feature — no annotations required.
Split traffic between the stable (v1) and canary (v2) versions using weighted backends:
rules:
  - backendRefs:
      - name: payments-v1
        port: 8080
        weight: 90
      - name: payments-v2
        port: 8080
        weight: 10
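A gateway has to turn weights such as 90/10 into per-request decisions. One deterministic technique is smooth weighted round-robin, the approach NGINX uses for weighted upstream servers. The Python sketch below is an illustrative implementation with hypothetical backend names, not the algorithm of any particular Gateway API implementation. Unlike random selection, every block of 100 consecutive requests contains exactly 90 stable and 10 canary picks, and the canary picks are spread out rather than bunched together.

```python
from typing import Dict


class SmoothWeightedRoundRobin:
    """Deterministic weighted selection (illustrative sketch)."""

    def __init__(self, weights: Dict[str, int]):
        self.weights = dict(weights)
        self.current = {name: 0 for name in weights}
        self.total = sum(weights.values())

    def pick(self) -> str:
        # Every backend earns its weight; the current leader is chosen
        # and pays back the total, which spreads picks out evenly.
        for name, weight in self.weights.items():
            self.current[name] += weight
        best = max(self.current, key=self.current.get)
        self.current[best] -= self.total
        return best


# Hypothetical backend names for a 90/10 split.
splitter = SmoothWeightedRoundRobin({"stable": 90, "canary": 10})
picks = [splitter.pick() for _ in range(100)]
```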
API Composition (Aggregation)
The API composition pattern combines responses from multiple microservices into a single response. Instead of requiring a client to make five separate API calls to render a dashboard, the gateway fetches data from all services in parallel and returns a unified response.
sequenceDiagram
participant C as Client
participant G as API Gateway
participant US as User Service
participant OS as Order Service
participant PS as Product Service
C->>G: GET /dashboard
par Fetch in parallel
G->>US: GET /users/profile
G->>OS: GET /orders/recent
G->>PS: GET /products/featured
end
US-->>G: {user data}
OS-->>G: {order data}
PS-->>G: {product data}
G-->>C: {aggregated dashboard}
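The parallel fan-out in the diagram maps naturally onto asyncio.gather. Below is a minimal Python sketch: the three fetch_* coroutines are hypothetical stand-ins for HTTP calls to the user, order, and product services, and the per-call timeout and partial-failure handling are assumptions about how a dashboard might degrade when one backend is down.

```python
import asyncio


# Hypothetical stand-ins for HTTP calls to each backend service.
async def fetch_user_profile() -> dict:
    await asyncio.sleep(0.01)
    return {"id": 7, "name": "Ada"}


async def fetch_recent_orders() -> list:
    await asyncio.sleep(0.01)
    return [{"order_id": 101, "total": 42.5}]


async def fetch_featured_products() -> list:
    raise ConnectionError("product service unavailable")


async def get_dashboard(timeout: float = 1.0) -> dict:
    """Fan out to all services concurrently and merge the results."""
    calls = {
        "user": fetch_user_profile(),
        "orders": fetch_recent_orders(),
        "products": fetch_featured_products(),
    }
    results = await asyncio.gather(
        *(asyncio.wait_for(call, timeout) for call in calls.values()),
        return_exceptions=True,  # one failing service should not sink the whole response
    )
    dashboard, failed = {}, []
    for key, result in zip(calls, results):
        if isinstance(result, Exception):
            dashboard[key] = None  # degrade this section instead of failing the request
            failed.append(key)
        else:
            dashboard[key] = result
    dashboard["partial"] = bool(failed)
    return dashboard
```

Running asyncio.run(get_dashboard()) returns user and order data, products set to None, and partial set to True, so the client can still render most of the page. Total latency is roughly that of the slowest call rather than the sum of all three.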
Backend for Frontend (BFF)
The BFF pattern creates gateway configurations tailored to specific client types. A mobile BFF provides compact responses optimized for bandwidth constraints. A web BFF returns richer data structures suited to desktop layouts. Netflix, Spotify, and SoundCloud have publicly documented their BFF implementations.
services:
  - name: mobile-bff
    url: http://mobile-bff:8080
    routes:
      - name: mobile-api
        paths:
          - /mobile/v1
    plugins:
      - name: response-transformer
        config:
          remove:
            headers:
              - X-Internal-Fields
  - name: web-bff
    url: http://web-bff:8080
    routes:
      - name: web-api
        paths:
          - /web/v1
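The difference between the two BFFs is mostly about response shape. The Python sketch below shows one way a mobile BFF might project a full upstream payload down to a compact view while the web BFF passes richer data through. The payload, MOBILE_FIELDS, and INTERNAL_FIELDS are hypothetical.

```python
# Illustrative sketch: payload shape and field lists are hypothetical.
FULL_PRODUCT = {
    "id": "p-1",
    "name": "Trail Shoe",
    "price": 89.0,
    "description": "Lightweight trail running shoe with a grippy outsole.",
    "images": ["thumb.jpg", "large-1.jpg", "large-2.jpg"],
    "reviews": [{"user": "ana", "stars": 5}],
    "internal_sku": "TS-001",
}

MOBILE_FIELDS = ("id", "name", "price")  # what the mobile client renders (assumed)
INTERNAL_FIELDS = ("internal_sku",)      # never exposed to any client (assumed)


def mobile_view(product: dict) -> dict:
    """Compact projection: an allow-list of fields plus a single thumbnail."""
    view = {k: product[k] for k in MOBILE_FIELDS if k in product}
    images = product.get("images") or [None]
    view["thumbnail"] = images[0]
    return view


def web_view(product: dict) -> dict:
    """Rich projection for desktop layouts: everything except internal fields."""
    return {k: v for k, v in product.items() if k not in INTERNAL_FIELDS}
```

An allow-list for mobile, rather than a deny-list, means new upstream fields stay off the bandwidth-constrained client until someone opts them in.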
Authentication and Authorization
JWT Authentication
Verify JSON Web Tokens at the gateway level so individual services don’t need to re-validate credentials:
plugins:
  - name: jwt
    config:
      header_names:
        - Authorization
      claims_to_verify:
        - exp
      maximum_expiration: 3600
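What the gateway does with each token can be sketched with the Python standard library alone. This is a minimal HS256-only illustration that assumes a shared secret. It checks the signature, then exp, then enforces a maximum lifetime, interpreting maximum_expiration as "exp may be at most 3600 seconds in the future". The function names are hypothetical; a production gateway should rely on a vetted JWT library.

```python
import base64
import hashlib
import hmac
import json
import time


class InvalidToken(Exception):
    """Raised for any token the gateway should reject with 401."""


def _b64url_encode(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def _b64url_decode(segment: str) -> bytes:
    # JWT segments are unpadded base64url; restore the padding first.
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def sign_hs256(claims: dict, secret: bytes) -> str:
    """Mint a test token (normally the identity provider's job)."""
    header = _b64url_encode(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = _b64url_encode(json.dumps(claims).encode())
    sig = hmac.new(secret, f"{header}.{payload}".encode(), hashlib.sha256).digest()
    return f"{header}.{payload}.{_b64url_encode(sig)}"


def verify_hs256(token: str, secret: bytes, max_lifetime: int = 3600, now=None) -> dict:
    """Return the token's claims, or raise InvalidToken."""
    now = time.time() if now is None else now
    try:
        header_b64, payload_b64, sig_b64 = token.split(".")
        header = json.loads(_b64url_decode(header_b64))
        claims = json.loads(_b64url_decode(payload_b64))
        signature = _b64url_decode(sig_b64)
    except ValueError as exc:
        raise InvalidToken("malformed token") from exc
    if not isinstance(header, dict) or not isinstance(claims, dict):
        raise InvalidToken("malformed token")
    # Pin the algorithm instead of trusting the token's own "alg" header.
    if header.get("alg") != "HS256":
        raise InvalidToken("unexpected algorithm")
    signing_input = f"{header_b64}.{payload_b64}".encode()
    expected = hmac.new(secret, signing_input, hashlib.sha256).digest()
    if not hmac.compare_digest(signature, expected):  # constant-time comparison
        raise InvalidToken("bad signature")
    exp = claims.get("exp")
    if not isinstance(exp, (int, float)) or exp <= now:
        raise InvalidToken("missing or expired exp claim")
    # Mirrors maximum_expiration: reject tokens valid too far into the future.
    if exp - now > max_lifetime:
        raise InvalidToken("exp exceeds maximum lifetime")
    return claims
```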
OAuth2 Integration
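Enforce OAuth 2.0 scopes at the gateway before forwarding requests. With a Kong-style oauth2 plugin, the gateway itself acts as the authorization server and issues the tokens, so at least one grant type must be enabled; delegating to an external identity provider typically calls for an OpenID Connect or token-introspection plugin instead:
plugins:
  - name: oauth2
    config:
      scopes:
        - read
        - write
      mandatory_scope: true
      enable_client_credentials: true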
Role-Based Access Control
Enforce access rules by consumer group after authentication succeeds:
plugins:
  - name: acl
    config:
      allow:
        - group-admin
        - group-user
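Conceptually, the ACL check runs only after authentication has attached an identity to the request. The minimal Python sketch below assumes allow-list semantics: a request passes if the consumer belongs to at least one allowed group. The Consumer shape and authorize function are hypothetical, and returning 401 versus 403 is the conventional split between "who are you?" and "you may not".

```python
from dataclasses import dataclass, field


@dataclass
class Consumer:
    """Identity attached by the authentication step (hypothetical shape)."""
    username: str
    groups: set = field(default_factory=set)


# Mirrors the allow list in the config above.
ALLOWED_GROUPS = {"group-admin", "group-user"}


def authorize(consumer, allowed=ALLOWED_GROUPS) -> int:
    """Return the status the gateway would answer with; 200 means forward."""
    if consumer is None:
        return 401  # no authenticated identity, so there is nothing to check
    if consumer.groups & set(allowed):
        return 200  # member of at least one allowed group
    return 403  # authenticated but not authorized
```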
Traffic Management
Rate Limiting
Rate limiting protects backend services from abuse and ensures fair usage across consumers. Gateways typically support per-second, per-minute, or per-hour limits backed by Redis for distributed counting.
Configure rate limits with Redis-backed distributed counting:
plugins:
  - name: rate-limiting
    config:
      minute: 100
      hour: 1000
      policy: redis
      redis_host: redis-host
      redis_port: 6379
      fault_tolerant: true
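NGINX rate limiting uses a shared memory zone (limit_req_zone) and the limit_req directive, which applies a leaky-bucket method; burst and nodelay control how excess requests are queued or served immediately: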
http {
    limit_req_zone $binary_remote_addr zone=api_limit:10m rate=10r/s;

    server {
        location /api/ {
            limit_req zone=api_limit burst=20 nodelay;
            proxy_pass http://backend;
        }
    }
}
The token bucket algorithm controls burst traffic by accumulating unused tokens up to a capacity:
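import time


class TokenBucket:
    """Refills at refill_rate tokens per second and holds at most capacity tokens."""

    def __init__(self, capacity, refill_rate):
        self.capacity = capacity
        self.tokens = capacity  # start full so an initial burst is allowed
        self.refill_rate = refill_rate
        self.last_refill = time.monotonic()  # monotonic clock: immune to wall-clock jumps

    def consume(self, tokens=1):
        """Take tokens if available; return False (reject the request) otherwise."""
        self._refill()
        if self.tokens >= tokens:
            self.tokens -= tokens
            return True
        return False

    def _refill(self):
        # Add tokens for the time elapsed since the last refill, capped at capacity.
        now = time.monotonic()
        elapsed = now - self.last_refill
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_refill = now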
Circuit Breaking
Circuit breaking prevents cascading failures by stopping requests to an unhealthy upstream. When error rates exceed a threshold, the circuit opens and the gateway returns a fast failure. After a cooldown, it allows test requests through in a half-open state.
stateDiagram-v2
[*] --> Closed
Closed --> Open: failures reach threshold
Open --> HalfOpen: timeout elapsed
HalfOpen --> Closed: probe succeeds
HalfOpen --> Open: probe fails
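import time


class CircuitOpenError(Exception):
    """Raised when the breaker rejects a call without contacting the upstream."""


class CircuitBreaker:
    """Minimal, single-threaded circuit breaker."""

    def __init__(self, threshold=5, timeout=30):
        self.state = "CLOSED"
        self.failure_count = 0
        self.threshold = threshold  # consecutive failures before opening
        self.timeout = timeout      # seconds to stay open before probing
        self.last_failure = 0.0

    def call(self, operation):
        if self.state == "OPEN":
            if time.monotonic() - self.last_failure >= self.timeout:
                self.state = "HALF_OPEN"  # let this call through as a probe
            else:
                raise CircuitOpenError("circuit open")
        try:
            result = operation()
        except Exception:
            self.failure_count += 1
            self.last_failure = time.monotonic()
            # A failed probe, or too many consecutive failures, (re)opens the circuit.
            if self.state == "HALF_OPEN" or self.failure_count >= self.threshold:
                self.state = "OPEN"
            raise
        # Success closes the circuit and resets the failure streak.
        self.failure_count = 0
        self.state = "CLOSED"
        return result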
Canary Deployment
Canary deployments route a small percentage of traffic to a new service version while the majority hits the stable version. The gateway controls the traffic split, enabling teams to validate releases with real traffic before committing to a full rollout.
Route 5% of traffic to a canary instance for gradual rollout validation:
upstreams:
  - name: user-service
    targets:
      - target: user-service-stable:8080
        weight: 95
      - target: user-service-canary:8080
        weight: 5
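Weighted targets split traffic per request, so a single user may bounce between versions. If you want each user to stay on one version during the rollout, one option is to hash a stable identifier into 100 buckets. This is a different technique from per-request weighting. The Python sketch below assumes the user ID is the stickiness key and uses SHA-256 for an even spread; both choices, and the function names, are illustrative.

```python
import hashlib


def canary_bucket(user_id: str) -> int:
    """Map a user to a stable bucket in [0, 100)."""
    digest = hashlib.sha256(user_id.encode()).digest()
    return int.from_bytes(digest[:8], "big") % 100


def choose_target(user_id: str, canary_percent: int = 5) -> str:
    # Users in the lowest canary_percent buckets always see the canary.
    if canary_bucket(user_id) < canary_percent:
        return "user-service-canary:8080"
    return "user-service-stable:8080"
```

Raising canary_percent from 5 to 20 keeps every existing canary user on the canary and adds new ones, because the bucket assignment never changes.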
Service Discovery
In dynamic microservices environments, service instances scale up and down frequently. Service discovery lets the gateway automatically detect healthy instances without manual configuration. Apache APISIX integrates with Consul, Eureka, Nacos, and Kubernetes-native DNS. Envoy uses the xDS API for dynamic endpoint discovery.
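At its core, discovery is a registry of instances that must keep proving they are alive. The Python sketch below assumes a heartbeat/TTL model, similar in spirit to TTL health checks or lease renewals in registries like Consul and Eureka but not either product's API. Instances that miss their heartbeat window drop out of the pool, and the gateway round-robins across whatever remains. The ServiceRegistry class is hypothetical.

```python
import itertools
import time
from typing import Dict, List


class ServiceRegistry:
    """In-memory registry with TTL-based liveness (illustrative only)."""

    def __init__(self, ttl: float = 10.0, clock=time.monotonic):
        self.ttl = ttl
        self.clock = clock
        self._last_seen: Dict[str, Dict[str, float]] = {}  # service -> {address: last heartbeat}
        self._counters: Dict[str, itertools.count] = {}

    def heartbeat(self, service: str, address: str) -> None:
        """Register an instance or renew its lease."""
        self._last_seen.setdefault(service, {})[address] = self.clock()

    def healthy(self, service: str) -> List[str]:
        """Instances whose last heartbeat is within the TTL window."""
        now = self.clock()
        return sorted(
            addr for addr, seen in self._last_seen.get(service, {}).items()
            if now - seen <= self.ttl
        )

    def pick(self, service: str) -> str:
        """Round-robin over the currently healthy instances."""
        instances = self.healthy(service)
        if not instances:
            raise LookupError(f"no healthy instances for {service}")
        n = next(self._counters.setdefault(service, itertools.count()))
        return instances[n % len(instances)]
```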
Caching Strategies
Response Caching
Cache GET responses at the gateway to reduce backend load and improve latency. Configure TTL and backend storage (local memory or Redis for distributed deployments).
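Cache successful GET and HEAD responses for 5 minutes (cache_ttl is in seconds). With Kong's proxy-cache plugin, the open-source memory strategy keeps entries local to each node; sharing one cache across nodes, for example in Redis, requires a different strategy or plugin:
plugins:
  - name: proxy-cache
    config:
      strategy: memory
      request_method:
        - GET
        - HEAD
      response_code:
        - 200
      cache_ttl: 300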
NGINX uses proxy_cache directives for local disk-based caching:
proxy_cache_path /var/cache/nginx levels=1:2
                 keys_zone=api_cache:10m
                 max_size=100m
                 inactive=60m;

server {
    location /api/ {
        proxy_cache api_cache;
        proxy_cache_valid 200 5m;
        proxy_cache_use_stale error timeout;
        proxy_pass http://backend;
    }
}
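Whichever gateway you use, the caching decision comes down to a cache key, a cacheability check, and an expiry time. The Python sketch below applies a policy like the one configured above (GET and HEAD only, 200 responses only, 300-second TTL). Keying on method plus URL and storing entries in a local dictionary are simplifying assumptions; a distributed deployment would replace the dictionary with a shared store. The ResponseCache class is hypothetical.

```python
import time
from typing import Callable, Dict, Tuple

Response = Tuple[int, bytes]  # (status code, body)


class ResponseCache:
    """Local TTL cache keyed on method + URL (illustrative only)."""

    def __init__(self, ttl: float = 300.0, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl
        self.clock = clock
        self._store: Dict[str, Tuple[float, Response]] = {}  # key -> (expires_at, response)

    def fetch(self, method: str, url: str, upstream: Callable[[], Response]) -> Tuple[Response, str]:
        """Serve from cache when possible; returns (response, 'HIT' | 'MISS' | 'BYPASS')."""
        if method not in ("GET", "HEAD"):
            return upstream(), "BYPASS"  # only safe methods are cacheable here
        key = f"{method} {url}"
        now = self.clock()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            return entry[1], "HIT"
        response = upstream()
        if response[0] == 200:  # cache only successful responses
            self._store[key] = (now + self.ttl, response)
        return response, "MISS"
```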
Protocol Translation
Gateways can translate between protocols, allowing clients using HTTP to communicate with gRPC backends and vice versa.
REST to gRPC Transcoding
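Translate HTTP/JSON requests to gRPC calls by annotating the proto file with google.api.http options. A transcoding proxy (for example, Envoy's gRPC-JSON transcoder filter or grpc-gateway) reads these mappings:
syntax = "proto3";

import "google/api/annotations.proto";

// The google.api.http option maps an HTTP route onto this gRPC method;
// {id} in the path is bound to GetUserRequest.id.
service UserService {
  rpc GetUser (GetUserRequest) returns (User) {
    option (google.api.http) = {
      get: "/v1/users/{id}"
    };
  }
}

message GetUserRequest { string id = 1; }
message User { string id = 1; string name = 2; }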
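Under the hood, a transcoder matches each incoming request against the methods' HTTP rules and binds template variables into the request message. The Python sketch below handles only the simple case of {field} path variables plus query-string fields. It is a simplified illustration, not the full google.api.http path-template grammar or any proxy's implementation, and the RULES table is hypothetical.

```python
import re
from urllib.parse import parse_qsl, urlsplit

# Hypothetical rule table derived from the proto annotation: (verb, template, rpc).
RULES = [("GET", "/v1/users/{id}", "UserService.GetUser")]


def _template_to_regex(template: str) -> "re.Pattern":
    # Each {field} matches exactly one path segment.
    pattern = re.sub(r"\{(\w+)\}", r"(?P<\1>[^/]+)", template)
    return re.compile(f"^{pattern}$")


def transcode(method: str, url: str):
    """Return (rpc_name, request_message_dict), or None if no rule matches."""
    parts = urlsplit(url)
    for verb, template, rpc in RULES:
        if verb != method:
            continue
        match = _template_to_regex(template).match(parts.path)
        if match:
            message = dict(parse_qsl(parts.query))  # unbound fields come from the query string
            message.update(match.groupdict())       # path bindings take precedence
            return rpc, message
    return None
```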
WebSocket Support
Gateways also proxy persistent connections like WebSocket for real-time features:
server {
    location /ws/ {
        proxy_pass http://websocket-backend;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_read_timeout 86400;
    }
}
Service Mesh Integration
Gateway vs Service Mesh
API gateways and service meshes serve different traffic directions. Gateways handle north-south traffic (external clients into the cluster), while service meshes handle east-west traffic (service-to-service within the cluster).
| Aspect | API Gateway | Service Mesh |
|---|---|---|
| Traffic direction | North-south (external → internal) | East-west (internal → internal) |
| Deployment | Edge proxy | Sidecar proxies |
| Primary focus | API management, external auth | Internal networking, mTLS |
| Authentication | API keys, JWT, OAuth | mTLS (identity-based) |
| Rate limiting | Per-consumer, per-route | Per-service (less granular) |
| Protocol support | HTTP, gRPC, WebSocket, GraphQL | TCP, HTTP, gRPC |
| Examples | Kong, NGINX, Envoy, APISIX | Istio, Linkerd, Consul Connect |
Combined Architecture
Deploy both together: the gateway at the edge for external concerns, the mesh internally for service connectivity.
Route external traffic through the gateway to internal services with Istio’s VirtualService:
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: my-service
spec:
  hosts:
    - my-service.example.com
  gateways:
    - my-gateway
  http:
    - match:
        - uri:
            prefix: /api/v1
      route:
        - destination:
            host: my-service
            port:
              number: 8080
Standards & Ecosystem (2026)
Kubernetes Gateway API
The Kubernetes Gateway API is the official successor to Ingress; its core resources (GatewayClass, Gateway, HTTPRoute) reached GA with v1.0 in October 2023. The community Ingress-NGINX controller was retired in March 2026, making migration a priority for teams that depend on it. The ingress2gateway v1.0 migration tool helps translate existing Ingress configurations to Gateway API resources.
Gateway API introduces a role-oriented model with three resource types:
- GatewayClass — Cluster-scoped, owned by infrastructure teams. Defines the implementation (Cilium, Istio, Envoy Gateway).
- Gateway — Namespace-scoped, owned by platform teams. Instantiates a load balancer with listeners and TLS.
- HTTPRoute / GRPCRoute / TCPRoute — Namespace-scoped, owned by application teams. Define how traffic reaches services.
Define a Gateway and attach an HTTPRoute to route traffic to the backend service:
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: production-gateway
  namespace: networking
spec:
  gatewayClassName: envoy
  listeners:
    - name: https
      protocol: HTTPS
      port: 443
      tls:
        mode: Terminate
        certificateRefs:
          - name: wildcard-cert
      allowedRoutes:
        namespaces:
          from: All  # the default is Same; routes in other namespaces need this
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: api-route
spec:
  parentRefs:
    - name: production-gateway
      namespace: networking
  hostnames:
    - "api.example.com"
  rules:
    - backendRefs:
        - name: api-service
          port: 8080
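Gateway API also defines Policy Attachment, a standardized pattern for extending behavior by attaching policy resources to Gateways, Routes, or Services instead of relying on the annotation sprawl of Ingress-NGINX. BackendTLSPolicy is one standardized example (it configures TLS from the gateway to backends). Rate limiting and circuit breaking are typically exposed through implementation-specific policies, such as Envoy Gateway's BackendTrafficPolicy, while request timeouts can be set directly on HTTPRoute rules.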
MCP and AI Gateways
The Model Context Protocol (MCP), introduced by Anthropic in late 2024 and now governed by the Agentic AI Foundation under the Linux Foundation (140+ member organizations), has become the standard for AI agent-to-tool communication. MCP enables AI agents to dynamically discover and invoke tools through a standardized JSON-RPC protocol.
This has given rise to AI gateways — specialized API gateways that manage LLM traffic with token-based rate limiting, model routing, cost controls, and content safety policies. Kong added MCP proxy plugins in Gateway 3.12. Apache APISIX provides an ai-proxy plugin for multi-provider LLM routing. Red Hat introduced an MCP gateway as a technology preview in Connectivity Link.
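Token-based rate limiting differs from request counting: the true cost of an LLM call is only known once the model reports how many tokens it consumed. The minimal Python sketch below implements a per-consumer budget over fixed windows. The window length, the budget, and the reserve-then-settle flow (estimate before forwarding, correct with actual usage afterwards) are assumptions, not the behavior of any specific AI gateway plugin, and the TokenBudget class is hypothetical.

```python
import time
from collections import defaultdict


class TokenBudget:
    """Per-consumer LLM token budget over fixed windows (illustrative)."""

    def __init__(self, tokens_per_window: int, window_seconds: float = 60.0, clock=time.monotonic):
        self.limit = tokens_per_window
        self.window = window_seconds
        self.clock = clock
        self._used = defaultdict(int)  # (consumer, window index) -> tokens charged

    def _key(self, consumer: str):
        return consumer, int(self.clock() // self.window)

    def reserve(self, consumer: str, estimated_tokens: int) -> bool:
        """Admit a request only if its estimated cost fits in the remaining budget."""
        key = self._key(consumer)
        if self._used[key] + estimated_tokens > self.limit:
            return False  # the gateway would answer 429
        self._used[key] += estimated_tokens
        return True

    def settle(self, consumer: str, estimated_tokens: int, actual_tokens: int) -> None:
        """Swap the estimate for the usage reported in the model response.

        Simplification: assumes settle() runs in the same window as reserve().
        """
        self._used[self._key(consumer)] += actual_tokens - estimated_tokens
```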
AI gateways and traditional API gateways are converging. The same infrastructure that routes REST and gRPC traffic now handles MCP and LLM API calls, applying consistent authentication, rate limiting, and observability policies across all traffic types.
Open-Source Gateway Landscape (2026)
Comparison of Major Projects
The open-source API gateway market in 2026 is dominated by five major projects, each with distinct architectural decisions and sweet spots.
| Feature | Kong | Apache APISIX | Envoy | Tyk | Traefik |
|---|---|---|---|---|---|
| Language | Lua/Go (OpenResty) | Lua/Go (OpenResty) | C++ | Go | Go |
| License | Apache 2.0 | Apache 2.0 | Apache 2.0 | MPL 2.0 | MIT |
| Plugin count | 60+ (OSS), 200+ (Enterprise) | 100+ built-in | ~30 HTTP filters | ~30 middlewares | ~30 middlewares |
| Plugin languages | Lua, Go, Python, JS | Lua, Go, Java, Python, Wasm | C++, Wasm | Go, Python, JS | Go (Yaegi) |
| Configuration store | PostgreSQL or DB-less (declarative) | etcd | xDS API (control plane) | Dashboard + API | File / KV stores |
| Dynamic config | Partial (DB polling) | Sub-millisecond (etcd push) | Yes (xDS push) | Yes | Yes (provider watch) |
| Dashboard | Kong Manager (Enterprise) | Built-in | None (third-party) | Built-in | Built-in |
| Kubernetes support | Ingress Controller | Ingress Controller + Gateway API | Gateway API (Envoy Gateway) | Operator | CRDs + auto-discovery |
| gRPC | Native | Native | Native | Native | Native |
| HTTP/3 (QUIC) | Experimental | Supported | Supported | Not supported | Supported |
| Service mesh | Kong Mesh (Envoy-based) | No | Istio / native | No | No |
When to Choose Each
- Kong - Mature ecosystem, largest plugin library, proven at scale. Best for enterprise environments needing commercial support and a wide third-party integration ecosystem. Lua plugin model can be a barrier.
- Apache APISIX - High performance, dynamic configuration without restarts, multi-language plugin support (Lua, Go, Java, Python, Wasm), no database dependency on the data path (etcd-based). Strong choice for platform teams building API infrastructure.
- Envoy - Highest raw performance (C++), CNCF graduated project. Excels as a service mesh data plane. Not a batteries-included API management platform; requires Envoy Gateway or Istio for a full gateway solution.
- Tyk - Full API management stack in open source (gateway + dashboard + developer portal + analytics). Go-based for easy extensibility. Multi-tenancy in the OSS edition. Strong GraphQL support.
- Traefik - Automatic service discovery, built-in Let's Encrypt, container-native. Excellent for Docker-native and small-to-medium Kubernetes environments. Limited advanced API management.
Cloud-Native Implementations
Kubernetes Ingress and Gateway API
Kubernetes Ingress is the most basic gateway abstraction. With Ingress-NGINX now retired, teams should target Gateway API implementations. Below is the classic Ingress format for reference:
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: api-gateway
  annotations:
    nginx.ingress.kubernetes.io/limit-rpm: "100"
spec:
  ingressClassName: nginx
  rules:
    - host: api.example.com
      http:
        paths:
          - path: /users
            pathType: Prefix
            backend:
              service:
                name: user-service
                port:
                  number: 80
Kong on Kubernetes
Kong’s Kubernetes integration uses CRDs to configure plugins declaratively alongside your services:
apiVersion: configuration.konghq.com/v1
kind: KongPlugin
metadata:
  name: rate-limit-plugin
config:
  minute: 100
  policy: local
plugin: rate-limiting
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: user-ingress
  annotations:
    konghq.com/plugins: rate-limit-plugin
spec:
  ingressClassName: kong
  rules:
    - http:
        paths:
          - path: /users
            pathType: Prefix
            backend:
              service:
                name: user-service
                port:
                  number: 8080
Observability
Request Logging
Log structured request data at the gateway for centralized debugging and audit trails. The fields below use NGINX variable names; $request_time and $upstream_response_time are reported in seconds with millisecond resolution:
plugins:
  - name: logging
    config:
      log_level: info
      log_format:
        request:
          uri: $request_uri
          method: $request_method
        response:
          status: $status
        latency:
          total: $request_time
          upstream: $upstream_response_time
Distributed Tracing
Inject OpenTelemetry trace context at the gateway to track requests across all downstream services:
plugins:
  - name: opentelemetry
    config:
      endpoint: http://otel-collector:4318/v1/traces
      service_name: api-gateway
Best Practices
Security
- Use TLS everywhere - Encrypt all traffic between clients and the gateway
- Implement mTLS - Encrypt and authenticate service-to-service communication
- Validate tokens at the gateway - Reject invalid JWTs before they reach services
- Rate limit aggressively - Prevent abuse at multiple tiers (per-IP, per-consumer, per-route)
- Log all requests - Maintain an audit trail for debugging and compliance
Performance
- Enable caching - Reduce backend load for idempotent GET requests
- Use connection pooling - Reuse upstream connections to reduce latency
- Compress responses - Enable gzip/brotli compression at the gateway
- Set sensible timeouts - Prevent hung connections from consuming resources
Reliability
- Implement circuit breakers - Stop cascading failures when upstreams degrade
- Use retries with backoff - Handle transient errors without overwhelming backends
- Deploy multiple gateway replicas - Eliminate single points of failure
- Monitor active connections - Scale gateways before connection pools exhaust
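Retries with backoff are easy to get subtly wrong, so here is a minimal Python sketch of capped exponential backoff with full jitter. The parameter values are assumptions. The TransientError type is hypothetical and stands in for whatever your HTTP client raises on timeouts or 502/503/504 responses. Wrap only idempotent calls this way, since a retried POST may be applied twice.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical marker for retryable failures (timeouts, 502/503/504)."""


def call_with_retries(operation, max_attempts=4, base_delay=0.1, max_delay=2.0,
                      sleep=time.sleep, rng=random.random):
    """Run operation(), retrying transient failures with capped, jittered backoff."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # Full jitter: wait a random fraction of the exponential cap so that
            # many clients retrying at once do not hit the backend in lockstep.
            cap = min(max_delay, base_delay * 2 ** attempt)
            sleep(rng() * cap)
```

Injecting sleep and rng keeps the function testable; in production the defaults apply.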
Conclusion
API gateways remain essential for modern microservices architectures. In 2026, the landscape is shaped by the rise of Kubernetes Gateway API (replacing Ingress), the convergence of AI and traditional API traffic through MCP, and the maturation of open-source projects with distinct strengths.
The right API gateway depends on your requirements: Kong for enterprise ecosystems, APISIX for performance and dynamic config, Envoy for service mesh foundations, Tyk for a full OSS management stack, and Traefik for container-native simplicity. Evaluate based on your scale, team expertise, and long-term architectural vision.
Resources
- Kubernetes Gateway API
- Ingress2Gateway Migration Tool
- Kong Gateway Documentation
- Apache APISIX Documentation
- Envoy Proxy
- Tyk Gateway
- Traefik Proxy
- Model Context Protocol (MCP)
- Red Hat MCP Gateway