Introduction
API gateways sit between clients and backend services, handling cross-cutting concerns so individual services stay focused on business logic. Choosing the right gateway architecture and tooling directly shapes your system’s scalability, security posture, and team autonomy. This article covers five deployment patterns, six gateway tools with working configuration, authentication and rate-limiting internals, and a decision framework for picking the right approach.
Gateway Architecture Patterns
Single Gateway (Monolithic Gateway)
One gateway instance handles all traffic for every client and service. This pattern works well for small teams and early-stage products where operational simplicity outweighs isolation.
Pros: Single configuration surface, unified authentication policy, one place to manage SSL and routing.
Cons: Team coordination bottlenecks, any service change risks breaking others, gateway becomes a single point of failure and a deployment scaling bottleneck.
Use this pattern during the first twelve months of a project or when the team size stays under ten engineers.
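At this scale the whole gateway fits in one declarative file. A minimal Kong-style sketch (service names, hosts, and limits are placeholders, not a recommended production config):

```yaml
# single-gateway.yaml: one gateway instance fronting every service
_format_version: "3.0"
services:
- name: user-service
  url: http://user-svc:8080
  routes:
  - name: user-routes
    paths: ["/api/v1/users"]
- name: order-service
  url: http://order-svc:8080
  routes:
  - name: order-routes
    paths: ["/api/v1/orders"]
plugins:
- name: key-auth        # one global auth policy for all routes
- name: rate-limiting
  config:
    minute: 120
    policy: local
```

Because plugins are declared once at the top level, every team shares the same policy surface, which is exactly the simplicity (and later the bottleneck) this pattern trades on.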
Gateway per Team
Each product team operates its own gateway instance. The gateway exposes only the routes and policies that the team controls.
# team-billing-gateway.yaml - Kong declarative config for a per-team gateway
_format_version: "3.0"
services:
- name: billing-service
  host: billing.internal
  port: 8080
  protocol: http
  routes:
  - name: billing-routes
    paths:
    - /api/v1/billing
    methods: [GET, POST, PUT]
  plugins:
  - name: key-auth
  - name: rate-limiting
    config:
      minute: 100
      policy: local
- name: invoice-service
  host: invoice.internal
  port: 8080
  protocol: http
  routes:
  - name: invoice-routes
    paths:
    - /api/v1/invoices
    methods: [GET]
  plugins:
  - name: key-auth
Pros: Teams deploy independently, route changes stay scoped, blast radius shrinks to one team.
Cons: More gateways to operate, duplicated cross-cutting config (observability, CORS), inconsistent authentication across teams without shared libraries.
Apply this pattern when your organization has three or more independent product teams with separate deployment cadences.
Gateway per Domain
You deploy one gateway per bounded context or domain — payments, inventory, user management each get their own gateway aligned with the domain boundary.
This mirrors domain-driven design: the gateway enforces the domain boundary and encapsulates internal service topology. External clients interact with domain gateways, never with individual services.
# Envoy config for domain gateway (inventory domain)
static_resources:
  listeners:
  - name: inventory-listener
    address: { socket_address: { address: 0.0.0.0, port_value: 8081 } }
    filter_chains:
    - filters:
      - name: envoy.filters.network.http_connection_manager
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
          stat_prefix: inventory_gw
          route_config:
            name: inventory_routes
            virtual_hosts:
            - name: inventory
              domains: ["inventory.api.example.com"]
              routes:
              - match: { prefix: "/stock" }
                route: { cluster: stock_service }
              - match: { prefix: "/warehouse" }
                route: { cluster: warehouse_service }
          http_filters:
          - name: envoy.filters.http.jwt_authn
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.filters.http.jwt_authn.v3.JwtAuthentication
              providers:
                auth0:
                  issuer: https://auth.example.com/
                  audiences: ["inventory-api"]
                  remote_jwks:
                    http_uri: { uri: https://auth.example.com/.well-known/jwks.json, cluster: auth_cluster, timeout: 1s }
          - name: envoy.filters.http.router
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
  clusters:
  - name: stock_service
    connect_timeout: 0.25s
    type: STRICT_DNS
    lb_policy: ROUND_ROBIN
    load_assignment:
      cluster_name: stock_service
      endpoints:
      - lb_endpoints:
        - endpoint:
            address: { socket_address: { address: stock-svc, port_value: 8080 } }
Pros: Strong domain isolation, routing logic matches business boundaries, teams own their domain end-to-end.
Cons: Cross-domain workflows require calls between gateways, more infrastructure to maintain, potential duplication of shared logic.
BFF (Backend for Frontend)
Each client type — web, mobile, IoT, third-party — gets a dedicated gateway tailored to its specific needs. Mobile clients need payload minimization and offline tolerance; web clients need session management and server-side rendering support.
// BFF gateway handler for a mobile client — aggregates profile, orders, and notifications
// into a single mobile-optimized response
app.get("/mobile/dashboard", async (req, res) => {
  const auth = { headers: { authorization: req.headers.authorization } };
  const [profile, orders, notifications] = await Promise.all([
    fetch("http://user-service/profile", auth),
    fetch("http://order-service/recent", auth),
    fetch("http://notify-service/unread", auth),
  ]);
  const [profileData, ordersData, notificationsData] = await Promise.all([
    profile.json(), orders.json(), notifications.json(),
  ]);
  res.json({
    user: { name: profileData.name, avatar: profileData.avatar },
    recent_orders: ordersData.slice(0, 5),
    unread_count: notificationsData.count,
  });
});
Pros: Per-client optimization without leaking client concerns into general-purpose APIs, reduced mobile payload, independent scaling per client type.
Cons: BFF proliferation — each new client type adds a gateway, duplication of shared logic like authentication, maintenance burden grows linearly with client count.
Sidecar Gateway Pattern
Each service instance runs a gateway sidecar (typically Envoy or a lightweight proxy) that handles ingress traffic for that service. The sidecar enforces service-level policies — rate limits, authentication, TLS — without a centralized gateway.
This pattern naturally emerges in service mesh architectures like Istio, where each pod runs an Envoy sidecar as the data plane.
# Kubernetes Deployment: Envoy sidecar injection via pod-template annotation
apiVersion: apps/v1
kind: Deployment
metadata:
  name: payment-service
spec:
  replicas: 3
  selector:
    matchLabels:
      app: payment-service
  template:
    metadata:
      labels:
        app: payment-service
      annotations:
        sidecar.istio.io/inject: "true"   # injection is evaluated per pod, so this belongs on the pod template
    spec:
      containers:
      - name: payment-app
        image: payment-service:1.4.2
        ports:
        - containerPort: 8080
---
# Istio AuthorizationPolicy enforced at the sidecar
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: payment-authz
spec:
  selector:
    matchLabels:
      app: payment-service
  rules:
  - from:
    - source:
        principals: ["cluster.local/ns/billing/sa/billing-sa"]
    to:
    - operation:
        methods: ["POST"]
        paths: ["/api/v1/charges"]
Pros: Fine-grained per-service policy, no single point of failure, integrates with mTLS and observability across the mesh.
Cons: Operational complexity of running a sidecar per pod, resource overhead (CPU/memory for each proxy), debugging across many proxies is harder than a centralized gateway.
Gateway Tool Deep Dives
Kong (Open Source / Enterprise)
Kong runs on OpenResty (NGINX + Lua) and provides a plugin ecosystem for authentication, rate limiting, logging, and transformation. It supports both DB-backed and DB-less (declarative) modes.
Use the declarative mode for GitOps workflows: commit your kong.yaml, apply it via deck gateway sync, and Kong reloads without downtime.
# kong-declarative.yaml — full gateway config for a multi-service setup
_format_version: "3.0"
_transform: true
services:
- name: user-service
  url: http://user-svc:8080
  routes:
  - name: user-routes
    paths: ["/api/v1/users"]
    methods: [GET, POST, PATCH]
  plugins:
  - name: oauth2
    config:
      scopes: ["profile", "email"]
      provision_key: "your-provision-key"
      enable_authorization_code: true
      token_expiration: 7200
  - name: rate-limiting
    config:
      minute: 60
      policy: redis
      redis_host: redis-cluster
  - name: cors
    config:
      origins: ["https://app.example.com"]
      methods: [GET, POST, PATCH]
      headers: ["Authorization", "Content-Type"]
- name: order-service
  url: http://order-svc:8080
  routes:
  - name: order-routes
    paths: ["/api/v1/orders"]
    methods: [GET, POST]
  plugins:
  - name: jwt
    config:
      claims_to_verify: ["exp", "nbf"]
      secret_is_base64: false
  - name: acl
    config:
      allow: ["admin", "support"]
Traefik
Traefik auto-discovers services from Docker, Kubernetes, Consul, and other providers. It supports automatic HTTPS via Let’s Encrypt and provides a middleware-based pipeline for rate limiting, circuit breaking, and retries.
# Traefik dynamic config with middleware pipeline
http:
  routers:
    api-router:
      rule: "Host(`api.example.com`) && PathPrefix(`/v1`)"
      entryPoints:
        - websecure
      middlewares:
        - rate-limit@file
        - circuit-breaker@file
        - jwt-auth@file
      service: backend-service
      tls:
        certResolver: letsencrypt
  middlewares:
    rate-limit:
      rateLimit:
        average: 100
        burst: 50
        period: 1m
        sourceCriterion:
          ipStrategy:
            depth: 1
    circuit-breaker:
      circuitBreaker:
        expression: "LatencyAtQuantileMS(50.0) > 100 || ResponseCodeRatio(500, 600, 0, 600) > 0.05"
        checkPeriod: 10s
        fallbackDuration: 30s
        recoveryDuration: 60s
    jwt-auth:
      forwardAuth:
        address: "http://auth-service:8080/validate"
        trustForwardHeader: true
        authResponseHeaders:
          - X-Auth-User
          - X-Auth-Roles
Envoy Proxy
Envoy provides high-performance L3/L4/L7 proxying with advanced load balancing, circuit breaking, and observability. It uses a discovery-driven xDS API for dynamic configuration, making it the data plane of choice for service meshes like Istio.
Key strengths: a high-performance C++ core, rich load-balancing policies (round robin, least request, ring hash, Maglev), native HTTP/2 and gRPC support, and distributed tracing with OpenTelemetry.
# Envoy cluster config with advanced circuit breaking
static_resources:
  clusters:
  - name: payment_cluster
    connect_timeout: 0.25s
    type: STRICT_DNS
    lb_policy: LEAST_REQUEST
    circuit_breakers:
      thresholds:
      - priority: DEFAULT
        max_connections: 1000
        max_pending_requests: 500
        max_requests: 2000
        max_retries: 5
    outlier_detection:
      consecutive_5xx: 5
      interval: 30s
      base_ejection_time: 30s
      max_ejection_percent: 50
AWS API Gateway
A fully managed gateway that integrates with Lambda, ALB, and private VPC endpoints. It supports REST, HTTP, and WebSocket APIs. The pricing model charges per million API calls plus data transfer, making it cost-effective at low to moderate traffic levels.
# SAM template for API Gateway with usage plans and Lambda authorizer
AWSTemplateFormatVersion: "2010-09-09"
Transform: AWS::Serverless-2016-10-31
Resources:
  PaymentApi:
    Type: AWS::Serverless::Api
    Properties:
      StageName: prod
      Auth:
        DefaultAuthorizer: JwtAuthorizer
        Authorizers:
          JwtAuthorizer:
            FunctionArn: !GetAtt JwtValidationFunction.Arn
            Identity:
              Headers:
                - Authorization
      MethodSettings:
        - ResourcePath: "/*"
          HttpMethod: "*"
          ThrottlingRateLimit: 1000
          ThrottlingBurstLimit: 500
          DataTraceEnabled: true
          LoggingLevel: INFO
  ChargeFunction:
    Type: AWS::Serverless::Function
    Properties:
      CodeUri: src/
      Handler: charges.handler
      Events:
        ChargeAPI:
          Type: Api
          Properties:
            RestApiId: !Ref PaymentApi
            Path: /charges
            Method: POST
  UsagePlan:
    Type: AWS::ApiGateway::UsagePlan
    Properties:
      ApiStages:
        - ApiId: !Ref PaymentApi
          Stage: prod
      Throttle:
        RateLimit: 1000
        BurstLimit: 500
      Quota:
        Limit: 100000
        Period: MONTH
Azure API Management
Azure API Management provides a managed gateway with a rich policy engine expressed in XML. It supports multi-region deployment, a developer portal, and API product management.
<!-- Azure API Management policy — JWT validation + rate limiting -->
<policies>
  <inbound>
    <base />
    <validate-jwt
        header-name="Authorization"
        failed-validation-httpcode="401"
        failed-validation-error-message="Unauthorized">
      <openid-config url="https://login.microsoftonline.com/tenant/v2.0/.well-known/openid-configuration" />
      <required-claims>
        <claim name="aud" match="all">
          <value>api://my-api</value>
        </claim>
      </required-claims>
    </validate-jwt>
    <rate-limit calls="200" renewal-period="60" remaining-calls-variable-name="remainingCalls" />
    <set-header name="X-Forwarded-By" exists-action="override">
      <value>APIM</value>
    </set-header>
  </inbound>
  <backend>
    <base />
  </backend>
  <outbound>
    <base />
    <set-header name="X-RateLimit-Remaining" exists-action="override">
      <value>@(((int)context.Variables["remainingCalls"]).ToString())</value>
    </set-header>
  </outbound>
</policies>
NGINX / NGINX Plus
NGINX acts as a high-performance reverse proxy with a mature configuration model. NGINX Plus adds native health checks, active session draining, and status monitoring.
# NGINX gateway config — routing, rate limiting, and canary deployment
upstream payment-api-v1 {
    server payment-v1-1:8080 weight=100;
    server payment-v1-2:8080 weight=100;
}
upstream payment-api-v2 {
    server payment-v2-1:8080 weight=10;
    server payment-v2-2:8080 weight=10;
}

limit_req_zone $binary_remote_addr zone=api_limit:10m rate=50r/s;
limit_conn_zone $binary_remote_addr zone=perip:10m;

split_clients "app_version_${remote_addr}" $canary_upstream {
    10%  payment-api-v2;
    *    payment-api-v1;
}

server {
    listen 443 ssl;
    server_name api.example.com;
    ssl_certificate     /etc/ssl/certs/example.crt;
    ssl_certificate_key /etc/ssl/private/example.key;

    location /api/v1/payments {
        limit_req zone=api_limit burst=100 nodelay;
        limit_conn perip 20;
        proxy_pass http://$canary_upstream;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Request-ID $request_id;
        proxy_next_upstream error timeout http_500 http_502;
        proxy_next_upstream_tries 3;
        proxy_connect_timeout 5s;
        proxy_read_timeout 30s;
    }

    location /healthz {
        default_type text/plain;
        return 200 "OK";
    }
}
Authentication Flows
JWT Validation at the Gateway
The gateway validates JWT tokens before forwarding requests to backend services. It checks the signature using the issuer’s JWKS endpoint, verifies expiration (exp) and not-before (nbf), and optionally enforces custom claims.
Envoy handles this natively via the JWT authentication filter. Kong uses the jwt or oauth2 plugin. Offloading JWT validation to the gateway removes token-handling logic from every service.
# Kong JWT validation plugin config
plugins:
- name: jwt
  service: order-service
  config:
    uri_param_names: [jwt]
    key_claim_name: kid
    secret_is_base64: false
    claims_to_verify: [exp, nbf]
    anonymous: null
    run_on_preflight: true
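To make the gateway's checks concrete, here is a minimal standard-library sketch of HS256 verification. It is illustrative only: production gateways fetch RS256 public keys from the issuer's JWKS endpoint rather than sharing a symmetric secret, and the claim names mirror the `exp`/`nbf` checks described above.

```python
import base64, hashlib, hmac, json, time

def b64url_decode(s: str) -> bytes:
    # JWT segments are base64url without padding; restore it before decoding.
    return base64.urlsafe_b64decode(s + "=" * (-len(s) % 4))

def verify_jwt_hs256(token: str, secret: bytes) -> dict:
    """Check signature, exp, and nbf: the same steps a gateway performs."""
    header_b64, payload_b64, sig_b64 = token.split(".")
    signing_input = f"{header_b64}.{payload_b64}".encode()
    expected = hmac.new(secret, signing_input, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, b64url_decode(sig_b64)):
        raise ValueError("bad signature")
    claims = json.loads(b64url_decode(payload_b64))
    now = time.time()
    if claims.get("exp", float("inf")) < now:
        raise ValueError("token expired")
    if claims.get("nbf", 0) > now:
        raise ValueError("token not yet valid")
    return claims
```

Services behind the gateway can then trust the forwarded claims without re-validating the token themselves.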
OAuth2 Token Exchange
For cross-service delegation, the gateway can exchange a client’s access token for a scoped-down token using the OAuth2 token exchange flow. The backend receives only the permissions it needs (least privilege).
# Token exchange request at the gateway (RFC 8693)
$ curl -X POST https://gateway.example.com/auth/token-exchange \
  -u "$CLIENT_ID:$CLIENT_SECRET" \
  -d "grant_type=urn:ietf:params:oauth:grant-type:token-exchange" \
  -d "subject_token=$USER_TOKEN" \
  -d "subject_token_type=urn:ietf:params:oauth:token-type:access_token" \
  -d "audience=payment-service" \
  -d "requested_token_type=urn:ietf:params:oauth:token-type:access_token" \
  -d "scope=payment:read payment:write"
mTLS Between Gateways and Services
Mutual TLS ensures that both the gateway and the backend service authenticate each other. The gateway presents a client certificate, and the service validates it. This prevents unauthorized clients from reaching internal services even if they bypass the gateway.
# Generate the gateway's client certificate (self-signed for illustration)
$ openssl req -x509 -newkey rsa:4096 -keyout gateway-key.pem \
    -out gateway-cert.pem -days 365 -nodes \
    -subj "/CN=gateway.example.com"

# NGINX: present the client cert and verify the upstream's certificate
# ("backend" is a placeholder upstream name)
location / {
    proxy_pass https://backend;
    proxy_ssl_certificate         /etc/ssl/certs/gateway-cert.pem;
    proxy_ssl_certificate_key     /etc/ssl/private/gateway-key.pem;
    proxy_ssl_verify              on;
    proxy_ssl_trusted_certificate /etc/ssl/certs/ca-cert.pem;
}
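The same two-sided authentication can be sketched with Python's ssl module; the certificate paths are placeholders for wherever your PKI stores them, which is why the load calls are commented out:

```python
import ssl

# Client-side (gateway) context: verify the upstream and present our own cert.
gateway_ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
gateway_ctx.verify_mode = ssl.CERT_REQUIRED      # reject upstreams without a valid cert
# gateway_ctx.load_verify_locations("ca-cert.pem")                     # internal CA trust anchor
# gateway_ctx.load_cert_chain("gateway-cert.pem", "gateway-key.pem")   # our client certificate

# Server-side (service) context: demand a client certificate from the gateway.
service_ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
service_ctx.verify_mode = ssl.CERT_REQUIRED      # handshake fails without a client cert
# service_ctx.load_verify_locations("ca-cert.pem")
```

Either side failing verification aborts the handshake, so a client that bypasses the gateway never completes a connection to the service.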
Rate Limiting Algorithms
Token Bucket
Each client receives a bucket that fills at a fixed rate (the refill rate) up to a maximum capacity (the burst). Each request consumes one token. If the bucket is empty, the request is rejected.
import time

class TokenBucket:
    def __init__(self, capacity: int, refill_rate: float):
        self.capacity = capacity
        self.tokens = capacity
        self.refill_rate = refill_rate          # tokens added per second
        self.last_refill = time.monotonic()

    def allow_request(self) -> bool:
        # Refill in proportion to the time elapsed since the last check.
        now = time.monotonic()
        elapsed = now - self.last_refill
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
Kong’s rate-limiting plugin is window-based rather than a strict token bucket: policy: local keeps per-node counters in memory, while policy: redis shares counters across gateway instances for a cluster-wide limit.
Sliding Window
Instead of a fixed window (which can admit a double burst at the boundary between two windows), a sliding window considers only the last N seconds. The log variant below stores exact request timestamps; a counter variant approximates it by weighting the previous window’s count in proportion to its remaining overlap.
import time
from collections import deque

class SlidingWindow:
    def __init__(self, window_size: float = 60.0, max_requests: int = 100):
        self.window_size = window_size
        self.max_requests = max_requests
        self.timestamps: deque[float] = deque()

    def allow_request(self) -> bool:
        now = time.monotonic()
        cutoff = now - self.window_size
        # Evict timestamps that have slid out of the window.
        while self.timestamps and self.timestamps[0] < cutoff:
            self.timestamps.popleft()
        if len(self.timestamps) < self.max_requests:
            self.timestamps.append(now)
            return True
        return False
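The log above stores every timestamp, which costs O(max_requests) memory per client. The counter variant gets the same boundary-smoothing effect in O(1) memory by weighting the previous fixed window's count; this sketch takes an explicit `now` parameter so the behavior is easy to test:

```python
import time

class SlidingWindowCounter:
    """Approximate sliding window: weight the previous fixed window's count
    by how much of it still overlaps the sliding window."""
    def __init__(self, window_size: float = 60.0, max_requests: int = 100):
        self.window_size = window_size
        self.max_requests = max_requests
        self.current_window = 0       # index of the current fixed window
        self.current_count = 0
        self.previous_count = 0

    def allow_request(self, now=None) -> bool:
        now = time.monotonic() if now is None else now
        window = int(now // self.window_size)
        if window != self.current_window:
            # Roll forward; anything older than one full window drops to zero.
            self.previous_count = self.current_count if window == self.current_window + 1 else 0
            self.current_count = 0
            self.current_window = window
        # Fraction of the previous window still inside the sliding window.
        overlap = 1.0 - (now % self.window_size) / self.window_size
        estimated = self.previous_count * overlap + self.current_count
        if estimated < self.max_requests:
            self.current_count += 1
            return True
        return False
```

The estimate is exact only when requests arrive uniformly, but in practice the approximation error is small and the memory savings matter at gateway scale.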
Distributed Rate Limiting
When you run multiple gateway instances, in-memory rate limiting is insufficient — a client can exhaust the limit across instances. Distributed rate limiting uses Redis or a similar store to maintain a shared counter.
import time
import redis.asyncio as aioredis

class DistributedRateLimiter:
    def __init__(self, redis_url: str = "redis://redis-cluster:6379"):
        self.redis = aioredis.from_url(redis_url)

    async def allow(self, key: str, max_requests: int = 100, window: int = 60) -> bool:
        # Use wall-clock time: monotonic clocks are not comparable across hosts.
        now = int(time.time())
        window_key = f"ratelimit:{key}:{now // window}"
        count = await self.redis.incr(window_key)
        if count == 1:
            # First request in this window sets the expiry.
            await self.redis.expire(window_key, window + 1)
        return count <= max_requests
Request and Response Transformation
Gateways rewrite requests and responses to decouple client contracts from backend implementations. Common transformations:
| Transformation | Purpose | Example Tool |
|---|---|---|
| Header injection | Add tracing headers, API version | Kong request-transformer, NGINX proxy_set_header |
| Path rewriting | Map external routes to internal services | Traefik stripPrefixRegex, NGINX rewrite |
| JSON-to-XML conversion | Support legacy XML clients | Kong response-transformer |
| Payload stripping | Remove internal fields from responses | Kong response-transformer with remove.json |
| Protocol bridging | HTTP to gRPC transcoding | Envoy grpc-json-transcoder |
# Kong plugins: correlation-id generates the request ID,
# request-transformer rewrites headers and query string
plugins:
- name: correlation-id
  config:
    header_name: X-Request-ID
    generator: uuid
    echo_downstream: true
- name: request-transformer
  config:
    add:
      headers:
      - X-API-Version:2026-04
    remove:
      headers:
      - X-Internal-Token
    append:
      querystring:
      - source:gateway
Circuit Breaking
Circuit breakers prevent cascading failures by stopping requests to a degraded backend. The breaker transitions through three states: closed (normal operation), open (requests fail fast), and half-open (probing recovery).
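The state machine itself is small. A Python sketch with illustrative thresholds (the `now` parameter exists only to make the transitions testable without sleeping):

```python
import time

class CircuitBreaker:
    """closed -> open after N consecutive failures; open -> half-open after a
    cool-down; half-open -> closed on success, back to open on failure."""
    def __init__(self, failure_threshold: int = 5, reset_timeout: float = 30.0):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.failures = 0
        self.state = "closed"
        self.opened_at = 0.0

    def allow_request(self, now=None) -> bool:
        now = time.monotonic() if now is None else now
        if self.state == "open":
            if now - self.opened_at >= self.reset_timeout:
                self.state = "half-open"    # let one probe request through
                return True
            return False                    # fail fast while the backend recovers
        return True

    def record_success(self):
        self.failures = 0
        self.state = "closed"

    def record_failure(self, now=None):
        now = time.monotonic() if now is None else now
        self.failures += 1
        if self.state == "half-open" or self.failures >= self.failure_threshold:
            self.state = "open"
            self.opened_at = now
```

Gateways wrap each upstream call in this check: a tripped breaker returns an error to the client immediately instead of queueing requests against a dying backend.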
Envoy circuit breaker thresholds:
- max_connections: maximum simultaneous connections to each upstream host
- max_pending_requests: maximum queued requests when connections are saturated
- max_requests: maximum parallel requests across all connections
- max_retries: maximum retries allowed per request
# Kong upstream with active health checks; unhealthy targets are ejected,
# approximating circuit-breaker behavior at the load-balancer level
upstreams:
- name: payment-upstream
  algorithm: consistent-hashing
  hash_on: ip
  healthchecks:
    active:
      type: http
      http_path: /health
      healthy:
        interval: 5
        successes: 3
      unhealthy:
        interval: 5
        http_failures: 5
        tcp_failures: 3
        timeouts: 3
Canary Deployments
Route a small percentage of traffic to a new API version while the majority hits the stable version. Monitor error rates and latency before ramping up.
# Kong canary plugin config (Enterprise plugin)
plugins:
- name: canary
  service: checkout-service
  config:
    percentage: 5
    upstream_host: checkout-v2.internal
    upstream_port: 8080
    upstream_uri: /v2
NGINX achieves canary routing with the split_clients directive (shown earlier). Envoy uses weighted clusters in the route configuration.
# Envoy route-level weighted clusters for canary
route_config:
  virtual_hosts:
  - name: checkout
    domains: ["*"]
    routes:
    - match: { prefix: "/checkout" }
      route:
        weighted_clusters:
          clusters:
          - name: checkout_v1
            weight: 95
          - name: checkout_v2
            weight: 5
API Versioning Strategies
URI Path Versioning
/api/v1/orders, /api/v2/orders — the most common approach. Simple to route but can lead to code duplication.
Header Versioning
Clients send Accept: application/vnd.api+json;version=2 or X-API-Version: 2. Keeps URLs clean but makes discovery harder.
Query Parameter Versioning
/api/orders?version=2 — easy to test from a browser but pollutes query parameters and clutters logs.
Content Negotiation
# NGINX routing by Accept header; values match the payment upstreams above
map $http_accept $api_version {
    "~application/vnd.payments.v1" "payment-api-v1";
    "~application/vnd.payments.v2" "payment-api-v2";
    default "payment-api-v1";
}
# Inside a location block: proxy_pass http://$api_version;
Gateway approach: route by version at the gateway, transform between versions as needed. The gateway maps external version headers to internal service endpoints, allowing backend teams to evolve APIs without breaking existing clients.
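Stripped of proxy machinery, the header-to-version mapping is a small parsing step. A Python sketch using the same hypothetical vendor media types as the NGINX map:

```python
import re

# Vendor media type like "application/vnd.payments.v2+json" -> version number.
VERSION_PATTERN = re.compile(r"application/vnd\.payments\.v(\d+)")

def resolve_upstream(accept_header: str, default: str = "payment-v1") -> str:
    """Pick an internal upstream from the Accept header, falling back to the
    default version when no vendor media type matches."""
    match = VERSION_PATTERN.search(accept_header or "")
    return f"payment-v{match.group(1)}" if match else default
```

Defaulting to the oldest stable version keeps clients that never adopted content negotiation working unchanged.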
Comparison Table
| Tool | Performance | Plugins/Extensibility | Cost | Deployment Model | Best For |
|---|---|---|---|---|---|
| Kong | High (OpenResty) | 200+ plugins, PDK for custom | OSS free, Enterprise $ | Self-hosted or Konnect SaaS | Teams needing rich plugin ecosystem |
| Traefik | High (Go) | Middleware chain, custom middleware | OSS free, Enterprise $ | Self-hosted, native Docker/K8s | Kubernetes-native shops |
| Envoy | Very high (C++) | xDS filter chain, WASM extensions | Free (CNCF) | Sidecar or standalone | Service mesh, high throughput |
| AWS API Gateway | Medium (managed) | Lambda authorizers, VTL transforms | Pay-per-call $$ | Fully managed | Lambda-heavy AWS stacks |
| Azure APIM | Medium (managed) | Policy XML, developer portal | Consumption or dedicated $$$ | Fully managed | Enterprise Azure shops |
| NGINX | Very high (C) | Lua/JS modules, 3rd-party modules | OSS free, Plus $$ | Self-hosted | Existing NGINX infrastructure |
Decision Tree
Is your team size under 10?
├── Yes → Single gateway (start simple)
└── No → Do you have multiple independent product teams?
    ├── No → Gateway per domain
    └── Yes → Does each team own distinct client types?
        ├── No → Gateway per team
        └── Yes → BFF pattern (gateway per client type)

Are you already using Kubernetes?
├── Yes → Traefik or Envoy sidecar (Istio)
└── No → Do you need a managed solution?
    ├── Yes → AWS API Gateway / Azure APIM
    └── No → Kong or NGINX (self-hosted)

Do you need per-service policy isolation?
├── Yes → Sidecar / service mesh pattern
└── No → Centralized or per-domain gateway
Conclusion
API gateway architecture is not a one-size-fits-all decision. A single gateway serves early-stage projects well but creates bottlenecks as teams grow. Gateway-per-team and gateway-per-domain patterns trade operational overhead for team autonomy. The BFF pattern optimizes for diverse client needs but multiplies maintenance surface. Sidecar gateways provide the finest granularity at the cost of infrastructure complexity.
Choose tools that match your operational maturity: managed gateways (AWS, Azure) for smaller teams with limited infrastructure experience, Kong or Traefik for teams wanting plugin ecosystems and Kubernetes integration, and Envoy for high-throughput or service-mesh use cases. Keep authentication, rate limiting, and circuit-breaking logic at the gateway so backend services stay focused on business logic.
Resources
- Kong Gateway Documentation
- Envoy Proxy Documentation
- NGINX Reverse Proxy Guide
- AWS API Gateway Developer Guide
- Traefik Routing & Middleware
- Istio Authorization Policy Reference