
API Gateway Architecture: Implementation Patterns and Best Practices

Created: March 8, 2026 · CalmOps · 14 min read

Introduction

API gateways sit between clients and backend services, handling cross-cutting concerns so individual services stay focused on business logic. Choosing the right gateway architecture and tooling directly shapes your system’s scalability, security posture, and team autonomy. This article covers five deployment patterns, six gateway tools with working configuration, authentication and rate-limiting internals, and a decision framework for picking the right approach.

Gateway Architecture Patterns

Single Gateway (Monolithic Gateway)

One gateway instance handles all traffic for every client and service. This pattern works well for small teams and early-stage products where operational simplicity outweighs isolation.

Pros: Single configuration surface, unified authentication policy, one place to manage SSL and routing.

Cons: Team coordination bottlenecks, any service change risks breaking others, gateway becomes a single point of failure and a deployment scaling bottleneck.

Use this pattern during the first twelve months of a project or when the team size stays under ten engineers.

Gateway per Team

Each product team operates its own gateway instance. The gateway exposes only the routes and policies that the team controls.

# team-billing-gateway.yaml - Kong declarative config for a per-team gateway
_format_version: "3.0"
services:
  - name: billing-service
    host: billing.internal
    port: 8080
    protocol: http
    routes:
      - name: billing-routes
        paths:
          - /api/v1/billing
        methods: [GET, POST, PUT]
        plugins:
          - name: key-auth
          - name: rate-limiting
            config:
              minute: 100
              policy: local
  - name: invoice-service
    host: invoice.internal
    port: 8080
    protocol: http
    routes:
      - name: invoice-routes
        paths:
          - /api/v1/invoices
        methods: [GET]
        plugins:
          - name: key-auth

Pros: Teams deploy independently, route changes stay scoped, blast radius shrinks to one team.

Cons: More gateways to operate, duplicated cross-cutting config (observability, CORS), inconsistent authentication across teams without shared libraries.

Apply this pattern when your organization has three or more independent product teams with separate deployment cadences.

Gateway per Domain

You deploy one gateway per bounded context or domain — payments, inventory, user management each get their own gateway aligned with the domain boundary.

This mirrors domain-driven design: the gateway enforces the domain boundary and encapsulates internal service topology. External clients interact with domain gateways, never with individual services.

# Envoy config for domain gateway (inventory domain)
static_resources:
  listeners:
    - name: inventory-listener
      address: { socket_address: { address: 0.0.0.0, port_value: 8081 } }
      filter_chains:
        - filters:
            - name: envoy.filters.network.http_connection_manager
              typed_config:
                "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
                stat_prefix: inventory_gw
                route_config:
                  name: inventory_routes
                  virtual_hosts:
                    - name: inventory
                      domains: ["inventory.api.example.com"]
                      routes:
                        - match: { prefix: "/stock" }
                          route: { cluster: stock_service }
                        - match: { prefix: "/warehouse" }
                          route: { cluster: warehouse_service }
                http_filters:
                  - name: envoy.filters.http.jwt_authn
                    typed_config:
                      "@type": type.googleapis.com/envoy.extensions.filters.http.jwt_authn.v3.JwtAuthentication
                      providers:
                        auth0:
                          issuer: https://auth.example.com/
                          audiences: ["inventory-api"]
                          remote_jwks:
                            http_uri: { uri: https://auth.example.com/.well-known/jwks.json, cluster: auth_cluster, timeout: 5s }
                  - name: envoy.filters.http.router
  clusters:
    - name: stock_service
      connect_timeout: 0.25s
      type: STRICT_DNS
      lb_policy: ROUND_ROBIN
      load_assignment:
        cluster_name: stock_service
        endpoints:
          - lb_endpoints:
              - endpoint:
                  address: { socket_address: { address: stock-svc, port_value: 8080 } }

Pros: Strong domain isolation, routing logic matches business boundaries, teams own their domain end-to-end.

Cons: Cross-domain workflows require calls between gateways, more infrastructure to maintain, potential duplication of shared logic.

BFF (Backend for Frontend)

Each client type — web, mobile, IoT, third-party — gets a dedicated gateway tailored to its specific needs. Mobile clients need payload minimization and offline tolerance; web clients need session management and server-side rendering support.

// BFF gateway handler for a mobile client — aggregates profile, orders, and notifications
// into a single mobile-optimized response (Node 18+, which provides a global fetch)
const express = require("express");
const app = express();

app.get("/mobile/dashboard", async (req, res) => {
  const auth = { authorization: req.headers.authorization };
  try {
    const [profile, orders, notifications] = await Promise.all([
      fetch("http://user-service/profile", { headers: auth }),
      fetch("http://order-service/recent", { headers: auth }),
      fetch("http://notify-service/unread", { headers: auth }),
    ]);

    const [profileData, ordersData, notificationsData] = await Promise.all([
      profile.json(), orders.json(), notifications.json(),
    ]);

    res.json({
      user: { name: profileData.name, avatar: profileData.avatar },
      recent_orders: ordersData.slice(0, 5),
      unread_count: notificationsData.count,
    });
  } catch (err) {
    // any failed upstream call surfaces as a single gateway error
    res.status(502).json({ error: "upstream failure" });
  }
});

Pros: Per-client optimization without leaking client concerns into general-purpose APIs, reduced mobile payload, independent scaling per client type.

Cons: BFF proliferation — each new client type adds a gateway, duplication of shared logic like authentication, maintenance burden grows linearly with client count.

Sidecar Gateway Pattern

Each service instance runs a gateway sidecar (typically Envoy or a lightweight proxy) that handles ingress traffic for that service. The sidecar enforces service-level policies — rate limits, authentication, TLS — without a centralized gateway.

This pattern naturally emerges in service mesh architectures like Istio, where each pod runs an Envoy sidecar as the data plane.

# Kubernetes Deployment with Istio sidecar injection enabled on the pod template
apiVersion: apps/v1
kind: Deployment
metadata:
  name: payment-service
spec:
  replicas: 3
  selector:
    matchLabels:
      app: payment-service
  template:
    metadata:
      labels:
        app: payment-service
      annotations:
        sidecar.istio.io/inject: "true"
    spec:
      containers:
        - name: payment-app
          image: payment-service:1.4.2
          ports:
            - containerPort: 8080
---
# Istio AuthorizationPolicy enforced at the sidecar
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: payment-authz
spec:
  selector:
    matchLabels:
      app: payment-service
  rules:
    - from:
        - source:
            principals: ["cluster.local/ns/billing/sa/billing-sa"]
      to:
        - operation:
            methods: ["POST"]
            paths: ["/api/v1/charges"]

Pros: Fine-grained per-service policy, no single point of failure, integrates with mTLS and observability across the mesh.

Cons: Operational complexity of running a sidecar per pod, resource overhead (CPU/memory for each proxy), debugging across many proxies is harder than a centralized gateway.

Gateway Tool Deep Dives

Kong (Open Source / Enterprise)

Kong runs on OpenResty (NGINX + Lua) and provides a plugin ecosystem for authentication, rate limiting, logging, and transformation. It supports both DB-backed and DB-less (declarative) modes.

Use the declarative mode for GitOps workflows: commit your kong.yaml, apply it via deck gateway sync, and Kong reloads without downtime.

# kong-declarative.yaml — full gateway config for a multi-service setup
_format_version: "3.0"
_transform: true

services:
  - name: user-service
    url: http://user-svc:8080
    routes:
      - name: user-routes
        paths: ["/api/v1/users"]
        methods: [GET, POST, PATCH]
    plugins:
      - name: oauth2
        config:
          scopes: ["profile", "email"]
          provision_key: "your-provision-key"
          enable_authorization_code: true
          token_expiration: 7200
      - name: rate-limiting
        config:
          minute: 60
          policy: redis
          redis_host: redis-cluster
      - name: cors
        config:
          origins: ["https://app.example.com"]
          methods: [GET, POST, PATCH]
          headers: ["Authorization", "Content-Type"]

  - name: order-service
    url: http://order-svc:8080
    routes:
      - name: order-routes
        paths: ["/api/v1/orders"]
        methods: [GET, POST]
    plugins:
      - name: jwt
        config:
          claims_to_verify: ["exp", "nbf"]
          secret_is_base64: false
      - name: acl
        config:
          allow: ["admin", "support"]

Traefik

Traefik auto-discovers services from Docker, Kubernetes, Consul, and other providers. It supports automatic HTTPS via Let’s Encrypt and provides a middleware-based pipeline for rate limiting, circuit breaking, and retries.

# Traefik dynamic config with middleware pipeline
http:
  routers:
    api-router:
      rule: "Host(`api.example.com`) && PathPrefix(`/v1`)"
      entryPoints:
        - websecure
      middlewares:
        - rate-limit@file
        - circuit-breaker@file
        - jwt-auth@file
      service: backend-service
      tls:
        certResolver: letsencrypt

  middlewares:
    rate-limit:
      rateLimit:
        average: 100
        burst: 50
        period: 1m
        sourceCriterion:
          ipStrategy:
            depth: 1

    circuit-breaker:
      circuitBreaker:
        expression: "LatencyAtQuantileMS(50.0) > 100 || ResponseCodeRatio(500, 600, 0, 600) > 0.05"
        checkPeriod: 10s
        fallbackDuration: 30s
        recoveryDuration: 60s

    jwt-auth:
      forwardAuth:
        address: "http://auth-service:8080/validate"
        trustForwardHeader: true
        authResponseHeaders:
          - X-Auth-User
          - X-Auth-Roles

Envoy Proxy

Envoy provides high-performance L3/L4/L7 proxying with advanced load balancing, circuit breaking, and observability. It receives dynamic configuration through the xDS discovery APIs, which makes it the data plane of choice for service meshes like Istio.

Key strengths: a highly performant C++ core, rich load-balancing policies (round robin, least request, ring hash, Maglev), native HTTP/2 and gRPC support, and distributed tracing with OpenTelemetry.

# Envoy cluster config with advanced circuit breaking
static_resources:
  clusters:
    - name: payment_cluster
      connect_timeout: 0.25s
      type: STRICT_DNS
      lb_policy: LEAST_REQUEST
      circuit_breakers:
        thresholds:
          - priority: DEFAULT
            max_connections: 1000
            max_pending_requests: 500
            max_requests: 2000
            max_retries: 5
      outlier_detection:
        consecutive_5xx: 5
        interval: 30s
        base_ejection_time: 30s
        max_ejection_percent: 50

AWS API Gateway

A fully managed gateway that integrates with Lambda, ALB, and private VPC endpoints. It supports REST, HTTP, and WebSocket APIs. The pricing model charges per million API calls plus data transfer, making it cost-effective at low to moderate traffic levels.

# SAM template for API Gateway with usage plans and Lambda authorizer
AWSTemplateFormatVersion: "2010-09-09"
Transform: AWS::Serverless-2016-10-31

Resources:
  PaymentApi:
    Type: AWS::Serverless::Api
    Properties:
      StageName: prod
      Auth:
        DefaultAuthorizer: JwtAuthorizer
        Authorizers:
          JwtAuthorizer:
            FunctionArn: !GetAtt JwtValidationFunction.Arn
            Identity:
              Headers:
                - Authorization
      MethodSettings:
        - ResourcePath: "/*"
          HttpMethod: "*"
          ThrottlingRateLimit: 1000
          ThrottlingBurstLimit: 500
          DataTraceEnabled: true
          LoggingLevel: INFO

  ChargeFunction:
    Type: AWS::Serverless::Function
    Properties:
      CodeUri: src/
      Handler: charges.handler
      Events:
        ChargeAPI:
          Type: Api
          Properties:
            RestApiId: !Ref PaymentApi
            Path: /charges
            Method: POST

  UsagePlan:
    Type: AWS::ApiGateway::UsagePlan
    Properties:
      ApiStages:
        - ApiId: !Ref PaymentApi
          Stage: prod
      Throttle:
        RateLimit: 1000
        BurstLimit: 500
      Quota:
        Limit: 100000
        Period: MONTH

Azure API Management

Azure API Management provides a managed gateway with a rich policy engine expressed in XML. It supports multi-region deployment, a developer portal, and API product management.

<!-- Azure API Management policy — JWT validation + rate limiting -->
<policies>
  <inbound>
    <base />
    <validate-jwt
      header-name="Authorization"
      failed-validation-httpcode="401"
      failed-validation-error-message="Unauthorized">
      <openid-config url="https://login.microsoftonline.com/tenant/v2.0/.well-known/openid-configuration" />
      <required-claims>
        <claim name="aud" match="all">
          <value>api://my-api</value>
        </claim>
      </required-claims>
    </validate-jwt>
    <rate-limit calls="200" renewal-period="60"
                remaining-calls-header-name="X-RateLimit-Remaining" />
    <set-header name="X-Forwarded-By" exists-action="override">
      <value>APIM</value>
    </set-header>
  </inbound>
  <backend>
    <base />
  </backend>
  <outbound>
    <base />
  </outbound>
</policies>

NGINX / NGINX Plus

NGINX acts as a high-performance reverse proxy with a mature configuration model. NGINX Plus adds native health checks, active session draining, and status monitoring.

# NGINX gateway config — routing, rate limiting, and canary deployment
upstream payment-api-v1 {
    server payment-v1-1:8080 weight=100;
    server payment-v1-2:8080 weight=100;
}

upstream payment-api-v2 {
    server payment-v2-1:8080 weight=10;
    server payment-v2-2:8080 weight=10;
}

limit_req_zone $binary_remote_addr zone=api_limit:10m rate=50r/s;
limit_conn_zone $binary_remote_addr zone=perip:10m;

split_clients "app_version_${remote_addr}" $canary_upstream {
    10%    payment-api-v2;
    *      payment-api-v1;
}

server {
    listen 443 ssl;
    server_name api.example.com;

    ssl_certificate     /etc/ssl/certs/example.crt;
    ssl_certificate_key /etc/ssl/private/example.key;

    location /api/v1/payments {
        limit_req zone=api_limit burst=100 nodelay;
        limit_conn perip 20;

        proxy_pass http://$canary_upstream;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Request-ID $request_id;

        # note: automatic retries are only safe for idempotent requests
        proxy_next_upstream error timeout http_500 http_502;
        proxy_next_upstream_tries 3;
        proxy_connect_timeout 5s;
        proxy_read_timeout 30s;
    }

    location /healthz {
        return 200 "OK";
        add_header Content-Type text/plain;
    }
}

Authentication Flows

JWT Validation at the Gateway

The gateway validates JWT tokens before forwarding requests to backend services. It checks the signature using the issuer’s JWKS endpoint, verifies expiration (exp) and not-before (nbf), and optionally enforces custom claims.

Envoy handles this natively via the JWT authentication filter. Kong uses the jwt or oauth2 plugin. Offloading JWT validation to the gateway removes token-handling logic from every service.

# Kong JWT validation plugin config
plugins:
  - name: jwt
    service: order-service
    config:
      uri_param_names: [jwt]
      key_claim_name: kid
      secret_is_base64: false
      claims_to_verify: [exp, nbf]
      anonymous: null
      run_on_preflight: true
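
To make these checks concrete, here is a minimal stdlib-only Python sketch of the validation steps for an HS256-signed token. Production gateways typically verify RS256 signatures against keys fetched from the JWKS endpoint; the function name and error messages here are illustrative:

```python
import base64
import hashlib
import hmac
import json
import time

def _b64url_decode(segment: str) -> bytes:
    # JWTs use unpadded base64url; restore padding before decoding
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))

def validate_jwt(token: str, secret: bytes, audience=None) -> dict:
    """Return the claims if the token passes every check, else raise ValueError."""
    try:
        header_b64, payload_b64, sig_b64 = token.split(".")
    except ValueError:
        raise ValueError("malformed token")
    # 1. Signature check (HS256 here; gateways usually verify JWKS-published keys)
    expected = hmac.new(secret, f"{header_b64}.{payload_b64}".encode(),
                        hashlib.sha256).digest()
    if not hmac.compare_digest(expected, _b64url_decode(sig_b64)):
        raise ValueError("bad signature")
    claims = json.loads(_b64url_decode(payload_b64))
    now = time.time()
    # 2. Temporal claims: exp (expiration) and nbf (not-before)
    if "exp" in claims and now >= claims["exp"]:
        raise ValueError("token expired")
    if "nbf" in claims and now < claims["nbf"]:
        raise ValueError("token not yet valid")
    # 3. Optional custom-claim enforcement, e.g. audience
    if audience is not None and claims.get("aud") != audience:
        raise ValueError("wrong audience")
    return claims
```

On success the gateway forwards the request (often with selected claims copied into headers); on ValueError it returns 401 without ever touching a backend.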

OAuth2 Token Exchange

For cross-service delegation, the gateway can exchange a client’s access token for a scoped-down token using the OAuth2 token exchange flow. The backend receives only the permissions it needs (least privilege).

# Token exchange request at the gateway
$ curl -X POST https://gateway.example.com/auth/token-exchange \
  -H "Authorization: Bearer $USER_TOKEN" \
  -d "grant_type=urn:ietf:params:oauth:grant-type:token-exchange" \
  -d "audience=payment-service" \
  -d "requested_token_type=urn:ietf:params:oauth:token-type:access_token" \
  -d "scope=payment:read payment:write"

mTLS Between Gateways and Services

Mutual TLS ensures that both the gateway and the backend service authenticate each other. The gateway presents a client certificate, and the service validates it. This prevents unauthorized clients from reaching internal services even if they bypass the gateway.

# Gateway mTLS configuration with NGINX
$ openssl req -x509 -newkey rsa:4096 -keyout gateway-key.pem \
  -out gateway-cert.pem -days 365 -nodes \
  -subj "/CN=gateway.example.com"

# Configure NGINX to use client cert when connecting upstream
# proxy_ssl_certificate /etc/ssl/certs/gateway-cert.pem;
# proxy_ssl_certificate_key /etc/ssl/private/gateway-key.pem;
# proxy_ssl_verify on;
# proxy_ssl_trusted_certificate /etc/ssl/certs/ca-cert.pem;

Rate Limiting Algorithms

Token Bucket

Each client receives a bucket that fills at a fixed rate (the refill rate) up to a maximum capacity (the burst). Each request consumes one token. If the bucket is empty, the request is rejected.

import time

class TokenBucket:
    def __init__(self, capacity: int, refill_rate: float):
        self.capacity = capacity          # burst size
        self.tokens = capacity
        self.refill_rate = refill_rate    # tokens added per second
        self.last_refill = time.monotonic()

    def allow_request(self) -> bool:
        # refill based on time elapsed since the last check
        now = time.monotonic()
        elapsed = now - self.last_refill
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

Kong’s rate-limiting plugin implements fixed-window counting rather than a true token bucket: policy: local keeps counters in each node’s memory, while policy: redis shares them across gateway nodes for cluster-wide limits.

Sliding Window

Instead of a fixed window (which can admit a double burst at window boundaries), a sliding window considers the last N seconds of traffic. The sliding window log variant below keeps a timestamp per request and evicts entries older than the window.

import time
from collections import deque

class SlidingWindow:
    def __init__(self, window_size: float = 60.0, max_requests: int = 100):
        self.window_size = window_size
        self.max_requests = max_requests
        self.timestamps: deque[float] = deque()

    def allow_request(self) -> bool:
        now = time.monotonic()
        cutoff = now - self.window_size
        while self.timestamps and self.timestamps[0] < cutoff:
            self.timestamps.popleft()
        if len(self.timestamps) < self.max_requests:
            self.timestamps.append(now)
            return True
        return False
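
The log above stores one timestamp per request, which becomes expensive at high request rates. A common approximation, the sliding window counter, keeps only two fixed-window counters and weights the previous window by how much of it still overlaps the sliding window. A sketch (class name and defaults are illustrative):

```python
import time

class SlidingWindowCounter:
    """Approximates a sliding window with two fixed-window counters."""

    def __init__(self, window_size: float = 60.0, max_requests: int = 100):
        self.window_size = window_size
        self.max_requests = max_requests
        self.current_window = 0   # index of the window we are counting in
        self.current_count = 0
        self.previous_count = 0

    def allow_request(self, now=None) -> bool:
        now = time.monotonic() if now is None else now
        window = int(now // self.window_size)
        if window != self.current_window:
            # Roll over: the old current window becomes "previous" only if it
            # is directly adjacent; otherwise no recent traffic remains.
            self.previous_count = self.current_count if window == self.current_window + 1 else 0
            self.current_count = 0
            self.current_window = window
        # Weight the previous window by the fraction of it still inside the
        # sliding window, then add the current count.
        elapsed_fraction = (now % self.window_size) / self.window_size
        estimated = self.previous_count * (1 - elapsed_fraction) + self.current_count
        if estimated < self.max_requests:
            self.current_count += 1
            return True
        return False
```

The trade-off is accuracy for memory: the estimate assumes requests in the previous window were evenly distributed, which is usually close enough for gateway limits.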

Distributed Rate Limiting

When you run multiple gateway instances, in-memory rate limiting is insufficient — a client can exhaust the limit across instances. Distributed rate limiting uses Redis or a similar store to maintain a shared counter.

import time

import redis.asyncio as aioredis

class DistributedRateLimiter:
    def __init__(self, redis_url: str = "redis://redis-cluster:6379"):
        self.redis = aioredis.from_url(redis_url)

    async def allow(self, key: str, max_requests: int = 100, window: int = 60) -> bool:
        # Use wall-clock time: monotonic clocks are not comparable across
        # hosts, and every gateway instance must agree on window boundaries.
        now = int(time.time())
        window_key = f"ratelimit:{key}:{now // window}"
        count = await self.redis.incr(window_key)
        if count == 1:
            await self.redis.expire(window_key, window + 1)
        return count <= max_requests

Request and Response Transformation

Gateways rewrite requests and responses to decouple client contracts from backend implementations. Common transformations:

Transformation | Purpose | Example Tool
Header injection | Add tracing headers, API version | Kong request-transformer, NGINX proxy_set_header
Path rewriting | Map external routes to internal services | Traefik stripPrefixRegex, NGINX rewrite
JSON-to-XML conversion | Support legacy XML clients | Kong response-transformer
Payload stripping | Remove internal fields from responses | Kong response-transformer with remove.json
Protocol bridging | HTTP to gRPC transcoding | Envoy grpc-json-transcoder

# Kong request-transformer plugin — rewrite and inject headers
plugins:
  - name: request-transformer
    config:
      add:
        headers:
          - X-Request-ID:{uuid}
          - X-API-Version:2026-04
      remove:
        headers:
          - X-Internal-Token
      append:
        querystring:
          - source:gateway

Circuit Breaking

Circuit breakers prevent cascading failures by stopping requests to a degraded backend. The breaker transitions through three states: closed (normal operation), open (requests fail fast), and half-open (probing recovery).

Envoy circuit breaker thresholds:

  • max_connections: maximum simultaneous connections to each upstream host
  • max_pending_requests: maximum queued requests when connections are saturated
  • max_requests: maximum parallel requests across all connections
  • max_retries: maximum retries allowed per request
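
The state machine these thresholds feed is small enough to sketch directly. A minimal Python version with the three states described above (names and defaults are illustrative; the now parameter exists only to make the transitions testable):

```python
import time

class CircuitBreaker:
    CLOSED, OPEN, HALF_OPEN = "closed", "open", "half_open"

    def __init__(self, failure_threshold: int = 5, recovery_timeout: float = 30.0):
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        self.state = self.CLOSED
        self.failures = 0
        self.opened_at = 0.0

    def allow_request(self, now=None) -> bool:
        now = time.monotonic() if now is None else now
        if self.state == self.OPEN:
            if now - self.opened_at >= self.recovery_timeout:
                self.state = self.HALF_OPEN   # let one probe request through
                return True
            return False                      # fail fast while open
        return True

    def record_success(self) -> None:
        self.failures = 0
        self.state = self.CLOSED

    def record_failure(self, now=None) -> None:
        now = time.monotonic() if now is None else now
        self.failures += 1
        # A failed probe in half-open, or too many failures while
        # closed, opens the breaker.
        if self.state == self.HALF_OPEN or self.failures >= self.failure_threshold:
            self.state = self.OPEN
            self.opened_at = now
```
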
# Kong upstream health checks — active checks approximate circuit-breaker
# behavior by ejecting unhealthy targets from the load-balancing pool
upstreams:
  - name: payment-upstream
    algorithm: consistent-hashing
    healthchecks:
      active:
        type: http
        http_path: /health
        healthy:
          interval: 5
          successes: 3
        unhealthy:
          interval: 5
          http_failures: 5
          tcp_failures: 3
          timeouts: 3

Canary Deployments

Route a small percentage of traffic to a new API version while the majority hits the stable version. Monitor error rates and latency before ramping up.

# Kong canary plugin config (Enterprise plugin)
plugins:
  - name: canary
    service: checkout-service
    config:
      percentage: 5
      upstream_host: checkout-v2.internal
      upstream_port: 8080
      upstream_uri: /v2

NGINX achieves canary routing with the split_clients directive (shown earlier). Envoy uses weighted clusters with the weight field.

# Envoy weighted clusters for canary — configured on the route, not the cluster
route_config:
  virtual_hosts:
    - name: checkout
      domains: ["*"]
      routes:
        - match: { prefix: "/checkout" }
          route:
            weighted_clusters:
              clusters:
                - name: checkout_v1
                  weight: 95
                - name: checkout_v2
                  weight: 5

API Versioning Strategies

URI Path Versioning

/api/v1/orders, /api/v2/orders — the most common approach. Simple to route but can lead to code duplication.

Header Versioning

Clients send Accept: application/vnd.api+json;version=2 or X-API-Version: 2. Keeps URLs clean but makes discovery harder.

Query Parameter Versioning

/api/orders?version=2 — easy to test from a browser but pollutes query parameters and clutters logs.

Content Negotiation

# NGINX routing by Accept header
map $http_accept $api_version {
    "~application/vnd.payments.v1"  "payment-v1";
    "~application/vnd.payments.v2"  "payment-v2";
    default                          "payment-v1";
}

Gateway approach: route by version at the gateway, transform between versions as needed. The gateway maps external version headers to internal service endpoints, allowing backend teams to evolve APIs without breaking existing clients.
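
As a sketch of that mapping, a gateway handler written in Python might resolve the internal upstream from the Accept header, mirroring the NGINX map shown earlier (the internal hostnames are hypothetical):

```python
# Hypothetical internal upstreams keyed by versioned media type.
VERSION_UPSTREAMS = {
    "application/vnd.payments.v1": "http://payment-v1.internal:8080",
    "application/vnd.payments.v2": "http://payment-v2.internal:8080",
}
DEFAULT_UPSTREAM = VERSION_UPSTREAMS["application/vnd.payments.v1"]

def resolve_upstream(accept_header: str) -> str:
    """Pick the internal upstream for a request based on its Accept header."""
    for media_type, upstream in VERSION_UPSTREAMS.items():
        if media_type in accept_header:
            return upstream
    return DEFAULT_UPSTREAM
```

Requests with no recognized version fall through to the default, so existing clients keep working when a new version ships.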

Comparison Table

Tool | Performance | Plugins/Extensibility | Cost | Deployment Model | Best For
Kong | High (OpenResty) | 200+ plugins, PDK for custom | OSS free, Enterprise $ | Self-hosted or Konnect SaaS | Teams needing rich plugin ecosystem
Traefik | High (Go) | Middleware chain, custom middleware | OSS free, Enterprise $ | Self-hosted, native Docker/K8s | Kubernetes-native shops
Envoy | Very high (C++) | xDS filter chain, WASM extensions | Free (CNCF) | Sidecar or standalone | Service mesh, high throughput
AWS API Gateway | Medium (managed) | Lambda authorizers, VTL transforms | Pay-per-call $$ | Fully managed | Lambda-heavy AWS stacks
Azure APIM | Medium (managed) | Policy XML, developer portal | Consumption or dedicated $$$ | Fully managed | Enterprise Azure shops
NGINX | Very high (C) | Lua/JS modules, 3rd-party modules | OSS free, Plus $$ | Self-hosted | Existing NGINX infrastructure

Decision Tree

Pattern selection:

Is your team size under 10?
  ├── Yes → Single gateway (start simple)
  └── No → Do you have multiple independent product teams?
       ├── No → Gateway per domain
       └── Yes → Does each team own distinct client types?
            ├── No → Gateway per team
            └── Yes → BFF pattern (gateway per client type)

Tool selection:

Are you already using Kubernetes?
  ├── Yes → Traefik or Envoy sidecar (Istio)
  └── No → Do you need a managed solution?
       ├── Yes → AWS API Gateway / Azure APIM
       └── No → Kong or NGINX (self-hosted)

Deployment granularity:

Do you need per-service policy isolation?
  ├── Yes → Sidecar / service mesh pattern
  └── No → Centralized or per-domain gateway

Conclusion

API gateway architecture is not a one-size-fits-all decision. A single gateway serves early-stage projects well but creates bottlenecks as teams grow. Gateway-per-team and gateway-per-domain patterns trade operational overhead for team autonomy. The BFF pattern optimizes for diverse client needs but multiplies maintenance surface. Sidecar gateways provide the finest granularity at the cost of infrastructure complexity.

Choose tools that match your operational maturity: managed gateways (AWS, Azure) for smaller teams with limited infrastructure experience, Kong or Traefik for teams wanting plugin ecosystems and Kubernetes integration, and Envoy for high-throughput or service-mesh use cases. Keep authentication, rate limiting, and circuit-breaking logic at the gateway so backend services stay focused on business logic.
