
API Gateways: Routing, Authentication, and Rate Limiting

Introduction

An API gateway is a reverse proxy that sits between clients and backend services. It centralizes cross-cutting concerns (routing, authentication, rate limiting, logging, and request transformation) so individual services don't have to reimplement them. In a microservices architecture, the gateway becomes the single entry point clients talk to, while backend services remain isolated and focused on business logic.

Without a gateway, every client must know the locations of every service, handle authentication independently, and implement retry/circuit-breaking logic. A gateway eliminates this duplication and gives operations teams a single control plane for traffic management.

Core Gateway Functions

Request Routing

A gateway routes incoming requests to the correct backend service based on attributes of the request: the URL path, HTTP method, headers, or even the request body. Modern gateways support several routing strategies:

  • Prefix routing: /api/users/* matches any subpath under /api/users
  • Exact path routing: /health only matches that exact path
  • Header-based routing: route based on X-API-Version: v2
  • Method-based routing: GET /api/orders vs POST /api/orders hit different backends
  • Query-parameter routing: route by ?region=eu-west
// Prefix-based router with middleware chain
package gateway

import (
	"net/http"
	"net/http/httputil"
	"net/url"
	"strings"
)

type Route struct {
	Prefix  string
	Target  *url.URL
	Methods []string
}

type Gateway struct {
	routes []Route
}

func (g *Gateway) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	for _, route := range g.routes {
		if !strings.HasPrefix(r.URL.Path, route.Prefix) {
			continue
		}
		if !methodAllowed(r.Method, route.Methods) {
			http.Error(w, `{"error":"method not allowed"}`, http.StatusMethodNotAllowed)
			return
		}
		// Building the proxy per request keeps the example short; production
		// code would construct one reverse proxy per route at startup.
		proxy := httputil.NewSingleHostReverseProxy(route.Target)
		proxy.ServeHTTP(w, r)
		return
	}
	http.Error(w, `{"error":"no matching route"}`, http.StatusNotFound)
}

func methodAllowed(method string, allowed []string) bool {
	if len(allowed) == 0 {
		return true
	}
	for _, m := range allowed {
		if strings.EqualFold(m, method) {
			return true
		}
	}
	return false
}

Load Balancing

Gateways distribute requests across healthy backend instances. Common algorithms include round-robin, least-connections, and IP-hash for session persistence. Health checks (active or passive) remove unhealthy targets from the pool.
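The selection logic behind these algorithms is only a few lines. Here is a minimal, thread-safe sketch of round-robin and least-connections pickers (the class names and backend addresses are illustrative, not any gateway's API):

```python
import threading


class RoundRobinBalancer:
    """Cycle through backends in order; thread-safe."""

    def __init__(self, backends: list[str]):
        self.backends = backends
        self.lock = threading.Lock()
        self.index = 0

    def pick(self) -> str:
        with self.lock:
            backend = self.backends[self.index % len(self.backends)]
            self.index += 1
            return backend


class LeastConnectionsBalancer:
    """Pick the backend with the fewest in-flight requests."""

    def __init__(self, backends: list[str]):
        self.active = {b: 0 for b in backends}
        self.lock = threading.Lock()

    def acquire(self) -> str:
        with self.lock:
            backend = min(self.active, key=self.active.get)
            self.active[backend] += 1
            return backend

    def release(self, backend: str) -> None:
        with self.lock:
            self.active[backend] -= 1
```

A real gateway layers health checks on top: backends that fail checks are removed from the pool before either picker runs.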

# Traefik dynamic configuration with load balancing
http:
  routers:
    user-api:
      rule: "PathPrefix(`/api/users`)"
      service: user-service
      middlewares:
        - auth-jwt
        - rate-limit

  services:
    user-service:
      loadBalancer:
        servers:
          - url: "http://10.0.1.10:8080"
          - url: "http://10.0.1.11:8080"
          - url: "http://10.0.1.12:8080"
        healthCheck:
          path: "/health"
          interval: "10s"
          timeout: "3s"

Authentication Methods

JWT (JSON Web Tokens)

The gateway validates JWT tokens on every request before forwarding to the backend. This keeps auth logic out of individual services. The gateway extracts the token from the Authorization header, verifies the signature using a shared secret or public key, and optionally injects claims into downstream headers.

import jwt
import os
from datetime import datetime, timedelta, timezone

JWT_SECRET = os.environ.get("JWT_SECRET", "change-me-in-production")
JWT_ALGORITHM = "HS256"


def create_token(user_id: str, roles: list[str]) -> str:
    """Issue a signed JWT with expiry and custom claims."""
    payload = {
        "sub": user_id,
        "roles": roles,
        # datetime.utcnow() is deprecated; use timezone-aware timestamps
        "iat": datetime.now(timezone.utc),
        "exp": datetime.now(timezone.utc) + timedelta(hours=1),
    }
    return jwt.encode(payload, JWT_SECRET, algorithm=JWT_ALGORITHM)


def verify_gateway_token(token: str) -> dict | None:
    """Validate and decode a JWT. Returns claims or None on failure."""
    try:
        claims = jwt.decode(
            token,
            JWT_SECRET,
            algorithms=[JWT_ALGORITHM],
            options={"require": ["exp", "sub"]},
        )
        return claims
    except jwt.ExpiredSignatureError:
        return None
    except jwt.InvalidTokenError:
        return None

OAuth2 / OpenID Connect

For delegated authorization, the gateway acts as an OAuth2 client or proxy. It redirects unauthenticated users to the identity provider, handles the callback, and sets a session cookie or JWT. Tools like OAuth2 Proxy integrate directly with Kong and Traefik.

# Kong OAuth2 plugin configuration
plugins:
  - name: oauth2
    service: user-service
    config:
      scopes: ["profile", "email"]
      mandatory_scope: true
      token_expiration: 7200
      enable_authorization_code: true
      enable_client_credentials: true
      provision_key: "your-provision-key"

API Keys

API keys are the simplest auth mechanism: the gateway checks a static key in the header or query string against an allowlist. Use API keys for service-to-service communication or public-facing SDKs where JWT complexity is unwarranted.

func apiKeyMiddleware(allowedKeys map[string]string) func(http.Handler) http.Handler {
	return func(next http.Handler) http.Handler {
		return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
			key := r.Header.Get("X-API-Key")
			if key == "" {
				key = r.URL.Query().Get("api_key")
			}
			if key == "" {
				http.Error(w, `{"error":"missing api key"}`, http.StatusUnauthorized)
				return
			}
			owner, ok := allowedKeys[key]
			if !ok {
				http.Error(w, `{"error":"invalid api key"}`, http.StatusForbidden)
				return
			}
			r.Header.Set("X-API-Key-Owner", owner)
			next.ServeHTTP(w, r)
		})
	}
}

Kong JWT Route Registration

# Register a service and protect it with JWT authentication
$ curl -s -X POST http://localhost:8001/services \
  --data name=user-service \
  --data url=http://user-service:8080

$ curl -s -X POST http://localhost:8001/services/user-service/routes \
  --data name=user-route \
  --data paths[]=/api/users

$ curl -s -X POST http://localhost:8001/services/user-service/plugins \
  --data name=jwt

# Create a consumer and issue a JWT credential
$ curl -s -X POST http://localhost:8001/consumers \
  --data username=alice

$ curl -s -X POST http://localhost:8001/consumers/alice/jwt \
  --data algorithm=HS256 \
  --data secret=my-secret-key

# Test the protected endpoint
$ curl -s -o /dev/null -w "%{http_code}" http://localhost:8000/api/users
# => 401

# Include a valid JWT
$ curl -s -H "Authorization: Bearer <jwt-token>" http://localhost:8000/api/users
# => 200

Rate Limiting Algorithms

Token Bucket

The token bucket algorithm maintains a bucket that fills at a constant rate (e.g., 10 tokens per second). Each request consumes one token. If the bucket is empty, the request is rejected. Bursts are allowed up to the bucket capacity.

import time
import threading


class TokenBucket:
    """Thread-safe token bucket rate limiter."""

    def __init__(self, capacity: int, fill_rate: float):
        self.capacity = capacity
        self.tokens = capacity
        self.fill_rate = fill_rate
        self.last_refill = time.monotonic()
        self.lock = threading.Lock()

    def _refill(self):
        now = time.monotonic()
        elapsed = now - self.last_refill
        new_tokens = elapsed * self.fill_rate
        self.tokens = min(self.capacity, self.tokens + new_tokens)
        self.last_refill = now

    def allow(self) -> bool:
        with self.lock:
            self._refill()
            if self.tokens >= 1:
                self.tokens -= 1
                return True
            return False


limiter = TokenBucket(capacity=10, fill_rate=1.0)

for _ in range(12):
    if limiter.allow():
        print("request allowed")
    else:
        print("rate limited")

Leaky Bucket

The leaky bucket treats requests like water poured into a bucket with a hole at the bottom. The bucket processes requests at a fixed rate regardless of incoming burst size. Excess requests overflow and are dropped. This enforces a strict processing rate with no bursting.

import time


class LeakyBucket:
    """Leaky bucket that drains at a fixed rate and drops overflow."""

    def __init__(self, rate: float, capacity: int):
        self.rate = rate  # requests per second drained from the bucket
        self.capacity = capacity
        self.water = 0.0
        self.last_leak = time.monotonic()

    def allow(self) -> bool:
        # Drain water proportionally to elapsed time
        now = time.monotonic()
        self.water = max(0.0, self.water - (now - self.last_leak) * self.rate)
        self.last_leak = now

        if self.water < self.capacity:
            self.water += 1
            return True
        return False

Sliding Window Log

The sliding window algorithm maintains a log of timestamps for each request. On every request, it removes entries older than the window (e.g., 60 seconds) and counts the remaining entries. If the count exceeds the limit, the request is rejected. This is more accurate than fixed-window counters because it avoids boundary spikes.

from collections import defaultdict, deque
import time


class SlidingWindowLog:
    """Sliding-window-log rate limiter, keyed per client."""

    def __init__(self, window_seconds: int = 60, max_requests: int = 100):
        self.window = window_seconds
        self.max_req = max_requests
        self.clients: dict[str, deque[float]] = defaultdict(deque)

    def allow(self, client_id: str) -> bool:
        now = time.monotonic()
        cutoff = now - self.window
        timestamps = self.clients[client_id]
        # Prune entries that have fallen out of the window (O(1) per popleft)
        while timestamps and timestamps[0] < cutoff:
            timestamps.popleft()
        if len(timestamps) >= self.max_req:
            return False
        timestamps.append(now)
        return True
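The boundary spike that the sliding window avoids is easy to demonstrate: a fixed-window counter resets at hard boundaries, so a client can fire a full quota just before a reset and another full quota just after, briefly doubling the effective rate. A small simulation with injected timestamps (FixedWindowCounter is illustrative, written only to show the flaw):

```python
class FixedWindowCounter:
    """Fixed-window limiter: counts requests per aligned time window."""

    def __init__(self, window_seconds: int, max_requests: int):
        self.window = window_seconds
        self.max_req = max_requests
        self.current_window = None
        self.count = 0

    def allow(self, now: float) -> bool:
        window_id = int(now // self.window)
        if window_id != self.current_window:
            # Hard reset at the window boundary -- this is the weakness
            self.current_window = window_id
            self.count = 0
        if self.count >= self.max_req:
            return False
        self.count += 1
        return True


limiter = FixedWindowCounter(window_seconds=60, max_requests=100)

# 100 requests at t=59.9 and 100 more at t=60.1: all 200 pass,
# even though 200 requests landed within 0.2 seconds.
passed = sum(limiter.allow(59.9) for _ in range(100))
passed += sum(limiter.allow(60.1) for _ in range(100))
print(passed)  # 200
```

A sliding window log with the same limit would admit only 100 of those 200 requests, because both bursts fall inside one 60-second window ending at t=60.1.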

Traefik Rate Limit Middleware

# Traefik rate limiting with token bucket semantics
http:
  middlewares:
    api-rate-limit:
      rateLimit:
        average: 100
        burst: 50
        period: 1m
        sourceCriterion:
          ipStrategy:
            depth: 1

  routers:
    api:
      rule: "PathPrefix(`/api`)"
      middlewares:
        - api-rate-limit
      service: backend

Request and Response Transformation

Gateways can modify requests before forwarding them to backends and responses before returning them to clients. Common transformations include:

  • Header injection: Add X-Request-Id, X-User-ID, or correlation headers
  • Path rewriting: /api/v2/users → /users
  • Response aggregation: Combine multiple upstream responses into a single response
  • Protocol translation: Accept HTTP/1.1 and forward to gRPC backends
  • Response compression: Add gzip/brotli encoding
# Kong request transformer plugin
plugins:
  - name: request-transformer
    service: user-service
    config:
      add:
        headers:
          - "X-Gateway-Version: 1.0"
          - "X-Forwarded-Proto: https"
      rename:
        headers:
          - "Authorization: X-Upstream-Auth"
      append:
        querystring:
          - "source: gateway"

  - name: response-transformer
    service: user-service
    config:
      add:
        headers:
          - "X-Response-Time: ${latency}"
      remove:
        headers:
          - "X-Internal-Trace"
# Test header injection via curl
$ curl -s -D- http://localhost:8000/api/users/me \
  -H "Authorization: Bearer <token>" | head -n 20

# Expected response headers include:
# X-Gateway-Version: 1.0
# X-Request-Id: abc-123-def
# Content-Type: application/json
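Path rewriting and correlation-header injection from the list above are plain string operations at the gateway. A minimal sketch — the prefix /api/v2 and the X-Request-Id header name are illustrative defaults:

```python
import uuid


def rewrite_path(path: str, strip_prefix: str = "/api/v2") -> str:
    """Strip the public API prefix before forwarding upstream.

    /api/v2/users -> /users; paths without the prefix pass through unchanged.
    """
    if path == strip_prefix:
        return "/"
    if path.startswith(strip_prefix + "/"):
        return path[len(strip_prefix):]
    return path


def with_request_id(headers: dict[str, str]) -> dict[str, str]:
    """Add a correlation id only if the client didn't already supply one."""
    out = dict(headers)
    out.setdefault("X-Request-Id", str(uuid.uuid4()))
    return out
```

Requiring the "/" after the prefix matters: a naive startswith check would also rewrite /api/v2-legacy/users, which belongs to a different route.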
Gateway Comparison

| Feature       | Kong                          | Traefik                       | AWS API Gateway           | Envoy                        |
|---------------|-------------------------------|-------------------------------|---------------------------|------------------------------|
| Type          | Proxy (NGINX-based)           | Reverse proxy (Go)            | Managed (AWS)             | Sidecar/proxy (C++)          |
| Deployment    | Standalone / DB-less          | Binary / K8s Ingress          | Fully managed             | Sidecar / Gateway            |
| Routing       | Path, header, method, regex   | Path, host, header, query     | Path, method, stage       | Path, header, method, weight |
| Auth plugins  | JWT, OAuth2, OIDC, LDAP, HMAC | JWT, OIDC, Basic, ForwardAuth | Cognito, Lambda, IAM, JWT | JWT, OAuth2, ext-authz       |
| Rate limiting | Token bucket, sliding window  | Token bucket                  | Token bucket, burst       | Per-route, regional          |
| Health checks | Active + passive              | Active + passive              | Route53 + CloudWatch      | Active + passive + outlier   |
| Service mesh  | Via Kuma                      | Native mesh (K8s CRDs)        | N/A                       | Istio, Consul, AWS Mesh      |
| Performance   | ~5k req/s per core            | ~10k req/s per core           | Scales automatically      | ~15k req/s per core          |
| License       | Apache 2.0 (OSS)              | MIT                           | Proprietary               | Apache 2.0                   |
| Best for      | Enterprise API management     | Cloud-native / K8s            | Serverless / AWS stack    | High-performance mesh        |

Security Considerations

A gateway is a security boundary. Misconfiguring it exposes your entire backend. Follow these practices:

  • TLS termination: Always terminate TLS at the gateway. Never forward plain HTTP to backends unless they’re on the same private subnet.
  • Request validation: Validate Content-Type, Content-Length, and reject malformed payloads before they reach services.
func validationMiddleware(maxBodyBytes int64) func(http.Handler) http.Handler {
	return func(next http.Handler) http.Handler {
		return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
			if r.ContentLength > maxBodyBytes {
				http.Error(w, `{"error":"request too large"}`, http.StatusRequestEntityTooLarge)
				return
			}
			if r.Method == http.MethodPost || r.Method == http.MethodPut || r.Method == http.MethodPatch {
				// mime.ParseMediaType accepts parameters, so
				// "application/json; charset=utf-8" passes too
				ct, _, err := mime.ParseMediaType(r.Header.Get("Content-Type"))
				if err != nil || ct != "application/json" {
					http.Error(w, `{"error":"unsupported media type"}`, http.StatusUnsupportedMediaType)
					return
				}
			}
			next.ServeHTTP(w, r)
		})
	}
}
  • IP allow/deny lists: Restrict access to admin endpoints by source IP range.
  • CORS: Configure strict origin, method, and header allowlists. Do not use Access-Control-Allow-Origin: * in production.
  • Rate limit by endpoint: Apply different limits per route โ€” login endpoints get a stricter limit than read-only GET endpoints.
# Kong IP restriction plugin
$ curl -s -X POST http://localhost:8001/services/admin-api/plugins \
  --data name=ip-restriction \
  --data config.allow[]="10.0.0.0/8" \
  --data config.allow[]="172.16.0.0/12"

# Response
# => {"config":{"allow":["10.0.0.0/8","172.16.0.0/12"]},"enabled":true,...}
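The per-endpoint bullet above can be modeled as a map from route prefix to its own token bucket, with the longest matching prefix winning. A sketch in the spirit of the TokenBucket class earlier — the RouteLimiter name, routes, and rates are illustrative:

```python
import threading
import time


class RouteLimiter:
    """Apply a different token-bucket limit per route prefix."""

    def __init__(self, limits: dict[str, tuple[int, float]]):
        # prefix -> (bucket capacity, refill rate in tokens per second)
        self.buckets = {
            prefix: {"capacity": cap, "tokens": float(cap),
                     "rate": rate, "last": time.monotonic()}
            for prefix, (cap, rate) in limits.items()
        }
        self.lock = threading.Lock()

    def allow(self, path: str) -> bool:
        # Longest matching prefix wins; unmatched paths are not limited
        matches = [p for p in self.buckets if path.startswith(p)]
        if not matches:
            return True
        bucket = self.buckets[max(matches, key=len)]
        with self.lock:
            now = time.monotonic()
            bucket["tokens"] = min(
                bucket["capacity"],
                bucket["tokens"] + (now - bucket["last"]) * bucket["rate"],
            )
            bucket["last"] = now
            if bucket["tokens"] >= 1:
                bucket["tokens"] -= 1
                return True
            return False


limiter = RouteLimiter({
    "/api/login": (5, 0.1),  # strict: tiny burst, one token every 10s
    "/api": (100, 50.0),     # generous for general API traffic
})
```

Keying the login bucket additionally by client IP (one bucket per IP, as in the sliding-window example) prevents one abusive client from exhausting the shared login budget.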

Performance Optimization

The gateway is in the critical path of every request. Optimize it ruthlessly:

  • Connection pooling: Reuse upstream connections instead of opening new ones per request. Kong and Envoy pool connections by default.
  • TLS session resumption: Enable TLS session tickets and session IDs to reduce handshake overhead.
  • Caching: Cache idempotent responses (GET /api/products) at the gateway level with a short TTL.
  • Timeouts: Set connect, read, and write timeouts to prevent slow upstreams from consuming gateway resources.
# Kong proxy performance tuning
env:
  KONG_PROXY_LISTEN: "0.0.0.0:8000"
  KONG_UPSTREAM_KEEPALIVE_POOL_SIZE: "256"
  KONG_UPSTREAM_KEEPALIVE_MAX_REQUESTS: "1000"
  KONG_UPSTREAM_KEEPALIVE_IDLE_TIMEOUT: "60"
  KONG_NGINX_PROXY_CONNECT_TIMEOUT: "5s"
  KONG_NGINX_PROXY_READ_TIMEOUT: "30s"
  KONG_NGINX_PROXY_SEND_TIMEOUT: "30s"

Deployment Patterns

Per-Team Gateway

Each team deploys their own gateway instance for their microservices. Teams own their routing and auth. This scales well but duplicates infrastructure.

Shared Gateway (Centralized)

A single gateway instance routes to all backend services across the organization. The operations team owns the gateway. This centralizes control but creates a single point of failure and a deployment bottleneck.

Sidecar Gateway (Service Mesh)

Each service instance runs a gateway sidecar (Envoy) that handles inter-service communication. An external gateway (also Envoy) handles ingress. This is the Istio/Linkerd pattern โ€” full control with fine-grained per-service policies.

Kong Ingress Controller on Kubernetes

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: api-gateway
  annotations:
    kubernetes.io/ingress.class: kong
spec:
  rules:
  - host: api.example.com
    http:
      paths:
      - path: /api/users
        pathType: Prefix
        backend:
          service:
            name: user-service
            port:
              number: 8080
      - path: /api/orders
        pathType: Prefix
        backend:
          service:
            name: order-service
            port:
              number: 8080

Conclusion

API gateways centralize cross-cutting concerns (routing, authentication, rate limiting, and transformation), reducing duplication across microservices. Choose Kong for enterprise API management with a rich plugin ecosystem, Traefik for cloud-native Kubernetes environments, AWS API Gateway for serverless architectures, and Envoy for high-performance service mesh deployments. Regardless of which tool you select, implement rate limiting, authentication, request validation, and TLS termination at the gateway layer to keep backend services focused on business logic.
