Circuit Breaker Pattern: Building Resilient Microservices

Created: May 14, 2026 · Larry Qu · 10 min read

Introduction

In distributed systems, a single failing service can trigger a cascade of failures across your entire infrastructure. When Service A calls Service B, and Service B is slow or unresponsive, Service A’s threads get blocked waiting for responses. As requests pile up, Service A exhausts its thread pool and becomes unresponsive itself. Now Service C, which depends on Service A, starts failing too. Within minutes, your entire system is down.

The circuit breaker pattern prevents this cascade by monitoring calls to external services and “opening the circuit” when failures exceed a threshold. Once open, the circuit breaker immediately rejects requests without attempting the call, giving the failing service time to recover while keeping your system responsive.

This pattern is named after electrical circuit breakers in your home — when too much current flows through, the breaker trips and stops the flow to prevent damage.

How Circuit Breakers Work

A circuit breaker wraps calls to external services and tracks their success and failure rates. It operates in three states:

stateDiagram-v2
    [*] --> Closed
    Closed --> Open: Failure threshold exceeded
    Open --> HalfOpen: Timeout expires
    HalfOpen --> Closed: Test calls succeed
    HalfOpen --> Open: Test calls fail
    Closed --> Closed: Calls succeed

State Transitions

Closed State (Normal Operation)

  • All requests pass through to the downstream service
  • Circuit breaker tracks success and failure rates
  • If failure rate exceeds threshold, transition to Open

Open State (Failing Fast)

  • All requests fail immediately without calling the service
  • Returns a predefined error or fallback response
  • After a timeout period, transition to Half-Open

Half-Open State (Testing Recovery)

  • Allow a limited number of test requests through
  • If test requests succeed, transition back to Closed
  • If test requests fail, return to Open state

Core Concepts

Failure Threshold

The circuit breaker monitors a sliding window of recent requests and calculates the failure rate. When failures exceed the configured threshold, the circuit opens.

// Resilience4j configuration
CircuitBreakerConfig config = CircuitBreakerConfig.custom()
    .failureRateThreshold(50)                    // Open at 50% failure rate
    .slidingWindowSize(10)                       // Track last 10 calls
    .minimumNumberOfCalls(5)                     // Need 5 calls before calculating rate
    .waitDurationInOpenState(Duration.ofSeconds(30))  // Stay open for 30s
    .permittedNumberOfCallsInHalfOpenState(3)    // Allow 3 test calls
    .build();

Sliding Window

Circuit breakers use either count-based or time-based sliding windows to aggregate call outcomes:

Count-Based Window: Tracks the last N calls

  • Simple and predictable
  • Good for high-traffic services
  • Example: Last 100 requests

Time-Based Window: Tracks calls within a time period

  • Better for variable traffic patterns
  • Adapts to traffic spikes
  • Example: Last 60 seconds
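To make the count-based variant concrete, here is a minimal sketch (the type and method names are illustrative, not from any particular library) of a fixed-size ring buffer that records the last N outcomes and reports the failure rate a breaker would compare against its threshold:

```go
package main

import "fmt"

// slidingWindow records the last `size` call outcomes in a ring buffer.
type slidingWindow struct {
	outcomes []bool // true = failure
	size     int
	next     int // next slot to overwrite
	count    int // outcomes recorded so far (<= size)
}

func newSlidingWindow(size int) *slidingWindow {
	return &slidingWindow{outcomes: make([]bool, size), size: size}
}

// record stores one outcome, evicting the oldest once the window is full.
func (w *slidingWindow) record(failed bool) {
	w.outcomes[w.next] = failed
	w.next = (w.next + 1) % w.size
	if w.count < w.size {
		w.count++
	}
}

// failureRate returns the failure percentage over the recorded outcomes.
func (w *slidingWindow) failureRate() float64 {
	if w.count == 0 {
		return 0
	}
	failures := 0
	for i := 0; i < w.count; i++ {
		if w.outcomes[i] {
			failures++
		}
	}
	return float64(failures) / float64(w.count) * 100
}

func main() {
	w := newSlidingWindow(10)
	for i := 0; i < 6; i++ {
		w.record(false) // 6 successes
	}
	for i := 0; i < 4; i++ {
		w.record(true) // 4 failures
	}
	fmt.Printf("failure rate: %.0f%%\n", w.failureRate()) // 4 of 10 = 40%
}
```

A time-based window would instead store timestamped outcomes and evict entries older than the window duration on each read, which is why it adapts better to bursty traffic.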

Timeout and Recovery

After opening, the circuit breaker waits for a configured duration before transitioning to Half-Open. This gives the failing service time to recover without being overwhelmed by requests.

// Polly (.NET) configuration
var circuitBreakerPolicy = Policy
    .Handle<HttpRequestException>()
    .CircuitBreakerAsync(
        exceptionsAllowedBeforeBreaking: 5,
        durationOfBreak: TimeSpan.FromSeconds(30),
        onBreak: (exception, duration) => {
            Console.WriteLine($"Circuit opened for {duration.TotalSeconds}s");
        },
        onReset: () => {
            Console.WriteLine("Circuit closed, service recovered");
        }
    );

Implementation Examples

Java with Resilience4j

Resilience4j is the modern replacement for Netflix Hystrix (now in maintenance mode). It’s lightweight, functional, and designed for Java 8+.

import io.github.resilience4j.circuitbreaker.CallNotPermittedException;
import io.github.resilience4j.circuitbreaker.CircuitBreaker;
import io.github.resilience4j.circuitbreaker.CircuitBreakerConfig;
import io.github.resilience4j.circuitbreaker.CircuitBreakerRegistry;

import java.io.IOException;
import java.time.Duration;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

public class PaymentService {
    private final CircuitBreaker circuitBreaker;
    private final ExternalPaymentAPI paymentAPI;

    public PaymentService(ExternalPaymentAPI paymentAPI) {
        this.paymentAPI = paymentAPI;
        
        CircuitBreakerConfig config = CircuitBreakerConfig.custom()
            .failureRateThreshold(50)
            .slowCallRateThreshold(50)
            .slowCallDurationThreshold(Duration.ofSeconds(2))
            .slidingWindowType(CircuitBreakerConfig.SlidingWindowType.COUNT_BASED)
            .slidingWindowSize(10)
            .minimumNumberOfCalls(5)
            .waitDurationInOpenState(Duration.ofSeconds(30))
            .permittedNumberOfCallsInHalfOpenState(3)
            .automaticTransitionFromOpenToHalfOpenEnabled(true)
            .recordExceptions(IOException.class, TimeoutException.class)
            .ignoreExceptions(IllegalArgumentException.class)
            .build();

        CircuitBreakerRegistry registry = CircuitBreakerRegistry.of(config);
        this.circuitBreaker = registry.circuitBreaker("paymentService");
        
        // Register event listeners
        circuitBreaker.getEventPublisher()
            .onStateTransition(event -> 
                System.out.println("Circuit breaker state: " + event.getStateTransition())
            )
            .onError(event -> 
                System.out.println("Call failed: " + event.getThrowable().getMessage())
            );
    }

    public PaymentResult processPayment(PaymentRequest request) {
        Supplier<PaymentResult> decoratedSupplier = CircuitBreaker
            .decorateSupplier(circuitBreaker, () -> paymentAPI.charge(request));
        
        try {
            return decoratedSupplier.get();
        } catch (CallNotPermittedException e) {
            // Circuit is open, return fallback
            return PaymentResult.unavailable("Payment service temporarily unavailable");
        } catch (Exception e) {
            // Other errors
            return PaymentResult.error("Payment failed: " + e.getMessage());
        }
    }
}

.NET with Polly

Polly is the standard resilience library for .NET applications.

using Polly;
using Polly.CircuitBreaker;
using System;
using System.Net.Http;
using System.Threading.Tasks;

public class OrderService
{
    private readonly HttpClient _httpClient;
    private readonly AsyncCircuitBreakerPolicy<HttpResponseMessage> _circuitBreakerPolicy;

    public OrderService(HttpClient httpClient)
    {
        _httpClient = httpClient;
        
        _circuitBreakerPolicy = Policy
            .HandleResult<HttpResponseMessage>(r => !r.IsSuccessStatusCode)
            .Or<HttpRequestException>()
            .Or<TaskCanceledException>()
            .AdvancedCircuitBreakerAsync(
                failureThreshold: 0.5,           // Open at 50% failure rate
                samplingDuration: TimeSpan.FromSeconds(10),
                minimumThroughput: 5,            // Need 5 calls in window
                durationOfBreak: TimeSpan.FromSeconds(30),
                onBreak: (result, duration) =>
                {
                    Console.WriteLine($"Circuit opened for {duration.TotalSeconds}s");
                },
                onReset: () =>
                {
                    Console.WriteLine("Circuit closed");
                },
                onHalfOpen: () =>
                {
                    Console.WriteLine("Circuit half-open, testing...");
                }
            );
    }

    public async Task<Order> GetOrderAsync(string orderId)
    {
        try
        {
            var response = await _circuitBreakerPolicy.ExecuteAsync(async () =>
                await _httpClient.GetAsync($"/api/orders/{orderId}")
            );

            if (response.IsSuccessStatusCode)
            {
                return await response.Content.ReadAsAsync<Order>();
            }

            return Order.NotFound(orderId);
        }
        catch (BrokenCircuitException)
        {
            // Circuit is open, return cached data or default
            return await GetCachedOrderAsync(orderId) 
                ?? Order.Unavailable(orderId);
        }
    }

    private async Task<Order> GetCachedOrderAsync(string orderId)
    {
        // Return cached data if available
        return null;
    }
}

Go Implementation

Go’s standard library doesn’t include a circuit breaker (third-party packages such as sony/gobreaker are widely used), and implementing a basic one is straightforward.

package circuitbreaker

import (
    "errors"
    "fmt"
    "net/http"
    "sync"
    "time"
)

type State int

const (
    StateClosed State = iota
    StateOpen
    StateHalfOpen
)

var (
    ErrCircuitOpen = errors.New("circuit breaker is open")
)

type CircuitBreaker struct {
    maxFailures  int
    timeout      time.Duration
    halfOpenMax  int
    
    mu           sync.RWMutex
    state        State
    failures     int
    lastFailTime time.Time
    halfOpenCalls int
}

func New(maxFailures int, timeout time.Duration, halfOpenMax int) *CircuitBreaker {
    return &CircuitBreaker{
        maxFailures: maxFailures,
        timeout:     timeout,
        halfOpenMax: halfOpenMax,
        state:       StateClosed,
    }
}

func (cb *CircuitBreaker) Call(fn func() error) error {
    cb.mu.Lock()
    
    // Check if we should transition from Open to Half-Open
    if cb.state == StateOpen {
        if time.Since(cb.lastFailTime) > cb.timeout {
            cb.state = StateHalfOpen
            cb.halfOpenCalls = 0
        } else {
            cb.mu.Unlock()
            return ErrCircuitOpen
        }
    }
    
    // Reject if Half-Open and already testing
    if cb.state == StateHalfOpen && cb.halfOpenCalls >= cb.halfOpenMax {
        cb.mu.Unlock()
        return ErrCircuitOpen
    }
    
    if cb.state == StateHalfOpen {
        cb.halfOpenCalls++
    }
    
    cb.mu.Unlock()
    
    // Execute the function
    err := fn()
    
    cb.mu.Lock()
    defer cb.mu.Unlock()
    
    if err != nil {
        cb.onFailure()
        return err
    }
    
    cb.onSuccess()
    return nil
}

func (cb *CircuitBreaker) onSuccess() {
    if cb.state == StateHalfOpen {
        // Successful test call, close the circuit
        cb.state = StateClosed
        cb.failures = 0
        cb.halfOpenCalls = 0
    } else if cb.state == StateClosed {
        // Reset failure count on success
        cb.failures = 0
    }
}

func (cb *CircuitBreaker) onFailure() {
    cb.failures++
    cb.lastFailTime = time.Now()
    
    if cb.state == StateHalfOpen {
        // Test failed, reopen circuit
        cb.state = StateOpen
        cb.halfOpenCalls = 0
    } else if cb.failures >= cb.maxFailures {
        // Too many failures, open circuit
        cb.state = StateOpen
    }
}

func (cb *CircuitBreaker) State() State {
    cb.mu.RLock()
    defer cb.mu.RUnlock()
    return cb.state
}

// Usage example (shown in the same file for brevity)
func main() {
    cb := New(5, 30*time.Second, 3)
    
    err := cb.Call(func() error {
        // Call external service
        resp, err := http.Get("https://api.example.com/data")
        if err != nil {
            return err
        }
        defer resp.Body.Close()
        
        if resp.StatusCode != 200 {
            return errors.New("service returned error")
        }
        
        return nil
    })
    
    if err == ErrCircuitOpen {
        // Circuit is open, use fallback
        fmt.Println("Service unavailable, using cached data")
    } else if err != nil {
        // Other error
        fmt.Printf("Request failed: %v\n", err)
    }
}

Best Practices

1. Configure Thresholds Based on Traffic Patterns

Don’t use the same configuration for all services. High-traffic services need larger sliding windows, while low-traffic services need smaller ones.

// High-traffic service (1000+ req/min)
CircuitBreakerConfig highTraffic = CircuitBreakerConfig.custom()
    .slidingWindowSize(100)
    .minimumNumberOfCalls(20)
    .failureRateThreshold(50)
    .build();

// Low-traffic service (10-50 req/min)
CircuitBreakerConfig lowTraffic = CircuitBreakerConfig.custom()
    .slidingWindowSize(10)
    .minimumNumberOfCalls(5)
    .failureRateThreshold(60)
    .build();

2. Implement Fallback Strategies

When the circuit is open, don’t just return errors. Provide fallback responses:

  • Return cached data
  • Return default values
  • Degrade functionality gracefully
  • Queue requests for later processing

public UserProfile getUserProfile(String userId) {
    try {
        return circuitBreaker.executeSupplier(() -> 
            userService.fetchProfile(userId)
        );
    } catch (CallNotPermittedException e) {
        // Circuit open, try cache
        UserProfile cached = cache.get(userId);
        if (cached != null) {
            return cached.withStaleWarning();
        }
        // Return minimal profile
        return UserProfile.minimal(userId);
    }
}

3. Monitor Circuit Breaker State

Export circuit breaker metrics to your monitoring system. Track:

  • State transitions (Closed → Open → Half-Open)
  • Failure rates
  • Call duration
  • Number of rejected calls

// Resilience4j with Micrometer
CircuitBreaker circuitBreaker = registry.circuitBreaker("paymentService");
MeterRegistry meterRegistry = new SimpleMeterRegistry();

TaggedCircuitBreakerMetrics.ofCircuitBreakerRegistry(registry)
    .bindTo(meterRegistry);

// Metrics published include:
// - resilience4j.circuitbreaker.state (one gauge per state; 1 = current state)
// - resilience4j.circuitbreaker.calls (tagged successful, failed, not_permitted)
// - resilience4j.circuitbreaker.failure.rate

4. Use Different Timeouts for Different Failure Types

Not all failures are equal. Network timeouts might need longer recovery periods than HTTP 500 errors.

var policy = Policy
    .Handle<TimeoutException>()
    .CircuitBreakerAsync(3, TimeSpan.FromMinutes(2))  // Long timeout for network issues
    .WrapAsync(
        Policy
            .Handle<HttpRequestException>()
            .CircuitBreakerAsync(5, TimeSpan.FromSeconds(30))  // Short timeout for HTTP errors
    );

5. Combine with Retry and Timeout Policies

Circuit breakers work best when combined with other resilience patterns:

// Resilience4j combining multiple patterns
Retry retry = Retry.of("paymentService", RetryConfig.custom()
    .maxAttempts(3)
    .waitDuration(Duration.ofMillis(500))
    .build());

TimeLimiter timeLimiter = TimeLimiter.of(TimeLimiterConfig.custom()
    .timeoutDuration(Duration.ofSeconds(2))
    .build());

// TimeLimiter applies to async results, so decorate a CompletionStage
ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(1);

CompletionStage<PaymentResult> decorated = Decorators
    .ofCompletionStage(() -> CompletableFuture.supplyAsync(() -> paymentAPI.charge(request)))
    .withTimeLimiter(timeLimiter, scheduler)
    .withCircuitBreaker(circuitBreaker)
    .withRetry(retry, scheduler)
    .decorate()
    .get();

Common Pitfalls

1. Setting Thresholds Too Low

Opening the circuit too aggressively can cause unnecessary service degradation. A few failures in a high-traffic system are normal.

Bad: Open after 2 failures

.failureRateThreshold(20)  // Opens at 20% failure rate
.minimumNumberOfCalls(2)   // With only 2 calls

Good: Require statistical significance

.failureRateThreshold(50)  // Opens at 50% failure rate
.minimumNumberOfCalls(10)  // Need at least 10 calls

2. Not Handling Circuit Open State

Failing to provide fallbacks when the circuit is open defeats the purpose of the pattern.

Bad: Propagate the error

public Data getData() {
    return circuitBreaker.executeSupplier(() -> api.fetch());
    // Throws CallNotPermittedException when open
}

Good: Provide fallback

public Data getData() {
    try {
        return circuitBreaker.executeSupplier(() -> api.fetch());
    } catch (CallNotPermittedException e) {
        return cache.getOrDefault(Data.empty());
    }
}

3. Sharing Circuit Breakers Across Different Endpoints

Each external dependency should have its own circuit breaker. Sharing one breaker across multiple endpoints means one failing endpoint can block all others.

Bad: One breaker for entire service

CircuitBreaker breaker = registry.circuitBreaker("externalService");
breaker.executeSupplier(() -> api.getUsers());
breaker.executeSupplier(() -> api.getOrders());  // Blocked if getUsers fails

Good: Separate breakers per endpoint

CircuitBreaker userBreaker = registry.circuitBreaker("externalService-users");
CircuitBreaker orderBreaker = registry.circuitBreaker("externalService-orders");

userBreaker.executeSupplier(() -> api.getUsers());
orderBreaker.executeSupplier(() -> api.getOrders());  // Independent

4. Ignoring Slow Calls

Circuit breakers should track not just failures, but also slow calls that tie up resources.

CircuitBreakerConfig config = CircuitBreakerConfig.custom()
    .failureRateThreshold(50)
    .slowCallRateThreshold(50)              // Also track slow calls
    .slowCallDurationThreshold(Duration.ofSeconds(3))  // >3s is slow
    .build();

Service Mesh Integration

Modern service meshes like Istio and Linkerd implement circuit breakers at the infrastructure level, removing the need for application-level libraries.

Istio Circuit Breaker

apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: payment-service
spec:
  host: payment-service
  trafficPolicy:
    connectionPool:
      tcp:
        maxConnections: 100
      http:
        http1MaxPendingRequests: 50
        http2MaxRequests: 100
        maxRequestsPerConnection: 2
    outlierDetection:
      consecutive5xxErrors: 5
      interval: 30s
      baseEjectionTime: 30s
      maxEjectionPercent: 50
      minHealthPercent: 50

Advantages:

  • No code changes required
  • Consistent behavior across all services
  • Centralized configuration
  • Language-agnostic

Disadvantages:

  • Less fine-grained control
  • Harder to test locally
  • Requires service mesh infrastructure

Testing Circuit Breakers

Unit Testing

@Test
public void testCircuitOpensAfterFailures() {
    CircuitBreakerConfig config = CircuitBreakerConfig.custom()
        .failureRateThreshold(50)
        .slidingWindowSize(4)
        .minimumNumberOfCalls(4)
        .build();
    
    CircuitBreaker breaker = CircuitBreaker.of("test", config);
    
    // Simulate 3 failures
    for (int i = 0; i < 3; i++) {
        try {
            breaker.executeSupplier(() -> {
                throw new RuntimeException("Service down");
            });
        } catch (Exception e) {
            // Expected
        }
    }
    
    // Circuit should still be closed (need 4 calls minimum)
    assertEquals(CircuitBreaker.State.CLOSED, breaker.getState());
    
    // One more failure should open it
    try {
        breaker.executeSupplier(() -> {
            throw new RuntimeException("Service down");
        });
    } catch (Exception e) {
        // Expected
    }
    
    // Now circuit should be open
    assertEquals(CircuitBreaker.State.OPEN, breaker.getState());
    
    // Next call should fail fast
    assertThrows(CallNotPermittedException.class, () -> {
        breaker.executeSupplier(() -> "success");
    });
}

Integration Testing with Chaos Engineering

Use tools like Chaos Monkey or Toxiproxy to simulate service failures and verify circuit breaker behavior.

# Toxiproxy: Add latency to payment service
toxiproxy-cli toxic add payment-service -t latency -a latency=5000

# Verify circuit breaker opens after threshold
curl http://localhost:8080/api/orders/123
# Should return fallback response after circuit opens
