Skip to main content
โšก Calmops

Saga Pattern: Managing Distributed Transactions

In microservices architectures, traditional ACID transactions don’t work across service boundaries. The Saga pattern provides a solution by managing distributed transactions through a sequence of local transactions with compensating actions.

In this guide, we’ll explore the Saga pattern, its two main approaches (choreography and orchestration), implementation strategies, and best practices.

The Distributed Transaction Problem

Why ACID Doesn’t Work

# Traditional ACID in microservices

problem:
  description: "Multiple services, each with their own database"
  
  example_order:
    - "Order Service: INSERT order (local transaction)"
    - "Payment Service: Process payment (different DB)"
    - "Inventory Service: Reserve items (different DB)"
    
  issue: "No single transaction can span all these services"

solutions:
  - "Two-Phase Commit (2PC)" โ†’ "Not recommended for microservices"
  - "Saga Pattern" โ†’ "Recommended approach"

Saga vs 2PC

โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚                    Transaction Approaches                    โ”‚
โ”‚                                                             โ”‚
โ”‚   Two-Phase Commit (2PC):                                   โ”‚
โ”‚   โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”  โ”‚
โ”‚   โ”‚                                                      โ”‚  โ”‚
โ”‚   โ”‚  Coordinator                                         โ”‚  โ”‚
โ”‚   โ”‚    โ”‚                                                 โ”‚  โ”‚
โ”‚   โ”‚    โ–ผ                                                 โ”‚  โ”‚
โ”‚   โ”‚  Prepare โ”€โ”€โ”€โ–บ All Services Lock Resources           โ”‚  โ”‚
โ”‚   โ”‚    โ”‚                                                 โ”‚  โ”‚
โ”‚   โ”‚    โ–ผ                                                 โ”‚  โ”‚
โ”‚   โ”‚  Commit โ”€โ”€โ”€โ–บ All Services Commit                     โ”‚  โ”‚
โ”‚   โ”‚    โ”‚                                                 โ”‚  โ”‚
โ”‚   โ”‚    โ–ผ                                                 โ”‚  โ”‚
โ”‚   โ”‚  (Or Rollback if any fails)                         โ”‚  โ”‚
โ”‚   โ”‚                                                      โ”‚  โ”‚
โ”‚   โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜  โ”‚
โ”‚   Problem: Blocks resources, single point of failure        โ”‚
โ”‚                                                             โ”‚
โ”‚   Saga Pattern:                                             โ”‚
โ”‚   โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”  โ”‚
โ”‚   โ”‚                                                      โ”‚  โ”‚
โ”‚   โ”‚  Service A โ”€โ”€โ–บ Service B โ”€โ”€โ–บ Service C              โ”‚  โ”‚
โ”‚   โ”‚     โ”‚            โ”‚            โ”‚                     โ”‚  โ”‚
โ”‚   โ”‚     โ–ผ            โ–ผ            โ–ผ                     โ”‚  โ”‚
โ”‚   โ”‚  Local TX     Local TX     Local TX                 โ”‚  โ”‚
โ”‚   โ”‚     โ”‚            โ”‚            โ”‚                     โ”‚  โ”‚
โ”‚   โ”‚     โ”‚            โ”‚            โ–ผ                     โ”‚  โ”‚
โ”‚   โ”‚     โ”‚            โ”‚       Compensate                 โ”‚  โ”‚
โ”‚   โ”‚     โ”‚            โ””โ”€โ”€โ”€โ”€โ–บ  if needed                   โ”‚  โ”‚
โ”‚   โ”‚     โ”‚                                               โ”‚  โ”‚
โ”‚   โ”‚     โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ–บ Compensate                      โ”‚  โ”‚
โ”‚   โ”‚                  if needed                            โ”‚  โ”‚
โ”‚   โ”‚                                                      โ”‚  โ”‚
โ”‚   โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜  โ”‚
โ”‚   Benefit: No blocking, each service owns its data          โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜

Understanding Sagas

What is a Saga?

# A saga is a sequence of local transactions
# Each transaction has a compensating transaction

saga_definition = """
Saga = Sequence of Local Transactions + Compensating Transactions

Example: Order Placement
1. Create Order (local TX) 
   โ†’ Compensate: Cancel Order
2. Reserve Inventory (local TX) 
   โ†’ Compensate: Release Inventory  
3. Process Payment (local TX) 
   โ†’ Compensate: Refund Payment
4. Ship Order (local TX) 
   โ†’ Compensate: Mark as Returned

If any step fails, execute compensating transactions in reverse.
"""

# Each step returns success or triggers compensation
class SagaStep:
    def execute(self, data):
        """Execute the step"""
        pass
    
    def compensate(self, data):
        """Roll back the step if needed"""
        pass

Saga Properties

# Saga characteristics

saga_properties = {
    "atomicity": {
        "description": "Saga either completes or compensates",
        "not_guaranteed": "Unlike ACID, saga doesn't guarantee isolation"
    },
    
    "consistency": {
        "description": "Data may be temporarily inconsistent",
        "mitigation": "Use idempotency and eventual consistency patterns"
    },
    
    "isolation": {
        "description": "Not guaranteed between sagas",
        "mitigation": "Use optimistic concurrency, semantic locks"
    },
    
    "durability": {
        "description": "Each local transaction is durable",
        "mitigation": "Saga state persisted to database"
    }
}

Choreography-Based Saga

Services communicate through events. Each service listens for events and publishes events when performing actions.

โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚              Choreography-Based Saga                         โ”‚
โ”‚                                                             โ”‚
โ”‚   Order Service     Payment Service    Inventory Service    โ”‚
โ”‚       โ”‚                  โ”‚                  โ”‚                โ”‚
โ”‚       โ”‚  OrderCreated    โ”‚                  โ”‚                โ”‚
โ”‚       โ”‚โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ–บ   โ”‚                  โ”‚                โ”‚
โ”‚       โ”‚                  โ”‚                  โ”‚                โ”‚
โ”‚       โ”‚            PaymentProcessed         โ”‚                โ”‚
โ”‚       โ”‚โ—„โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€  โ”‚                  โ”‚                โ”‚
โ”‚       โ”‚                  โ”‚                  โ”‚                โ”‚
โ”‚       โ”‚                  โ”‚   InventoryReserved              โ”‚
โ”‚       โ”‚                  โ”‚โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ–บโ”‚                โ”‚
โ”‚       โ”‚                  โ”‚                  โ”‚                โ”‚
โ”‚       โ”‚โ—„โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”‚โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”‚                โ”‚
โ”‚       โ”‚                  โ”‚                  โ”‚                โ”‚
โ”‚       โ”‚    OrderCompleted                   โ”‚                โ”‚
โ”‚       โ”‚                  โ”‚                  โ”‚                โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜

Implementation: Choreography

# Event classes
class Event:
    def __init__(self, type, data):
        self.type = type
        self.data = data


# Order Service
class OrderService:
    def __init__(self, event_bus):
        self.event_bus = event_bus
    
    def create_order(self, order_data):
        # 1. Create order in pending state
        order = Order(
            id=generate_uuid(),
            customer_id=order_data['customer_id'],
            items=order_data['items'],
            status='pending'
        )
        self.order_repo.save(order)
        
        # 2. Publish OrderCreated event
        event = Event('OrderCreated', {
            'order_id': order.id,
            'customer_id': order.customer_id,
            'items': order.items,
            'total': order.total
        })
        self.event_bus.publish(event)
        
        return order
    
    def handle_order_compensating(self, event_data):
        # Handle compensation from other service
        order = self.order_repo.find_by_id(event_data['order_id'])
        order.status = 'cancelled'
        order.cancellation_reason = 'payment_failed'
        self.order_repo.save(order)


# Payment Service
class PaymentService:
    def __init__(self, event_bus):
        self.event_bus = event_bus
    
    def handle_order_created(self, event_data):
        try:
            # Process payment
            payment = self.payment_gateway.charge(
                customer_id=event_data['customer_id'],
                amount=event_data['total']
            )
            
            # Publish success event
            event = Event('PaymentProcessed', {
                'order_id': event_data['order_id'],
                'payment_id': payment.id
            })
            self.event_bus.publish(event)
            
        except PaymentFailed as e:
            # Publish failure event for compensation
            event = Event('PaymentFailed', {
                'order_id': event_data['order_id'],
                'reason': str(e)
            })
            self.event_bus.publish(event)
    
    def handle_payment_compensating(self, event_data):
        # Refund payment
        payment = self.payment_gateway.refund(event_data['payment_id'])


# Inventory Service
class InventoryService:
    def __init__(self, event_bus):
        self.event_bus = event_bus
    
    def handle_payment_processed(self, event_data):
        try:
            # Reserve inventory
            for item in event_data['items']:
                self.inventory.reserve(
                    product_id=item['product_id'],
                    quantity=item['quantity']
                )
            
            event = Event('InventoryReserved', {
                'order_id': event_data['order_id']
            })
            self.event_bus.publish(event)
            
        except OutOfStock as e:
            event = Event('InventoryFailed', {
                'order_id': event_data['order_id'],
                'reason': str(e)
            })
            self.event_bus.publish(event)
    
    def compensate_inventory(self, order_id):
        # Release reserved inventory
        self.inventory.release(order_id)


# Event Bus (Simple implementation)
class EventBus:
    def __init__(self):
        self.subscribers = {}
    
    def subscribe(self, event_type, handler):
        if event_type not in self.subscribers:
            self.subscribers[event_type] = []
        self.subscribers[event_type].append(handler)
    
    def publish(self, event):
        handlers = self.subscribers.get(event.type, [])
        for handler in handlers:
            handler(event.data)

Orchestration-Based Saga

A central coordinator (orchestrator) manages the saga execution. The orchestrator tells participants what to do and handles failures.

โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚            Orchestration-Based Saga                         โ”‚
โ”‚                                                             โ”‚
โ”‚                    Orchestrator                              โ”‚
โ”‚                        โ”‚                                     โ”‚
โ”‚          โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”                     โ”‚
โ”‚          โ–ผ             โ–ผ             โ–ผ                      โ”‚
โ”‚     โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”   โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”   โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”                  โ”‚
โ”‚     โ”‚ Order  โ”‚   โ”‚Payment โ”‚   โ”‚Inventoryโ”‚                 โ”‚
โ”‚     โ”‚Service โ”‚   โ”‚Service โ”‚   โ”‚ Service โ”‚                 โ”‚
โ”‚     โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜   โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜   โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜                  โ”‚
โ”‚          โ”‚             โ”‚             โ”‚                      โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜ โ”‚
โ”‚          โ””                       โ”‚
โ”‚                        โ”‚                                     โ”‚
โ”‚                   Results                                    โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜

Implementation: Orchestration

# Saga Orchestrator
class OrderSagaOrchestrator:
    def __init__(self):
        self.steps = [
            SagaStep('create_order', self.create_order, self.compensate_order),
            SagaStep('process_payment', self.process_payment, self.refund_payment),
            SagaStep('reserve_inventory', self.reserve_inventory, self.release_inventory),
            SagaStep('ship_order', self.ship_order, self.cancel_shipment)
        ]
    
    def execute(self, order_data):
        completed_steps = []
        saga_data = {'order_data': order_data}
        
        for step in self.steps:
            try:
                result = step.execute(saga_data)
                saga_data[step.name] = result
                completed_steps.append(step)
                
            except Exception as e:
                # Compensate in reverse order
                self._compensate(completed_steps, saga_data, str(e))
                raise SagaFailed(str(e))
        
        return saga_data
    
    def _compensate(self, completed_steps, saga_data, reason):
        for step in reversed(completed_steps):
            try:
                step.compensate(saga_data)
            except Exception as e:
                # Log compensation failure
                log.error(f"Compensation failed for {step.name}: {e}")
                # Store for retry
                self.compensation_queue.add({
                    'step': step.name,
                    'data': saga_data,
                    'reason': reason
                })
    
    # Step implementations
    def create_order(self, data):
        order = order_service.create_order(data['order_data'])
        data['order_id'] = order.id
        return order
    
    def compensate_order(self, data):
        if 'order_id' in data:
            order_service.cancel_order(data['order_id'])
    
    def process_payment(self, data):
        payment = payment_service.charge(
            customer_id=data['order_data']['customer_id'],
            amount=data['order_data']['total']
        )
        data['payment_id'] = payment.id
        return payment
    
    def refund_payment(self, data):
        if 'payment_id' in data:
            payment_service.refund(data['payment_id'])
    
    def reserve_inventory(self, data):
        inventory_service.reserve_for_order(
            order_id=data['order_id'],
            items=data['order_data']['items']
        )
    
    def release_inventory(self, data):
        inventory_service.release_for_order(data['order_id'])
    
    def ship_order(self, data):
        shipment = shipping_service.create_shipment(
            order_id=data['order_id'],
            address=data['order_data']['shipping_address']
        )
        data['tracking_number'] = shipment.tracking_number
        return shipment
    
    def cancel_shipment(self, data):
        if 'tracking_number' in data:
            shipping_service.cancel(data['tracking_number'])

Saga State Management

# Persist saga state for recovery

class SagaState:
    PENDING = 'pending'
    IN_PROGRESS = 'in_progress'
    COMPLETED = 'completed'
    COMPENSATING = 'compensating'
    FAILED = 'failed'


class SagaRecord:
    """Database record for saga state"""
    
    def __init__(self, saga_id, saga_type, current_step, 
                 saga_data, status, compensation_data=None):
        self.id = saga_id
        self.type = saga_type
        self.current_step = current_step
        self.saga_data = saga_data  # JSON
        self.status = status
        self.compensation_data = compensation_data  # JSON
        self.created_at = datetime.utcnow()
        self.updated_at = datetime.utcnow()


class SagaStateStore:
    def __init__(self, db):
        self.db = db
    
    def create(self, saga_id, saga_type, initial_data):
        record = SagaRecord(
            saga_id=saga_id,
            saga_type=saga_type,
            current_step=0,
            saga_data=json.dumps(initial_data),
            status=SagaState.PENDING
        )
        self._save(record)
        return record
    
    def update_step(self, saga_id, step, data):
        record = self._find(saga_id)
        record.current_step = step
        record.saga_data = json.dumps(data)
        record.updated_at = datetime.utcnow()
        self._save(record)
    
    def mark_compensating(self, saga_id, compensation_data):
        record = self._find(saga_id)
        record.status = SagaState.COMPENSATING
        record.compensation_data = json.dumps(compensation_data)
        self._save(record)
    
    def mark_failed(self, saga_id, error):
        record = self._find(saga_id)
        record.status = SagaState.FAILED
        record.error = error
        self._save(record)
    
    def mark_completed(self, saga_id):
        record = self._find(saga_id)
        record.status = SagaState.COMPENSATING
        self._save(record)

Handling Failures

Compensating Transactions

# Compensation strategies

compensation_strategies = {
    "retry": {
        "description": "Retry failed compensation",
        "use_when": "Temporary failures (network timeout)"
    },
    
    "manual": {
        "description": "Alert operators for manual intervention",
        "use_when": "Complex failures requiring human judgment"
    },
    
    "saga_orchestrator": {
        "description": "Orchestrator handles compensation",
        "use_when": "Orchestration-based saga"
    }
}

# Example compensation handler
class CompensationHandler:
    def __init__(self, saga_store, notification_service):
        self.saga_store = saga_store
        self.notification = notification_service
    
    def handle_failure(self, saga_id):
        saga = self.saga_store.find(saga_id)
        
        if saga.status == SagaState.COMPENSATING:
            # Retry compensation
            self._retry_compensation(saga)
        
        elif saga.compensation_attempts > 3:
            # Escalate to manual intervention
            self.notification.alert_ops_team(saga)
            self.saga_store.mark_need_manual(saga_id)
    
    def _retry_compensation(self, saga):
        # Re-execute compensation logic
        for step in reversed(saga.completed_steps):
            try:
                step.compensate(saga.data)
            except:
                saga.compensation_attempts += 1
                raise

Idempotency

# Make operations idempotent to handle retries

class IdempotentPaymentService:
    def __init__(self, payment_gateway, idempotency_store):
        self.gateway = payment_gateway
        self.store = idempotency_store
    
    def charge(self, customer_id, amount, idempotency_key):
        # Check if already processed
        existing = self.store.get(idempotency_key)
        if existing:
            return existing
        
        # Process payment
        result = self.gateway.charge(customer_id, amount)
        
        # Store result
        self.store.set(idempotency_key, result)
        
        return result


# Idempotency key can be derived from business data
def get_idempotency_key(action, *args):
    # payment_customer123_amount100
    return f"{action}_{'_'.join(str(a) for a in args)}"

Choreography vs Orchestration

Comparison

# Choreography vs Orchestration

choreography:
  pros:
    - Loose coupling between services
    - No central point of failure
    - Each service owns its logic
  
  cons:
    - Hard to track overall progress
    - Complex to debug
    - Implicit flow, harder to understand
  
  best_for:
    - Simple workflows (2-3 services)
    - Teams that prefer event-driven
    - When services are independent

orchestration:
  pros:
    - Centralized flow control
    - Easy to track progress
    - Clear error handling
  
  cons:
    - Central orchestrator becomes bottleneck
    - More coupling to orchestrator
    - Single point of failure (mitigate with redundancy)
  
  best_for:
    - Complex workflows (many services)
    - When business logic is complex
    - Need for compensation logic

Decision Matrix

def choose_saga_approach(service_count, team_size, complexity):
    if service_count <= 3 and complexity == 'low':
        return "choreography"
    elif complexity == 'high':
        return "orchestration"
    elif team_size > 10:
        return "orchestration"  # Easier to understand
    else:
        return "choreography"

Implementation Example: Order Service

# Complete example with both approaches

# ========== CHOREOGRAPHY VERSION ==========

class OrderServiceChoreography:
    """Event-driven saga using choreography"""
    
    def create_order(self, order_data):
        # Create order
        order = Order(
            id=generate_uuid(),
            status='pending',
            **order_data
        )
        self.repo.save(order)
        
        # Emit event - Saga starts
        self.event_bus.publish('OrderCreated', {
            'order_id': order.id,
            **order_data
        })
        
        return order


# ========== ORCHESTRATION VERSION ==========

class OrderSagaOrchestrator:
    """Central orchestrator for order saga"""
    
    def create_order(self, order_data):
        # Create saga record
        saga_id = generate_uuid()
        saga_store.create(saga_id, 'order', order_data)
        
        try:
            # Execute saga steps
            result = self._execute_order_saga(saga_id, order_data)
            saga_store.mark_completed(saga_id)
            return result
            
        except SagaFailure as e:
            saga_store.mark_failed(saga_id, str(e))
            raise
    
    def _execute_order_saga(self, saga_id, order_data):
        # Step 1: Create order
        order = self.order_service.create(order_data)
        saga_store.update_step(saga_id, 'order_created', 
                              {'order_id': order.id})
        
        # Step 2: Process payment
        try:
            payment = self.payment_service.charge(
                customer_id=order.customer_id,
                amount=order.total
            )
            saga_store.update_step(saga_id, 'payment_processed',
                                  {'payment_id': payment.id})
        except PaymentFailed:
            # Compensate order creation
            self.order_service.cancel(order.id)
            raise
        
        # Step 3: Reserve inventory
        try:
            self.inventory_service.reserve(order.items)
            saga_store.update_step(saga_id, 'inventory_reserved', {})
        except InventoryError:
            # Compensate payment
            self.payment_service.refund(payment.id)
            # Compensate order
            self.order_service.cancel(order.id)
            raise
        
        # Step 4: Ship order
        shipment = self.shipping_service.create(
            order_id=order.id,
            address=order.shipping_address
        )
        
        # Update order status
        order.status = 'completed'
        self.order_service.save(order)
        
        return order

Best Practices

# Saga Best Practices

design:
  - Keep sagas short (ideally < 10 steps)
  - Make compensation idempotent
  - Use idempotency keys for all operations
  - Log saga state for debugging

timeout:
  - Set appropriate timeouts for each step
  - Use circuit breakers to prevent cascade failures
  - Have clear escalation paths

monitoring:
  - Track saga execution time
  - Alert on stuck sagas
  - Monitor compensation frequency

data:
  - Never hold locks across saga steps
  - Design for eventual consistency
  - Use optimistic concurrency where possible

Conclusion

The Saga pattern is essential for managing distributed transactions in microservices:

  • Choreography: Event-driven, loose coupling, harder to track
  • Orchestration: Centralized, easier to understand, potential bottleneck
  • Compensation: Each step must have a compensating action
  • Idempotency: Essential for handling retries safely
  • State Management: Persist saga state for recovery

Choose choreography for simple workflows and orchestration for complex business logic.


Comments