Introduction
Traditional ACID transactions don’t span microservice boundaries. When a single business operation (placing an order, booking a trip, processing a payment) updates data in multiple services, there is no database-level rollback across services. The saga pattern addresses this by decomposing the distributed transaction into a sequence of local transactions, each paired with a compensating action that semantically undoes it if a later step fails.
This guide provides Python implementations of both saga coordination approaches using Temporal (orchestration) and Kafka (choreography), with Mermaid sequence diagrams, compensation handler patterns, idempotency key design, and a complete order management example.
Saga Fundamentals
sequenceDiagram
participant Order as Order Service
participant Payment as Payment Service
participant Inventory as Inventory Service
participant Shipping as Shipping Service
Note over Order,Shipping: Successful Order Flow
Order->>Payment: 1. Reserve payment
Payment-->>Order: Payment reserved
Order->>Inventory: 2. Reserve inventory
Inventory-->>Order: Inventory reserved
Order->>Shipping: 3. Schedule shipment
Shipping-->>Order: Shipment scheduled
Note over Order,Shipping: Order confirmed ✓
Note over Order,Shipping: Failed Flow with Compensation
Order->>Payment: 1. Reserve payment
Payment-->>Order: Payment reserved
Order->>Inventory: 2. Reserve inventory
Inventory-->>Order: OUT OF STOCK ✗
Order->>Payment: 3. COMPENSATE: Release payment
Payment-->>Order: Payment released
Note over Order,Shipping: Order cancelled, consistent state ✓
Two coordination approaches exist:
Choreography: Each service publishes events after its local transaction. Other services subscribe and react. No central coordinator. Best for simple, linear workflows with few participants.
Orchestration: A central coordinator (the saga orchestrator) tells each service what to do and handles failure. Best for complex workflows with branching, conditional logic, and many participants.
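Both approaches share the same core rule: when a step fails, the steps that already completed are compensated in reverse order. A minimal, framework-free sketch of that rule follows. The names `SagaStep` and `run_saga` are illustrative and do not come from Temporal or Kafka, and the step bodies only append to a list in place of real service calls.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class SagaStep:
    name: str
    action: Callable[[], None]        # the local transaction
    compensation: Callable[[], None]  # undoes the action if a later step fails


def run_saga(steps: List[SagaStep]) -> bool:
    """Run steps in order; on failure, compensate completed steps in reverse."""
    completed: List[SagaStep] = []
    for step in steps:
        try:
            step.action()
        except Exception:
            # The failed step made no change, so only earlier steps are undone.
            for done in reversed(completed):
                done.compensation()
            return False
        completed.append(step)
    return True


log: List[str] = []


def out_of_stock() -> None:
    raise RuntimeError("out of stock")


steps = [
    SagaStep("payment",
             lambda: log.append("reserve payment"),
             lambda: log.append("release payment")),
    SagaStep("inventory",
             out_of_stock,
             lambda: log.append("unreserve inventory")),
    SagaStep("shipping",
             lambda: log.append("schedule shipment"),
             lambda: log.append("cancel shipment")),
]

ok = run_saga(steps)
print(ok, log)
```

This mirrors the failed flow in the diagram: payment is reserved, inventory fails, and only the payment reservation is released. A production saga would also need to persist progress so that a crash between steps does not lose track of what must be compensated, which is the gap the orchestration and choreography tooling below is meant to fill.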
Orchestration with Temporal
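Temporal provides durable execution: the orchestrator workflow’s progress is persisted as an event history, so it can resume after worker crashes, pod restarts, and network failures. That makes it a strong fit for orchestrated sagas, where losing track of a half-finished transaction is the main risk.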
Temporal Saga Workflow
from datetime import timedelta

from temporalio import workflow
from temporalio.common import RetryPolicy

# Assumption: the activity functions live in a local module named
# activities.py. The import is passed through Temporal's workflow sandbox.
with workflow.unsafe.imports_passed_through():
    from activities import (
        release_payment,
        reserve_inventory,
        reserve_payment,
        schedule_shipment,
        unreserve_inventory,
    )

# Activities retry indefinitely by default, so cap attempts on the forward
# steps; otherwise a failure such as "out of stock" never reaches the except
# blocks below. Compensations keep the default policy and retry until done.
FORWARD_RETRY = RetryPolicy(maximum_attempts=3)


@workflow.defn
class OrderSaga:
    @workflow.run
    async def run(self, order_id: str) -> bool:
        """Orchestrated saga for order processing with full compensation."""
        # Step 1: Reserve payment (nothing to compensate if this fails)
        try:
            payment_id = await workflow.execute_activity(
                reserve_payment, order_id,
                start_to_close_timeout=timedelta(seconds=10),
                retry_policy=FORWARD_RETRY,
            )
        except Exception as e:
            await self._handle_failure(order_id, "reserve_payment", e)
            return False

        # Step 2: Reserve inventory
        try:
            await workflow.execute_activity(
                reserve_inventory, order_id,
                start_to_close_timeout=timedelta(seconds=10),
                retry_policy=FORWARD_RETRY,
            )
        except Exception as e:
            # Compensate: release the payment
            await workflow.execute_activity(
                release_payment, payment_id,
                start_to_close_timeout=timedelta(seconds=5),
            )
            await self._handle_failure(order_id, "reserve_inventory", e)
            return False

        # Step 3: Schedule shipment
        try:
            shipment_id = await workflow.execute_activity(
                schedule_shipment, order_id,
                start_to_close_timeout=timedelta(seconds=10),
                retry_policy=FORWARD_RETRY,
            )
        except Exception as e:
            # Compensate in reverse order: unreserve inventory, then release payment
            await workflow.execute_activity(
                unreserve_inventory, order_id,
                start_to_close_timeout=timedelta(seconds=5),
            )
            await workflow.execute_activity(
                release_payment, payment_id,
                start_to_close_timeout=timedelta(seconds=5),
            )
            await self._handle_failure(order_id, "schedule_shipment", e)
            return False

        return True

    async def _handle_failure(self, order_id: str, step: str, error: Exception):
        """Log the failure for monitoring.

        Temporal preserves the full execution history automatically.
        """
        workflow.logger.error(f"Saga failed for order {order_id} at step {step}: {error}")
Compensation Activity (Payment Release)
from temporalio import activity

# Assumptions: payment_client is an async client for the payment provider and
# get_order_total looks up the order amount; neither is defined in this guide.


@activity.defn
async def reserve_payment(order_id: str) -> str:
    """Reserve payment. Returns a payment ID for compensation."""
    result = await payment_client.create_hold(
        amount=get_order_total(order_id),
        idempotency_key=f"reserve-{order_id}",
    )
    return result.payment_id


@activity.defn
async def release_payment(payment_id: str) -> None:
    """Compensation: release the reserved payment."""
    await payment_client.release_hold(
        payment_id=payment_id,
        idempotency_key=f"release-{payment_id}",
    )
Idempotency Keys
Every activity that performs a side effect must be idempotent. Use idempotency keys so retries don’t duplicate operations:
def reserve_payment_with_idempotency(order_id: str) -> str:
    """Create a payment hold with an idempotency key.

    If the key was already processed, the server returns the existing result
    rather than creating a duplicate hold.
    """
    # Derive the key from the order ID alone so every retry sends the same key.
    result = payment_client.create_hold(
        amount=get_order_total(order_id),
        idempotency_key=f"reserve-{order_id}",
    )
    return result.payment_id
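The client-side key only helps if the server remembers it. The sketch below shows one way the receiving side might behave, using a hypothetical in-memory `FakePaymentServer`. It is a stand-in for illustration, not the API of any real payment provider, and a real service would keep the key-to-result mapping in durable storage.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class FakePaymentServer:
    """Hypothetical in-memory stand-in for a payment API."""
    holds_by_key: Dict[str, str] = field(default_factory=dict)
    holds_created: int = 0

    def create_hold(self, amount: int, idempotency_key: str) -> str:
        # Replay: the key was seen before, so return the stored result
        # instead of placing a second hold.
        if idempotency_key in self.holds_by_key:
            return self.holds_by_key[idempotency_key]
        payment_id = f"pay-{uuid.uuid4().hex[:8]}"
        self.holds_created += 1
        self.holds_by_key[idempotency_key] = payment_id
        return payment_id


server = FakePaymentServer()
first = server.create_hold(amount=4200, idempotency_key="reserve-order-123")
retry = server.create_hold(amount=4200, idempotency_key="reserve-order-123")
other = server.create_hold(amount=999, idempotency_key="reserve-order-456")

print(first == retry, server.holds_created)
```

The retried call returns the same payment ID as the first and creates no additional hold, while a different order gets its own hold. Because the key is derived only from stable input (the order ID), a retry after a timeout or a worker restart sends the same key automatically.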
Choreography with Kafka
For simpler workflows, choreography uses events to coordinate services without a central orchestrator:
sequenceDiagram
participant Order as Order Service
participant Payment as Payment Service
participant Inventory as Inventory Service
Note over Order,Inventory: Order Created Event
Order->>Kafka: Publish: OrderCreated(order_id=123)
Kafka-->>Payment: Consume: OrderCreated
Payment->>Payment: Reserve payment
Payment->>Kafka: Publish: PaymentReserved(order_id=123)
Kafka-->>Inventory: Consume: PaymentReserved
Inventory->>Inventory: Reserve inventory
alt Success
Inventory->>Kafka: Publish: InventoryReserved(order_id=123)
else Failure
Inventory->>Kafka: Publish: InventoryFailed(order_id=123, reason="out of stock")
Kafka-->>Payment: Consume: InventoryFailed
Payment->>Payment: Release payment
Payment->>Kafka: Publish: PaymentReleased(order_id=123)
end
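The event chain in the diagram can be traced without a broker. The sketch below replaces Kafka with a hypothetical `InMemoryBus` class and reduces each service to one handler function, so that the order of published events is easy to inspect. The `sku` field and the `stock` dictionary are assumptions added for the example.

```python
from collections import defaultdict, deque
from typing import Any, Callable, DefaultDict, Deque, Dict, List, Tuple

Event = Dict[str, Any]
Handler = Callable[[Event], None]


class InMemoryBus:
    """Hypothetical stand-in for Kafka: topics, subscribers, FIFO delivery."""

    def __init__(self) -> None:
        self.subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)
        self.queue: Deque[Tuple[str, Event]] = deque()
        self.history: List[str] = []  # topics in the order they were published

    def subscribe(self, topic: str, handler: Handler) -> None:
        self.subscribers[topic].append(handler)

    def publish(self, topic: str, event: Event) -> None:
        self.history.append(topic)
        self.queue.append((topic, event))

    def drain(self) -> None:
        # Deliver queued events until no handler publishes anything new.
        while self.queue:
            topic, event = self.queue.popleft()
            for handler in self.subscribers[topic]:
                handler(event)


bus = InMemoryBus()
stock = {"widget": 0}  # nothing in stock, so the inventory step fails


def payment_on_order_created(event: Event) -> None:
    bus.publish("PaymentReserved",
                {"order_id": event["order_id"], "sku": event["sku"]})


def inventory_on_payment_reserved(event: Event) -> None:
    if stock.get(event["sku"], 0) > 0:
        stock[event["sku"]] -= 1
        bus.publish("InventoryReserved", {"order_id": event["order_id"]})
    else:
        bus.publish("InventoryFailed",
                    {"order_id": event["order_id"], "reason": "out of stock"})


def payment_on_inventory_failed(event: Event) -> None:
    # Compensation: the Payment Service reacts to the failure event.
    bus.publish("PaymentReleased", {"order_id": event["order_id"]})


bus.subscribe("OrderCreated", payment_on_order_created)
bus.subscribe("PaymentReserved", inventory_on_payment_reserved)
bus.subscribe("InventoryFailed", payment_on_inventory_failed)

bus.publish("OrderCreated", {"order_id": 123, "sku": "widget"})
bus.drain()
print(bus.history)
```

No component in this sketch knows the whole flow. Each handler knows only which event it reacts to and which event it emits, which is what keeps choreography loosely coupled and also what makes longer chains harder to follow.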
Kafka Choreography Consumer Example
import json

from kafka import KafkaConsumer, KafkaProducer

# Serialize event dicts to JSON bytes; without a serializer, send() accepts
# only bytes and rejects the dicts used below.
producer = KafkaProducer(
    bootstrap_servers=['kafka:9092'],
    value_serializer=lambda v: json.dumps(v).encode('utf-8'),
)

# Subscribe to both topics the Payment Service reacts to.
consumer = KafkaConsumer(
    'order-created',
    'inventory-failed',
    bootstrap_servers=['kafka:9092'],
    value_deserializer=lambda m: json.loads(m.decode('utf-8')),
)


def handle_order_created(event):
    """Payment Service: respond to OrderCreated by reserving payment."""
    order_id = event['order_id']
    try:
        payment_client.create_hold(
            amount=event['total'],
            idempotency_key=f"order-{order_id}",
        )
        producer.send('payment-reserved', {'order_id': order_id, 'status': 'ok'})
    except Exception as e:
        producer.send('payment-failed', {'order_id': order_id, 'error': str(e)})


def handle_inventory_failed(event):
    """Payment Service: compensate when inventory reservation fails."""
    order_id = event['order_id']
    # Assumption: payment_client locates the hold by the key it was created with.
    payment_client.release_hold(idempotency_key=f"order-{order_id}")
    producer.send('payment-released', {'order_id': order_id})


# Route each consumed message to the handler for its topic.
for message in consumer:
    event = message.value
    if message.topic == 'order-created':
        handle_order_created(event)
    elif message.topic == 'inventory-failed':
        handle_inventory_failed(event)
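Kafka consumers commonly run with at-least-once delivery, so a handler can receive the same event more than once, for example after a consumer restart before offsets are committed. One way to guard against that is to deduplicate before dispatching. The sketch below uses a hypothetical `DedupingDispatcher` that keys on topic and `order_id` and keeps the seen set in memory; a real service would store it durably, ideally in the same transaction as the handler's own write.

```python
from typing import Any, Callable, Dict, List, Set, Tuple

Event = Dict[str, Any]
Handler = Callable[[Event], None]


class DedupingDispatcher:
    """Route events to handlers by topic, skipping events already processed."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Handler] = {}
        self._seen: Set[Tuple[str, str]] = set()

    def register(self, topic: str, handler: Handler) -> None:
        self._handlers[topic] = handler

    def dispatch(self, topic: str, event: Event) -> bool:
        """Return True if the handler ran, False if the event was a duplicate."""
        key = (topic, str(event["order_id"]))
        if key in self._seen:
            return False
        self._handlers[topic](event)
        # Mark as seen only after the handler succeeds, so a failed attempt
        # can be retried when the event is redelivered.
        self._seen.add(key)
        return True


released: List[str] = []

dispatcher = DedupingDispatcher()
dispatcher.register("inventory-failed",
                    lambda e: released.append(str(e["order_id"])))

first = dispatcher.dispatch("inventory-failed", {"order_id": "123"})
duplicate = dispatcher.dispatch("inventory-failed", {"order_id": "123"})
print(first, duplicate, released)
```

The second delivery is ignored, so the payment is released once. The topic-to-handler dictionary also replaces the `if`/`elif` chain in the consumer loop, which keeps routing manageable as a service subscribes to more topics.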
Orchestration vs Choreography Decision
| Factor | Orchestration (Temporal) | Choreography (Kafka) |
|---|---|---|
| Coordination | Centralized, sequential | Decentralized, event-driven |
| Failure handling | Built-in activity retries; compensation written in workflow code | Compensation driven by failure events that each service must handle |
| Visibility | Full execution history in Temporal UI | Requires tracing across event topics |
| Complexity ceiling | Handles complex branching well | Becomes hard to trace with >5 services |
| When to use | Complex workflows, centralized failure handling | Simple linear flows, loose coupling |