
Saga Pattern for Distributed Transactions: Implementation with Temporal and Kafka

Created: March 8, 2026 Larry Qu 4 min read

Introduction

Traditional ACID transactions don’t span microservice boundaries. When a single business operation — placing an order, booking a trip, processing a payment — updates data in multiple services, there is no database-level rollback across services. The saga pattern solves this by decomposing the distributed transaction into a sequence of local transactions, each with a compensating action that undoes it on failure.

This guide provides Python implementations of both saga coordination approaches using Temporal (orchestration) and Kafka (choreography), with Mermaid sequence diagrams, compensation handler patterns, idempotency key design, and a complete order management example.

Saga Fundamentals

sequenceDiagram
    participant Order as Order Service
    participant Payment as Payment Service
    participant Inventory as Inventory Service
    participant Shipping as Shipping Service

    Note over Order,Shipping: Successful Order Flow
    Order->>Payment: 1. Reserve payment
    Payment-->>Order: Payment reserved
    Order->>Inventory: 2. Reserve inventory
    Inventory-->>Order: Inventory reserved
    Order->>Shipping: 3. Schedule shipment
    Shipping-->>Order: Shipment scheduled
    Note over Order,Shipping: Order confirmed ✓

    Note over Order,Shipping: Failed Flow with Compensation
    Order->>Payment: 1. Reserve payment
    Payment-->>Order: Payment reserved
    Order->>Inventory: 2. Reserve inventory
    Inventory-->>Order: OUT OF STOCK ✗
    Order->>Payment: 3. COMPENSATE: Release payment
    Payment-->>Order: Payment released
    Note over Order,Shipping: Order cancelled, consistent state ✓

Two coordination approaches exist:

Choreography: Each service publishes events after its local transaction. Other services subscribe and react. No central coordinator. Best for simple, linear workflows with few participants.

Orchestration: A central coordinator (the saga orchestrator) tells each service what to do and handles failure. Best for complex workflows with branching, conditional logic, and many participants.
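
Independent of which coordination style is chosen, the core mechanic is the same: run local transactions in order and, when one fails, run the compensations of the steps that already completed, newest first. The following is a minimal in-memory sketch of that idea, not tied to Temporal or Kafka. `SagaStep`, `run_saga`, and the step functions are hypothetical names introduced for illustration, and the "services" are plain callables that append to a log.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class SagaStep:
    name: str
    action: Callable[[], None]       # the local transaction
    compensate: Callable[[], None]   # undoes the action if a later step fails


def run_saga(steps: List[SagaStep]) -> bool:
    """Run steps in order; on failure, compensate completed steps in reverse."""
    completed: List[SagaStep] = []
    for step in steps:
        try:
            step.action()
        except Exception:
            # Undo only the steps that actually finished, newest first.
            for done in reversed(completed):
                done.compensate()
            return False
        completed.append(step)
    return True


log: List[str] = []


def fail_out_of_stock() -> None:
    raise RuntimeError("out of stock")


steps = [
    SagaStep("payment",
             lambda: log.append("reserve_payment"),
             lambda: log.append("release_payment")),
    SagaStep("inventory",
             fail_out_of_stock,
             lambda: log.append("unreserve_inventory")),
    SagaStep("shipping",
             lambda: log.append("schedule_shipment"),
             lambda: log.append("cancel_shipment")),
]

ok = run_saga(steps)
print(ok, log)
```

In this run the inventory step fails, so only the payment reservation is compensated and the shipping step never starts. A production saga would also need to persist progress and retry failed compensations, which is what an orchestrator or an event log provides.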

Orchestration with Temporal

Temporal provides durable execution: workflow state is persisted as an event history, so the orchestrator survives server crashes, pod restarts, and network failures and resumes from where it left off. That makes it a reliable foundation for orchestrated sagas.

Temporal Saga Workflow

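from datetime import timedelta

from temporalio import workflow

# The activities are assumed to live in a separate, hypothetical activities.py
# module. Importing them through the sandbox passthrough lets the workflow
# reference them by function.
with workflow.unsafe.imports_passed_through():
    from activities import (
        release_payment,
        reserve_inventory,
        reserve_payment,
        schedule_shipment,
        unreserve_inventory,
    )

# --- Workflow (orchestrates the service calls and their compensations) ---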

@workflow.defn
class OrderSaga:
    @workflow.run
    async def run(self, order_id: str) -> bool:
        """Orchestrated saga for order processing with full compensation."""

        # Step 1: Reserve payment
        try:
            payment_id = await workflow.execute_activity(
                reserve_payment, order_id,
                start_to_close_timeout=timedelta(seconds=10)
            )
        except Exception as e:
            await self._handle_failure(order_id, "reserve_payment", e)
            return False

        # Step 2: Reserve inventory
        try:
            await workflow.execute_activity(
                reserve_inventory, order_id,
                start_to_close_timeout=timedelta(seconds=10)
            )
        except Exception as e:
            # Compensate: release the payment
            await workflow.execute_activity(
                release_payment, payment_id,
                start_to_close_timeout=timedelta(seconds=5)
            )
            await self._handle_failure(order_id, "reserve_inventory", e)
            return False

        # Step 3: Schedule shipment
        try:
            shipment_id = await workflow.execute_activity(
                schedule_shipment, order_id,
                start_to_close_timeout=timedelta(seconds=10)
            )
        except Exception as e:
            # Compensate in reverse order of execution:
            # unreserve inventory first, then release the payment
            await workflow.execute_activity(
                unreserve_inventory, order_id,
                start_to_close_timeout=timedelta(seconds=5)
            )
            await workflow.execute_activity(
                release_payment, payment_id,
                start_to_close_timeout=timedelta(seconds=5)
            )
            await self._handle_failure(order_id, "schedule_shipment", e)
            return False

        return True

    async def _handle_failure(self, order_id: str, step: str, error: Exception):
        """Log the failure for monitoring.
        Temporal preserves the full execution history automatically.
        """
        workflow.logger.error(f"Saga failed for order {order_id} at step {step}: {error}")

Compensation Activity (Payment Release)

from temporalio import activity

# payment_client and get_order_total are application-specific helpers
# assumed to be defined elsewhere.

@activity.defn
async def reserve_payment(order_id: str) -> str:
    """Reserve payment. Returns a payment ID for compensation."""
    result = await payment_client.create_hold(
        amount=get_order_total(order_id),
        idempotency_key=f"reserve-{order_id}"
    )
    return result.payment_id

@activity.defn
async def release_payment(payment_id: str) -> None:
    """Compensation: release the reserved payment."""
    await payment_client.release_hold(
        payment_id=payment_id,
        idempotency_key=f"release-{payment_id}"
    )

Idempotency Keys

Every activity that performs a side effect must be idempotent. Use idempotency keys so retries don’t duplicate operations:

async def reserve_payment_with_idempotency(order_id: str) -> str:
    """Create a payment hold with idempotency key.
    If the key was already processed, the server returns the existing result
    rather than creating a duplicate hold.
    """
    # The key is derived only from the order ID, so every retry of this
    # step sends exactly the same key.
    result = await payment_client.create_hold(
        amount=get_order_total(order_id),
        idempotency_key=f"reserve-{order_id}"
    )
    return result.payment_id
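
To make the server-side half of this contract concrete, here is a small in-memory sketch of how a payment API might honor idempotency keys. `InMemoryPaymentServer` is a hypothetical stand-in, not a real payment client; a real service would keep the key-to-result mapping in durable storage and would usually also check that a replayed request carries the same parameters.

```python
import uuid
from typing import Dict


class InMemoryPaymentServer:
    """Hypothetical stand-in for a payment API that honors idempotency keys."""

    def __init__(self) -> None:
        self._results_by_key: Dict[str, str] = {}
        self.holds_created = 0

    def create_hold(self, amount: int, idempotency_key: str) -> str:
        # A retried request with a known key gets the stored result back
        # instead of creating a second hold.
        existing = self._results_by_key.get(idempotency_key)
        if existing is not None:
            return existing
        self.holds_created += 1
        payment_id = f"pay-{uuid.uuid4().hex[:8]}"
        self._results_by_key[idempotency_key] = payment_id
        return payment_id


server = InMemoryPaymentServer()
first = server.create_hold(amount=4200, idempotency_key="reserve-order-123")
retry = server.create_hold(amount=4200, idempotency_key="reserve-order-123")
other = server.create_hold(amount=999, idempotency_key="reserve-order-456")
print(first == retry, server.holds_created)
```

The retry returns the same payment ID as the first call, and only two holds exist even though `create_hold` was called three times.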

Choreography with Kafka

For simpler workflows, choreography uses events to coordinate services without a central orchestrator:

sequenceDiagram
    participant Order as Order Service
    participant Payment as Payment Service
    participant Inventory as Inventory Service

    Note over Order,Inventory: Order Created Event
    Order->>Kafka: Publish: OrderCreated(order_id=123)

    Kafka-->>Payment: Consume: OrderCreated
    Payment->>Payment: Reserve payment
    Payment->>Kafka: Publish: PaymentReserved(order_id=123)

    Kafka-->>Inventory: Consume: PaymentReserved
    Inventory->>Inventory: Reserve inventory
    alt Success
        Inventory->>Kafka: Publish: InventoryReserved(order_id=123)
    else Failure
        Inventory->>Kafka: Publish: InventoryFailed(order_id=123, reason="out of stock")
        Kafka-->>Payment: Consume: InventoryFailed
        Payment->>Payment: Release payment
        Payment->>Kafka: Publish: PaymentReleased(order_id=123)
    end
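
The failure branch of this flow can be traced without a broker. The sketch below replaces Kafka with a toy in-memory bus so the event chain is easy to follow; `InMemoryBus`, the handler names, and the `sku` field are assumptions made for illustration and do not appear in the real services.

```python
from collections import defaultdict, deque
from typing import Callable, Deque, Dict, List, Tuple

Handler = Callable[[dict], None]


class InMemoryBus:
    """Toy stand-in for Kafka: topics, subscribers, and in-order delivery."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Handler]] = defaultdict(list)
        self._queue: Deque[Tuple[str, dict]] = deque()
        self.published: List[str] = []

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, event: dict) -> None:
        self.published.append(topic)
        self._queue.append((topic, event))

    def drain(self) -> None:
        # Deliver events until no service has anything left to react to.
        while self._queue:
            topic, event = self._queue.popleft()
            for handler in list(self._subscribers[topic]):
                handler(event)


bus = InMemoryBus()
stock = {"widget": 0}  # nothing in stock, so the inventory step will fail


def payment_on_order_created(event: dict) -> None:
    bus.publish("PaymentReserved",
                {"order_id": event["order_id"], "sku": event["sku"]})


def inventory_on_payment_reserved(event: dict) -> None:
    if stock.get(event["sku"], 0) > 0:
        stock[event["sku"]] -= 1
        bus.publish("InventoryReserved", {"order_id": event["order_id"]})
    else:
        bus.publish("InventoryFailed",
                    {"order_id": event["order_id"], "reason": "out of stock"})


def payment_on_inventory_failed(event: dict) -> None:
    # Compensation: the payment service reacts to the failure event.
    bus.publish("PaymentReleased", {"order_id": event["order_id"]})


bus.subscribe("OrderCreated", payment_on_order_created)
bus.subscribe("PaymentReserved", inventory_on_payment_reserved)
bus.subscribe("InventoryFailed", payment_on_inventory_failed)

bus.publish("OrderCreated", {"order_id": 123, "sku": "widget"})
bus.drain()
print(bus.published)
```

No component here knows the whole flow: each handler only knows which event it consumes and which it emits, which is the property that makes choreography loosely coupled and also harder to trace.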

Kafka Choreography Consumer Example

from kafka import KafkaConsumer, KafkaProducer
import json

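# Serialize event dicts to JSON bytes; without a serializer, send() rejects dicts.
producer = KafkaProducer(
    bootstrap_servers=['kafka:9092'],
    value_serializer=lambda v: json.dumps(v).encode('utf-8')
)
# Subscribe to every topic this service reacts to, including the failure topic
# that triggers compensation.
consumer = KafkaConsumer(
    'order-created',
    'inventory-failed',
    bootstrap_servers=['kafka:9092'],
    value_deserializer=lambda m: json.loads(m.decode('utf-8'))
)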

def handle_order_created(event):
    """Payment Service: respond to OrderCreated by reserving payment."""
    order_id = event['order_id']
    try:
        payment_client.create_hold(
            amount=event['total'],
            idempotency_key=f"order-{order_id}"
        )
        producer.send('payment-reserved', {'order_id': order_id, 'status': 'ok'})
    except Exception as e:
        producer.send('payment-failed', {'order_id': order_id, 'error': str(e)})

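def handle_inventory_failed(event):
    """Payment Service: compensate when inventory reservation fails."""
    order_id = event['order_id']
    # Assumption: payment_client can look up the hold by order ID, since the
    # event carries no payment ID. The release uses its own idempotency key
    # so it is never mistaken for a replay of the original reservation.
    payment_client.release_hold(
        order_id=order_id,
        idempotency_key=f"release-{order_id}"
    )
    producer.send('payment-released', {'order_id': order_id})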

for message in consumer:
    event = message.value
    if message.topic == 'order-created':
        handle_order_created(event)
    elif message.topic == 'inventory-failed':
        handle_inventory_failed(event)
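
Kafka consumers can receive the same message more than once, for example after a rebalance or a crash before offsets are committed, so each handler should tolerate redelivery. One common approach is to deduplicate on a unique event identifier. The sketch below assumes each event carries an `event_id` field, which the events above do not include; `DedupingHandler` is a hypothetical helper, and the in-memory set stands in for a durable store.

```python
from typing import Callable, List, Set


class DedupingHandler:
    """Wraps a handler so each event_id is processed at most once."""

    def __init__(self, handler: Callable[[dict], None]) -> None:
        self._handler = handler
        self._seen: Set[str] = set()  # a durable store in a real service

    def __call__(self, event: dict) -> bool:
        event_id = event["event_id"]
        if event_id in self._seen:
            return False  # duplicate delivery, skip it
        self._handler(event)
        # Record the ID only after the handler succeeds, so a failed
        # attempt is retried on redelivery.
        self._seen.add(event_id)
        return True


released: List[int] = []
handle = DedupingHandler(lambda e: released.append(e["order_id"]))

event = {"event_id": "evt-1", "order_id": 123}
results = [handle(event), handle(event)]  # second call simulates redelivery
print(results, released)
```

The second delivery is ignored, so the compensation runs once. Idempotency keys on the downstream payment call remain useful as a second line of defense, because the handler can still crash between doing the work and recording the ID.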

Orchestration vs Choreography Decision

| Factor | Orchestration (Temporal) | Choreography (Kafka) |
|---|---|---|
| Coordination | Centralized, sequential | Decentralized, event-driven |
| Failure handling | Automatic retry + compensation in code | Manual compensation events |
| Visibility | Full execution history in Temporal UI | Requires tracing across event topics |
| Complexity ceiling | Handles complex branching well | Becomes hard to trace with >5 services |
| When to use | Complex workflows, strict consistency | Simple linear flows, loose coupling |
