Introduction
When a business operation spans multiple services — create order, reserve inventory, charge payment — you need all three to succeed or all three to fail. Traditional database transactions don’t work across service boundaries. This guide covers the patterns that do, with working code.
The core problem:
Order Service → Inventory Service → Payment Service
If payment fails after inventory is reserved:
- Inventory is reserved but no order exists
- Money not charged but inventory locked
→ Inconsistent state
Four main patterns address distributed transactions, each with different consistency guarantees:
| Pattern | Consistency | Coordination | Best For |
|---|---|---|---|
| 2PC | Strong (ACID) | Synchronous, coordinator | Same-DC, financial, small scale |
| TCC | Stronger than Saga | Synchronous Try, async Confirm/Cancel | Resources needing provisional holds |
| Saga | Eventual (BASE) | Choreography or orchestration | Long-running, multi-step workflows |
| Outbox | Per-service atomicity | Async, polling or CDC | Reliable event publishing |
Two-Phase Commit (2PC): Strong Consistency
2PC coordinates a transaction across multiple databases. A coordinator asks all participants to “prepare” (lock resources), then either commits or rolls back all of them.
Phase 1 (Prepare):
Coordinator → "Can you commit?" → DB1, DB2, DB3
All respond "Yes" → proceed
Any responds "No" → abort
Phase 2 (Commit):
Coordinator → "Commit!" → DB1, DB2, DB3
When to Use 2PC
✓ Financial transactions requiring ACID guarantees
✓ All participants are in the same data center
✓ You can tolerate higher latency
✓ Participants support XA transactions (PostgreSQL, MySQL, SQL Server)
✗ Across geographically distributed systems
✗ When any participant might be unavailable
✗ High-throughput systems (2PC is blocking during prepare phase)
✗ Systems that require the coordinator to be stateless
PostgreSQL XA Example
import psycopg2
from psycopg2.extensions import ISOLATION_LEVEL_AUTOCOMMIT
def transfer_funds_2pc(from_account: int, to_account: int, amount: float):
"""Transfer funds across two databases using 2PC."""
conn1 = psycopg2.connect("dbname=bank1")
conn2 = psycopg2.connect("dbname=bank2")
xid1 = conn1.xid(42, "transfer", "branch1")
xid2 = conn2.xid(42, "transfer", "branch2")
try:
# Phase 1: Prepare both databases
conn1.tpc_begin(xid1)
cur1 = conn1.cursor()
cur1.execute("UPDATE accounts SET balance = balance - %s WHERE id = %s",
(amount, from_account))
conn1.tpc_prepare()
conn2.tpc_begin(xid2)
cur2 = conn2.cursor()
cur2.execute("UPDATE accounts SET balance = balance + %s WHERE id = %s",
(amount, to_account))
conn2.tpc_prepare()
# Phase 2: Commit both
conn1.tpc_commit(xid1)
conn2.tpc_commit(xid2)
except Exception as e:
# Rollback both on any failure
try:
conn1.tpc_rollback(xid1)
except:
pass
try:
conn2.tpc_rollback(xid2)
except:
pass
raise
finally:
conn1.close()
conn2.close()
The Coordinator Availability Problem
A critical 2PC failure mode: if the coordinator crashes after sending prepare but before committing, participants hold locks until the coordinator recovers. In PostgreSQL, prepared transactions appear in pg_prepared_xacts and must be manually committed or rolled back. For production deployments, run the coordinator as a highly available service (e.g., with a Raft consensus group) or use pg_prepared_xacts monitoring with alerting on stale transactions.
TCC: Try-Confirm-Cancel
Try-Confirm-Cancel sits between 2PC and Saga on the consistency spectrum. Each participant reserves resources in a provisional state (Try), then either finalizes (Confirm) or releases (Cancel). Unlike 2PC, TCC does not hold locks during the try phase — it uses application-level provisional holds.
Try Phase Confirm/Cancel Phase
┌──────────────┐ ┌──────────────────┐
│ Reserve │ │ Confirm or │
│ inventory │──success──→│ Cancel all │
│ Hold payment│ │ reservations │
│ Reserve │ │ │
│ shipping │──failure──→│ Cancel all │
└──────────────┘ └──────────────────┘
When to Use TCC
✓ Need stronger consistency than Saga (provisional state is deterministic)
✓ Resources can be "reserved" without locking (inventory holds, payment authorizations)
✓ Want non-blocking coordination but still need atomic commitment
✓ Medium-complexity workflows (3-5 participants)
✗ Resources that cannot hold provisional state (e.g., irreversible operations)
✗ Very long-running transactions (provisional state wastes capacity)
✗ When idempotent Try/Confirm/Cancel handlers are too complex to maintain
Python TCC Coordinator
# tcc_coordinator.py
from dataclasses import dataclass
import requests
@dataclass
class TCCParticipant:
name: str
try_url: str
confirm_url: str
cancel_url: str
class OrderTCCCoordinator:
"""Try-Confirm-Cancel coordinator for order placement."""
def __init__(self, order_id: str, amount: float):
self.order_id = order_id
self.amount = amount
self.participants = [
TCCParticipant("inventory",
f"http://inventory/try_reserve",
f"http://inventory/confirm_reserve",
f"http://inventory/cancel_reserve"),
TCCParticipant("payment",
f"http://payment/try_charge",
f"http://payment/confirm_charge",
f"http://payment/cancel_charge"),
]
self.try_results = {}
def execute(self) -> bool:
"""Run TCC: try all, then confirm or cancel."""
try:
# Phase 1: Try all participants
for p in self.participants:
resp = requests.post(
p.try_url,
json={"order_id": self.order_id, "amount": self.amount},
timeout=5,
)
if resp.status_code != 200:
raise RuntimeError(f"{p.name} Try failed: {resp.text}")
self.try_results[p.name] = resp.json()
# Phase 2: All Try succeeded — Confirm all
for p in self.participants:
resp = requests.post(
p.confirm_url,
json={"order_id": self.order_id,
"reservation": self.try_results[p.name]},
timeout=5,
)
if resp.status_code != 200:
# Confirm failed — this is critical.
# Manual intervention or retry logic required.
raise RuntimeError(f"{p.name} Confirm failed: {resp.text}")
return True
except Exception:
# Any Try or Confirm failed — Cancel all successful reservations
for p in self.participants:
if p.name in self.try_results:
try:
requests.post(
p.cancel_url,
json={"order_id": self.order_id},
timeout=5,
)
except requests.RequestException:
pass # Best-effort cancel; retry via reconciliation
return False
TCC Participant Service Example
# inventory_service.py — TCC participant
import uuid
# In-memory provisional storage (use a database in production)
provisional_holds = {}
def try_reserve(order_id: str, items: list) -> dict:
"""Reserve inventory provisionally."""
hold_id = str(uuid.uuid4())
# Use a provisional state — reserved but not deducted
provisional_holds[hold_id] = {
"order_id": order_id,
"items": items,
"status": "provisional",
"expires_at": time.time() + 300, # 5-minute TTL
}
return {"hold_id": hold_id, "status": "provisional"}
def confirm_reserve(order_id: str, reservation: dict) -> dict:
"""Convert provisional hold to permanent deduction."""
hold = provisional_holds.get(reservation["hold_id"])
if not hold or hold["status"] != "provisional":
return {"status": "already_confirmed"}
hold["status"] = "confirmed"
db.execute(
"UPDATE inventory SET quantity = quantity - %s WHERE sku = %s",
(hold["items"]["quantity"], hold["items"]["sku"])
)
return {"status": "confirmed"}
def cancel_reserve(order_id: str) -> dict:
"""Release provisional hold."""
for hold_id, hold in list(provisional_holds.items()):
if hold["order_id"] == order_id and hold["status"] == "provisional":
hold["status"] = "cancelled"
del provisional_holds[hold_id]
return {"status": "cancelled"}
The try phase places a provisional hold (never a blocking lock). The confirm phase makes it permanent. The cancel phase releases it. Each operation must be idempotent — the coordinator may retry any call after a timeout.
Saga Pattern: Eventual Consistency
Sagas break a distributed transaction into a sequence of local transactions. Each step publishes an event. If a step fails, compensating transactions undo the completed steps.
Order Saga:
1. Create Order (pending) → publish OrderCreated
2. Reserve Inventory → publish InventoryReserved
3. Charge Payment → publish PaymentCharged
4. Confirm Order → publish OrderConfirmed
Compensation (if payment fails):
3. Payment failed → publish PaymentFailed
2. Release Inventory ← compensate step 2
1. Cancel Order ← compensate step 1
Choreography-Based Saga
Services react to events — no central coordinator:
## order_service.py
import json
from kafka import KafkaProducer, KafkaConsumer
producer = KafkaProducer(
bootstrap_servers=['kafka:9092'],
value_serializer=lambda v: json.dumps(v).encode()
)
def create_order(user_id: str, items: list) -> str:
"""Step 1: Create order in pending state."""
order_id = generate_id()
# Local transaction
with db.transaction():
db.execute("""
INSERT INTO orders (id, user_id, status, items)
VALUES (%s, %s, 'pending', %s)
""", (order_id, user_id, json.dumps(items)))
# Publish event to trigger next step
producer.send('orders', {
'event_type': 'OrderCreated',
'order_id': order_id,
'user_id': user_id,
'items': items,
})
return order_id
## Listen for compensation events
consumer = KafkaConsumer('payments', bootstrap_servers=['kafka:9092'])
for message in consumer:
event = json.loads(message.value)
if event['event_type'] == 'PaymentFailed':
# Compensate: cancel the order
with db.transaction():
db.execute("""
UPDATE orders SET status = 'cancelled',
cancelled_reason = %s
WHERE id = %s
""", (event.get('reason', 'payment_failed'), event['order_id']))
producer.send('orders', {
'event_type': 'OrderCancelled',
'order_id': event['order_id'],
'reason': 'payment_failed',
})
Orchestration-Based Saga
A central orchestrator coordinates the steps — easier to debug:
## order_saga_orchestrator.py
from enum import Enum
from dataclasses import dataclass
class SagaState(Enum):
STARTED = "started"
INVENTORY_RESERVED = "inventory_reserved"
PAYMENT_CHARGED = "payment_charged"
COMPLETED = "completed"
COMPENSATING = "compensating"
FAILED = "failed"
@dataclass
class OrderSaga:
order_id: str
state: SagaState
compensation_steps: list
class OrderSagaOrchestrator:
def __init__(self, inventory_client, payment_client, db):
self.inventory = inventory_client
self.payment = payment_client
self.db = db
def execute(self, order_id: str, items: list, payment_info: dict) -> dict:
saga = OrderSaga(order_id=order_id, state=SagaState.STARTED,
compensation_steps=[])
try:
# Step 1: Reserve inventory
reservation = self.inventory.reserve(order_id, items)
saga.state = SagaState.INVENTORY_RESERVED
saga.compensation_steps.append(
lambda: self.inventory.release(order_id,
reservation['reservation_id'])
)
self._save_saga(saga)
# Step 2: Charge payment
charge = self.payment.charge(order_id, payment_info,
reservation['total'])
saga.state = SagaState.PAYMENT_CHARGED
saga.compensation_steps.append(
lambda: self.payment.refund(order_id, charge['transaction_id'])
)
self._save_saga(saga)
# Step 3: Confirm order
self.db.execute(
"UPDATE orders SET status = 'confirmed' WHERE id = %s",
(order_id,)
)
saga.state = SagaState.COMPLETED
self._save_saga(saga)
return {"status": "success", "order_id": order_id}
except Exception as e:
saga.state = SagaState.COMPENSATING
self._save_saga(saga)
for compensate in reversed(saga.compensation_steps):
try:
compensate()
except Exception as comp_err:
print(f"Compensation failed: {comp_err}")
saga.state = SagaState.FAILED
self._save_saga(saga)
raise
def _save_saga(self, saga: OrderSaga):
self.db.execute("""
INSERT INTO saga_state (order_id, state, compensation_steps, updated_at)
VALUES (%s, %s, %s, NOW())
ON CONFLICT (order_id) DO UPDATE
SET state = EXCLUDED.state, updated_at = NOW()
""", (saga.order_id, saga.state.value, json.dumps([])))
Saga Transaction Types
Azure’s guidance on the Saga pattern introduces three transaction categories that improve robustness:
| Type | Description | Behavior on Failure |
|---|---|---|
| Compensable | Can be undone by a compensating transaction | Execute compensation in reverse order |
| Pivot | Point of no return — last compensable step | Must succeed for saga completion |
| Retryable | Follows the pivot; idempotent by design | Retry until success; no compensation |
Design your saga so the pivot transaction is the boundary between reversible and irreversible work. All steps after the pivot must be idempotent and retryable.
Saga Isolation & Anomaly Countermeasures
Sagas sacrifice the Isolation property of ACID. Without proper countermeasures, concurrent sagas can produce data anomalies:
| Anomaly | Description | Countermeasure |
|---|---|---|
| Lost update | One saga overwrites another’s uncommitted change | Version files, optimistic locking |
| Dirty read | One saga reads data another has modified provisionally | Semantic lock or pessimistic view |
| Non-repeatable read | Different steps in one saga see different states | Reread values before write |
Version files prevent lost updates by tracking a record version:
def update_order_with_version(order_id: str, new_status: str, expected_version: int):
"""Optimistic locking — fails if another saga modified the record."""
result = db.execute("""
UPDATE orders
SET status = %s, version = version + 1
WHERE id = %s AND version = %s
""", (new_status, order_id, expected_version))
if result.rowcount == 0:
raise ConcurrentModificationError(
f"Order {order_id} was modified by another transaction"
)
Semantic locks use application-level flags to signal provisional state:
def reserve_credit(customer_id: str, amount: float) -> bool:
"""Reserve credit using a semantic lock — not a DB lock."""
return db.execute("""
UPDATE customer_credit
SET reserved_amount = reserved_amount + %s
WHERE customer_id = %s
AND (total_limit - (reserved_amount + %s)) >= 0
""", (amount, customer_id, amount)).rowcount > 0
The reserved_amount column acts as a semaphore. Other sagas see the provisional hold and avoid oversubscribing the credit limit.
Outbox Pattern: Reliable Event Publishing
The outbox pattern solves the dual-write problem: how to update your database AND publish an event atomically.
The problem:
## WRONG: if Kafka publish fails, DB is updated but no event sent
db.execute("UPDATE orders SET status = 'confirmed' WHERE id = %s", (order_id,))
kafka.send('orders', {'event': 'OrderConfirmed', 'order_id': order_id}) # might fail!
The solution: Write the event to an outbox table in the same transaction, then a separate process publishes it:
## CORRECT: atomic write to DB + outbox
def confirm_order(order_id: str):
with db.transaction():
# Update business data
db.execute(
"UPDATE orders SET status = 'confirmed' WHERE id = %s",
(order_id,)
)
# Write event to outbox (same transaction)
db.execute("""
INSERT INTO outbox (id, topic, key, payload, created_at)
VALUES (gen_random_uuid(), 'orders', %s, %s, NOW())
""", (
order_id,
json.dumps({'event_type': 'OrderConfirmed', 'order_id': order_id})
))
# Transaction committed — both DB update and outbox entry are durable
Two approaches exist for the relay that reads the outbox and publishes to the broker: polling and CDC.
Polling-Based Relay
## outbox_publisher.py — runs as a separate process
import time
def publish_outbox_events():
"""Continuously read from outbox and publish to Kafka."""
while True:
events = db.fetchall("""
SELECT id, topic, key, payload
FROM outbox
WHERE published_at IS NULL
ORDER BY created_at
LIMIT 100
FOR UPDATE SKIP LOCKED
""")
for event in events:
try:
kafka.send(
topic=event['topic'],
key=event['key'].encode(),
value=event['payload'].encode()
)
kafka.flush()
db.execute(
"UPDATE outbox SET published_at = NOW() WHERE id = %s",
(event['id'],)
)
except Exception as e:
print(f"Failed to publish {event['id']}: {e}")
# Will retry on next iteration
time.sleep(0.1) # 100ms polling interval
-- Outbox table schema
CREATE TABLE outbox (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
topic VARCHAR(255) NOT NULL,
key VARCHAR(255),
payload JSONB NOT NULL,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
published_at TIMESTAMPTZ
);
CREATE INDEX idx_outbox_unpublished
ON outbox (created_at)
WHERE published_at IS NULL;
-- Clean up old published events
DELETE FROM outbox WHERE published_at < NOW() - INTERVAL '7 days';
CDC-Based Outbox (Debezium)
Change Data Capture eliminates polling by streaming database changes directly from the transaction log. Debezium reads the PostgreSQL Write-Ahead Log (WAL) in near-real-time (single-digit milliseconds after commit) and publishes matching events to Kafka.
Prerequisites for CDC with PostgreSQL:
# postgresql.conf — must enable logical replication
wal_level = logical
max_replication_slots = 5
max_wal_senders = 5
Debezium connector configuration:
{
"name": "orders-outbox-connector",
"config": {
"connector.class": "io.debezium.connector.postgresql.PostgresConnector",
"database.hostname": "postgres.example.com",
"database.port": "5432",
"database.user": "debezium",
"database.password": "${file:/secrets/db-password.txt}",
"database.dbname": "orders_service",
"database.server.name": "orders",
"slot.name": "orders_outbox_slot",
"table.include.list": "public.outbox",
"tombstones.on.delete": "false",
"transforms": "outbox",
"transforms.outbox.type": "io.debezium.transforms.outbox.EventRouter",
"transforms.outbox.table.field.event.type": "event_type",
"transforms.outbox.table.field.event.id": "id",
"transforms.outbox.table.field.payload": "payload",
"transforms.outbox.table.field.schema.version": "aggregate_type",
"transforms.outbox.route.topic.replacement": "orders.${routedByValue}"
}
}
The Outbox Event Router SMT (Single Message Transform) extracts the payload, routes to the correct Kafka topic, and strips database metadata so consumers receive clean business events.
Key Debezium 2.5+ features (2025):
- Incremental snapshots — capture existing outbox records without table locks
- Notification channels — PostgreSQL
LISTEN/NOTIFYintegration for immediate event detection - Improved schema history — tolerant of outbox column additions
- Enhanced JMX metrics — track processing lag per connector
Polling vs CDC Comparison
| Aspect | Polling Relay | CDC (Debezium) |
|---|---|---|
| Latency | ~100ms+ (poll interval) | <10ms after commit |
| DB impact | Repeated SELECT queries | Reads WAL — minimal overhead |
| Scaling | Degrades with throughput | Handles high throughput |
| Infrastructure | None beyond the app | Kafka Connect cluster |
| Operations | Simple, single service | Connector management, replication slots |
| Message ordering | Within SELECT order | Maintains WAL order per table |
| Best for | Moderate throughput, simpler ops | High throughput, existing Kafka infra |
When to choose polling: You have moderate event volume (<1000 events/second), no existing Kafka infrastructure, or want to minimize operational complexity.
When to choose CDC: You need near-zero latency, have high throughput, already run Kafka and Kafka Connect, or want zero application overhead for event publishing.
Durable Execution
Durable execution platforms (Temporal, Dapr, AWS Step Functions) offer an alternative to manually implementing sagas, outboxes, and retry logic. A durable execution engine guarantees that your workflow code runs to completion — even if the process crashes midway.
Instead of writing a saga orchestrator with explicit state persistence and compensation loops:
def create_order_saga(order_id: str, items: list, payment_info: dict):
"""With a durable execution engine, this is plain code."""
try:
inventory.reserve(order_id, items)
payment.charge(order_id, payment_info)
db.execute("UPDATE orders SET status = 'confirmed' WHERE id = %s",
(order_id,))
except Exception:
if inventory.is_reserved(order_id):
inventory.release(order_id)
if payment.is_charged(order_id):
payment.refund(order_id)
raise
The engine automatically retries on failure, persists execution state, and resumes from the exact point of failure after a crash. No outbox table needed — the engine ensures each step executes exactly once.
Major Durable Execution Platforms
| Platform | Language Support | Key Strength |
|---|---|---|
| Temporal | Go, Java, Python, TypeScript, .NET | Mature, self-hosted or cloud |
| Dapr Workflow | Any (sidecar) | Polyglot, Kubernetes-native |
| AWS Step Functions | JSON state machines | Serverless, AWS ecosystem |
| Netflix Conductor | Java, Python, Go | Pluggable persistence, metrics |
Temporal’s saga support includes built-in retries with exponential backoff, automatic compensation on failure, and comprehensive visibility into workflow state. The Saga class provides a DSL:
// Temporal Saga example (Java)
public void createOrderSaga(Order order) {
Saga saga = new Saga(new Saga.Options.Builder().build());
try {
String inventoryReservationId = saga.addCompensation(
() -> inventoryClient.cancelReservation(order.getOrderId()),
() -> inventoryClient.reserve(order)
);
String chargeId = saga.addCompensation(
() -> paymentClient.refund(order.getOrderId()),
() -> paymentClient.charge(order)
);
orderClient.confirm(order.getOrderId());
saga.complete();
} catch (Exception e) {
saga.compensate();
throw e;
}
}
Durable execution does not eliminate the need for compensating transactions — but it removes the infrastructure burden of persisting saga state, retrying failures, and handling crash recovery.
Idempotency: Handle Duplicate Events
Sagas and event-driven systems deliver messages at-least-once. Your handlers must be idempotent:
def handle_payment_charged(event: dict):
"""Idempotent handler — safe to call multiple times."""
order_id = event['order_id']
transaction_id = event['transaction_id']
existing = db.fetchone(
"SELECT id FROM processed_events WHERE event_id = %s",
(event['event_id'],)
)
if existing:
return
with db.transaction():
db.execute(
"UPDATE orders SET status = 'paid', transaction_id = %s WHERE id = %s",
(transaction_id, order_id)
)
db.execute(
"""INSERT INTO processed_events (event_id, processed_at)
VALUES (%s, NOW())""",
(event['event_id'],)
)
Idempotency applies to all patterns in this guide:
- 2PC: The coordinator must tolerate repeated prepare/commit calls
- TCC: Try, Confirm, and Cancel must all be idempotent
- Saga: Compensation steps must not double-effect
- Outbox: The publisher must handle the same outbox row appearing twice
Choosing the Right Pattern
Strong consistency required (financial, medical)?
→ 2PC if all services in same data center
→ TCC if resources support provisional holds
→ Saga with careful compensation if geographically distributed
High availability required?
→ Saga (choreography or orchestration)
→ Accept eventual consistency
Need reliable event publishing?
→ Always use Outbox pattern
→ CDC-based Debezium for high throughput
→ Polling relay for simpler operations
Simple workflow (2-3 steps)?
→ Choreography saga
Complex workflow (5+ steps, many failure modes)?
→ Orchestration saga (easier to debug)
→ Or durable execution platform (Temporal, Step Functions)
Want to minimize infrastructure code?
→ Durable execution platform handles retries, state, compensation
Need audit trail?
→ Event sourcing + Saga
→ Or outbox with CDC for change history
Decision matrix by operational constraints:
| Constraint | Recommended Pattern |
|---|---|
| Zero data loss, same-DC | 2PC |
| Zero data loss, multi-DC | TCC |
| High throughput, eventual consistency | Saga + Outbox (CDC) |
| Minimal ops overhead | Saga + Outbox (polling) |
| Complex orchestration, team unfamiliar with saga | Durable execution (Temporal, Step Functions) |
| Existing Kafka infrastructure | Saga + Outbox (Debezium CDC) |
Conclusion
Distributed transactions require careful pattern selection based on consistency needs and performance requirements. Two-phase commit provides the strongest guarantees but sacrifices availability and throughput. TCC offers a middle ground with provisional state management. The Saga pattern is the most practical choice for long-running business transactions where eventual consistency is acceptable. Outbox patterns provide reliable messaging without distributed coordination, and modern CDC implementations make them nearly real-time. Durable execution platforms simplify implementation further by automating retries, state persistence, and compensation.
Always prefer patterns that minimize synchronous coordination across services, and design every handler to be idempotent. Measure your consistency requirements honestly — many systems can tolerate seconds of inconsistency in exchange for dramatically simpler architecture.
Resources
- Microservices.io — Saga Pattern
- Microservices.io — Transactional Outbox
- Outbox Pattern for Reliable Event Publishing — Conduktor
- Temporal — Mastering Saga Patterns
- Azure Architecture Center — Saga Design Pattern
- Designing Data-Intensive Applications — Martin Kleppmann
- Seata — Distributed Transaction Solution
- Consistent Distributed Transactions: Emerging Patterns — The Backend Developer
Comments