The Two-Phase Commit (2PC) protocol is a distributed algorithm that ensures the nodes participating in a distributed transaction either all commit it or all abort it. This mechanism provides atomicity for transactions that span multiple independent systems, guaranteeing that the transaction is either fully completed across all nodes or fully rolled back.
Participants
The protocol involves two types of nodes:
- Coordinator: The node that originates the transaction and manages its lifecycle. It is responsible for making the final commit or abort decision.
- Participants (or Cohorts): The nodes that are involved in the transaction and vote on whether they are able to commit their part of it.
The Two Phases
Phase 1: Prepare Phase (Voting Phase)
In this phase, the coordinator asks all participants if they are ready to commit the transaction. This is like asking, “Can you promise to do this?”
- Prepare Request: The coordinator sends a PREPARE message to all participants.
- Participant Vote: Upon receiving the request, each participant determines if it can commit the transaction.
  - If Yes: The participant writes a PREPARE record to its persistent log, acquires all necessary locks and resources, and sends a VOTE_COMMIT message back to the coordinator. By voting “yes,” the participant makes a promise that it will commit if asked, no matter what.
  - If No: The participant sends a VOTE_ABORT message. It can immediately abort its part of the transaction and release its resources without waiting for the coordinator.
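The voting step can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the `Participant` class, its `can_commit` flag (a stand-in for whatever local validation a real system performs), and the in-memory `log` list (a stand-in for a persistent log) are all hypothetical names introduced here.

```python
from dataclasses import dataclass, field

VOTE_COMMIT = "VOTE_COMMIT"
VOTE_ABORT = "VOTE_ABORT"


@dataclass
class Participant:
    name: str
    can_commit: bool = True                    # assumption: result of local checks
    log: list = field(default_factory=list)    # assumption: stands in for durable storage
    state: str = "INIT"

    def on_prepare(self, txn_id: str) -> str:
        """Handle a PREPARE message and return this participant's vote."""
        if self.can_commit:
            # Record the promise before replying, so it survives a crash.
            self.log.append(("PREPARE", txn_id))
            self.state = "PREPARED"
            return VOTE_COMMIT
        # A "no" vote needs no further coordination: abort locally right away.
        self.state = "ABORTED"
        return VOTE_ABORT
```

The ordering inside the "yes" branch is the point of the sketch: the log record is written before the vote is returned, because the vote is a promise the participant must be able to keep after a restart.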
Phase 2: Commit Phase (Completion Phase)
The outcome of this phase is determined by the votes collected by the coordinator.
Case 1: Successful Commit
If the coordinator receives VOTE_COMMIT from all participants:
- Commit Decision: The coordinator makes the final decision to commit and writes a COMMIT record to its own log.
- Send Commit: It sends a GLOBAL_COMMIT message to all participants.
- Participant Commit: Participants receive the GLOBAL_COMMIT message, formally commit their part of the transaction, release any locks, and send an ACK (acknowledgment) message back to the coordinator.
Case 2: Transaction Abort
If the coordinator receives at least one VOTE_ABORT or if a participant doesn’t respond before a timeout:
- Abort Decision: The coordinator decides to abort and writes an ABORT record to its log.
- Send Abort: It sends a GLOBAL_ABORT message to all participants.
- Participant Abort: Participants that had voted to commit receive the GLOBAL_ABORT message, roll back all changes related to the transaction, and release their locks.
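The two cases above reduce to a single decision rule on the coordinator, which the following Python sketch tries to capture. The function names `decide` and `run_coordinator`, the `collect_votes` and `broadcast` callbacks, and the use of `None` to model a participant that timed out are assumptions made for illustration; a real coordinator would use network I/O and a durable log rather than callbacks and a list.

```python
from typing import Callable, List, Optional

VOTE_COMMIT = "VOTE_COMMIT"
VOTE_ABORT = "VOTE_ABORT"
GLOBAL_COMMIT = "GLOBAL_COMMIT"
GLOBAL_ABORT = "GLOBAL_ABORT"


def decide(votes: List[Optional[str]]) -> str:
    """Commit only if every participant voted VOTE_COMMIT.

    A None entry models a participant that did not answer before the timeout.
    An empty vote list is treated as an abort (an arbitrary choice for this sketch).
    """
    if votes and all(v == VOTE_COMMIT for v in votes):
        return GLOBAL_COMMIT
    return GLOBAL_ABORT


def run_coordinator(
    txn_id: str,
    collect_votes: Callable[[str], List[Optional[str]]],
    broadcast: Callable[[str, str], None],
    log: list,
) -> str:
    votes = collect_votes(txn_id)          # Phase 1: gather votes
    decision = decide(votes)
    # Log the decision before announcing it, so it can be re-sent after a crash.
    record = "COMMIT" if decision == GLOBAL_COMMIT else "ABORT"
    log.append((record, txn_id))
    broadcast(txn_id, decision)            # Phase 2: announce the outcome
    return decision


# Usage: one participant times out, so the transaction aborts.
log, sent = [], []
outcome = run_coordinator(
    "t1",
    lambda txn: [VOTE_COMMIT, None],
    lambda txn, msg: sent.append((txn, msg)),
    log,
)
```

Treating a timeout exactly like a VOTE_ABORT keeps the rule simple: anything short of unanimous VOTE_COMMIT leads to GLOBAL_ABORT.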
Disadvantages and Limitations
While 2PC guarantees atomicity, it has several significant drawbacks.
- Blocking Protocol: This is the most significant disadvantage. If the coordinator fails after sending the PREPARE message but before sending the final decision, participants that voted VOTE_COMMIT are left in a “prepared” state. They are blocked because they cannot unilaterally decide to commit or abort; they must wait for the coordinator to recover and provide the final decision. While blocked, they hold onto database locks and other resources, which can impact the entire system.
- Single Point of Failure: The coordinator is a single point of failure. If it fails permanently, the participants might remain blocked forever, requiring manual intervention.
- Performance Overhead: The protocol is chatty. A successful transaction requires at least two rounds of messages (prepare -> vote, and commit -> ack), which increases latency.
- Unanimous Consent: The requirement for all participants to agree can reduce the availability of the system, as a single dissenting or failed participant will cause the entire transaction to fail.
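The blocking problem in the list above becomes concrete when a participant inspects its own log after a restart. The Python sketch below is a hedged illustration: the `recover` function and the returned labels are hypothetical, and it assumes the participant also logs COMMIT and ABORT records when it learns the outcome, which the description above does not state explicitly.

```python
from typing import List, Tuple


def recover(log: List[Tuple[str, str]], txn_id: str) -> str:
    """Classify a transaction from the participant's own log records."""
    records = [kind for kind, txn in log if txn == txn_id]
    if "COMMIT" in records:
        return "COMMITTED"      # outcome already known locally
    if "ABORT" in records:
        return "ABORTED"        # outcome already known locally
    if "PREPARE" in records:
        # Voted yes but never saw the decision: the participant may neither
        # commit nor abort on its own and must wait for the coordinator.
        return "IN_DOUBT"
    # Never promised anything, so aborting unilaterally is safe.
    return "CAN_ABORT"


# Usage: t1 finished, t2 is stuck waiting for the coordinator, t3 never prepared.
log = [("PREPARE", "t1"), ("COMMIT", "t1"), ("PREPARE", "t2")]
states = {txn: recover(log, txn) for txn in ("t1", "t2", "t3")}
```

Only the IN_DOUBT case blocks: locks held for that transaction cannot be released until the coordinator's decision arrives.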
Due to these limitations, alternatives like the Three-Phase Commit (3PC) protocol and consensus algorithms like Paxos or Raft are often considered for systems requiring higher availability and fault tolerance.