Two-Phase Commit

By Rahul Kumar · Senior Software Engineer · Updated May 2026 · Category: Petrov · Distributed Systems (Part II)

Prepare/commit + coordinator crash blocking. Ch 13.

This interactive explanation is built for system design interview prep: step through Two-Phase Commit, watch the internal state change, and connect the concept to real distributed-system trade-offs.

Overview

Two-phase commit (2PC) is the classical protocol for atomically committing a transaction across multiple independent participants: every participant either commits or every participant aborts, even though they sit behind their own locks and logs. A coordinator drives the transaction through two rounds. In Phase 1 it asks every participant "can you commit?" and each participant either votes YES (and durably promises it will commit if asked) or votes NO. In Phase 2 the coordinator tallies the votes and broadcasts the decision: COMMIT if every vote was YES, ABORT otherwise. The protocol is the default in XA, classical distributed databases, and many microservice saga implementations, and it is the honest answer whenever you need cross-service atomicity. It is also famously the protocol that blocks forever if the coordinator dies at the worst possible moment, which is why it is the gateway drug to three-phase commit, Paxos Commit, and Raft-backed transaction managers.

Two-Phase Commit — Interactive Simulator

Runs fully client-side in your browser; no sign-up. Or open full screen →

Launch the interactive Two-Phase Commit widget — step through the algorithm or protocol and observe the internal state updating in real time.

How it works

A transaction arrives at the coordinator with a set of participants already enlisted (databases, queues, remote services). The coordinator assigns a transaction id, durably logs BEGIN, and enters Phase 1 by sending PREPARE to every participant. A participant that receives PREPARE executes the transaction locally up to the point of commit, acquires all necessary locks, writes an undo/redo record to its WAL, forces it to disk, and only then replies YES; this is the contract that makes its vote binding. A participant that cannot commit (constraint violation, resource exhaustion) writes ABORT locally and replies NO. The coordinator waits for all votes. If any vote is NO (or any participant times out), it logs ABORT and sends ABORT to everyone. If every vote is YES, it logs COMMIT - this log record is the point of no return - and sends COMMIT to everyone. Participants apply the durable outcome, release their locks, and ack. The coordinator clears the transaction log when all acks return. The protocol is blocking because a participant that has voted YES and then lost contact with the coordinator cannot unilaterally decide: it must keep locks held and wait for the coordinator to recover and tell it the fate. In practice, production 2PC pairs the coordinator with heartbeats, a replicated coordinator log, and a participant-to-participant "what did you hear?" recovery path to limit how long a crashed coordinator can freeze the cluster.

Implementation

Coordinator driving PREPARE and COMMIT rounds

public class Coordinator {
    private final List<ParticipantClient> participants;
    private final TransactionLog log;
    private static final long VOTE_TIMEOUT_MS = 2_000;

    public Coordinator(List<ParticipantClient> participants, TransactionLog log) {
        this.participants = participants; this.log = log;
    }

    public boolean commit(String txId) {
        log.append(txId, "BEGIN");
        // Phase 1: PREPARE
        List<CompletableFuture<Boolean>> votes = new ArrayList<>();
        for (ParticipantClient p : participants) {
            votes.add(CompletableFuture.supplyAsync(() -> p.prepare(txId))
                    .orTimeout(VOTE_TIMEOUT_MS, TimeUnit.MILLISECONDS)
                    .exceptionally(e -> false));
        }
        boolean allYes = votes.stream().map(CompletableFuture::join)
                .reduce(true, Boolean::logicalAnd);

        // Phase 2: decide and broadcast
        if (allYes) {
            log.append(txId, "COMMIT"); // point of no return
            for (ParticipantClient p : participants) p.commit(txId);
            log.append(txId, "DONE");
            return true;
        } else {
            log.append(txId, "ABORT");
            for (ParticipantClient p : participants) p.abort(txId);
            log.append(txId, "DONE");
            return false;
        }
    }

    /** Called on coordinator restart to finish any in-flight transactions. */
    public void recover() {
        for (TransactionLog.Record r : log.unfinished()) {
            String decision = r.lastEntry();
            for (ParticipantClient p : participants) {
                if ("COMMIT".equals(decision)) p.commit(r.txId());
                else p.abort(r.txId());
            }
            log.append(r.txId(), "DONE");
        }
    }
}

Participant with onPrepare / onCommit / onAbort

public class Participant {
    public enum State { IDLE, PREPARED, COMMITTED, ABORTED }
    private final ParticipantLog log;
    private final LockManager locks;
    private final TxExecutor executor;
    private final Map<String, State> txStates = new ConcurrentHashMap<>();

    public Participant(ParticipantLog log, LockManager locks, TxExecutor executor) {
        this.log = log; this.locks = locks; this.executor = executor;
    }

    public synchronized boolean onPrepare(String txId, List<Op> ops) {
        if (txStates.containsKey(txId)) return txStates.get(txId) == State.PREPARED;
        try {
            locks.acquireAll(txId, ops);
            executor.stage(txId, ops);      // materialise undo/redo records
            log.forceSync(txId, "PREPARED"); // durable before we vote YES
            txStates.put(txId, State.PREPARED);
            return true;
        } catch (Exception e) {
            locks.releaseAll(txId);
            log.append(txId, "ABORTED");
            txStates.put(txId, State.ABORTED);
            return false;
        }
    }

    public synchronized void onCommit(String txId) {
        if (txStates.get(txId) != State.PREPARED) return; // idempotent reapply
        log.forceSync(txId, "COMMIT");
        executor.applyStaged(txId);
        locks.releaseAll(txId);
        txStates.put(txId, State.COMMITTED);
    }

    public synchronized void onAbort(String txId) {
        log.append(txId, "ABORT");
        executor.discardStaged(txId);
        locks.releaseAll(txId);
        txStates.put(txId, State.ABORTED);
    }

    public State status(String txId) {
        return txStates.getOrDefault(txId, State.IDLE);
    }
}

Complexity

Messages per transaction: O(N) PREPARE + O(N) vote + O(N) decision + O(N) ack = 4N
Log forces per participant: 2 (PREPARED, COMMIT/ABORT)
Log forces on coordinator: 2 (BEGIN, decision)
Lock hold time: O(max participant Phase-1 latency)
Blocking window if coordinator fails: unbounded until coordinator recovery

Key design decisions & trade-offs

Atomicity model — Chosen: Durable vote before YES. A participant voting YES without forcing its log could crash and forget its promise, breaking atomicity. The forced write is the price of correctness.
Blocking behaviour — Chosen: Accept coordinator single point of failure. Pure 2PC blocks participants while the coordinator is down. Variants (3PC, Paxos Commit) solve this at the cost of more rounds and more complexity.
Cross-service scope — Chosen: Use 2PC only within a trust boundary. Running 2PC across services you do not operate couples failure domains. Sagas with compensating actions are preferred for long-running or cross-org workflows.

Common pitfalls

Voting YES without forcing the WAL, so a crashed participant forgets its promise and a later COMMIT finds no transaction
Holding locks across PREPARE for human-scale time, deadlocking throughput on the hottest rows
Running a single coordinator without replication, making the whole transaction system as available as one node
Timing out a prepared participant and aborting on the coordinator while the participant keeps locks waiting

Interview follow-ups

Replicate the coordinator log with Raft so a crashed coordinator can be failed over without blocking
Add presumed-abort optimisation to skip the ABORT log record on read-only or single-vote-NO paths
Move to Paxos Commit when you need non-blocking semantics under coordinator failure
Swap 2PC for sagas and idempotent compensations when participants span independent services or organisations