CAP under Partition

By Rahul Kumar · Senior Software Engineer · Updated May 2026 · Category: System Design Primer · Unique Topics

3-node cluster, partition + heal. CP rejects minority writes; AP diverges.

This interactive explanation is built for system design interview prep: step through CAP under Partition, watch the internal state change, and connect the concept to real distributed-system trade-offs.

Overview

CAP is the theorem that forces every distributed system to pick a side during a network partition. Eric Brewer's 2000 conjecture, formalized by Gilbert and Lynch in 2002, says a system cannot simultaneously guarantee Consistency (every read sees the most recent write), Availability (every request gets a non-error response), and Partition tolerance (the system keeps working when messages between nodes are dropped). Because partitions are a fact of physics, not a design choice, real systems get to pick only between CP (refuse requests on the minority side) and AP (accept writes on both sides, reconcile later). The theorem is often misunderstood as a three-way trade-off; it is really a binary choice made at partition time. When the network is healthy, both CP and AP systems can be consistent and available. CAP only matters once packets start dropping. Understanding CAP is the difference between picking a database that loses data and picking one that refuses to accept it.

CAP under Partition — Interactive Simulator

Runs fully client-side in your browser; no sign-up. Or open full screen →

Launch the interactive CAP under Partition widget — step through the algorithm or protocol and observe the internal state updating in real time.

How it works

Every distributed write needs to reach a quorum of replicas to be considered durable. When the network splits the cluster into two groups that cannot see each other, one of two things must happen to each partition. A CP system (ZooKeeper, etcd, HBase) detects the partition and allows only the majority side to accept writes; the minority side returns errors. This preserves linearizability — every successful write is visible to every subsequent successful read — at the cost of rejecting requests routed to the minority. The detection mechanism is usually a lease or a Raft leader whose heartbeat times out. An AP system (Cassandra, DynamoDB, Riak) allows both sides to accept writes, tagging them with timestamps or vector clocks. When the partition heals, the system reconciles divergent writes using last-write-wins, merge functions, or CRDTs. Every request gets an answer, but two clients reading the same key during the partition may see different values, and reconciliation can drop writes silently if two updates collide. The PACELC extension captures the practical reality: in the absence of a partition (Else), systems also trade Latency for Consistency. A strongly consistent read across three regions pays three cross-region RTTs; an eventually consistent read pays one. Most production systems are tunable — Cassandra lets you pick QUORUM or ONE per request — so the CAP label is really a default, not a prison.

Implementation

CPStrategy: majority-only writes, reject on partition

public class CPStrategy implements KVStore {
    private final List<Replica> replicas;
    private final int quorum;
    private final PartitionDetector detector;

    public CPStrategy(List<Replica> replicas, PartitionDetector detector) {
        this.replicas = replicas;
        this.quorum = replicas.size() / 2 + 1;
        this.detector = detector;
    }

    @Override
    public void put(String key, byte[] value) {
        if (!detector.hasMajorityQuorum()) {
            throw new UnavailableException("Refusing write on minority partition");
        }
        int acks = 0;
        for (Replica r : replicas) {
            try { r.write(key, value); acks++; }
            catch (IOException ignored) {}
            if (acks >= quorum) return;
        }
        throw new UnavailableException("Quorum not reached, acks=" + acks);
    }

    @Override
    public byte[] get(String key) {
        if (!detector.hasMajorityQuorum())
            throw new UnavailableException("Refusing read on minority partition");
        return replicas.get(0).read(key); // leader-only read
    }
}

APStrategy: accept on any reachable replica, reconcile later

public class APStrategy implements KVStore {
    private final List<Replica> replicas;
    private final ConflictResolver resolver; // LWW or CRDT merge

    public APStrategy(List<Replica> replicas, ConflictResolver resolver) {
        this.replicas = replicas; this.resolver = resolver;
    }

    @Override
    public void put(String key, byte[] value) {
        long ts = System.currentTimeMillis();
        for (Replica r : replicas) {
            try { r.writeTimestamped(key, value, ts); return; }
            catch (IOException e) { /* try next */ }
        }
        throw new UnavailableException("No replica reachable");
    }

    @Override
    public byte[] get(String key) {
        List<TimestampedValue> seen = new ArrayList<>();
        for (Replica r : replicas) {
            try { seen.add(r.readTimestamped(key)); }
            catch (IOException ignored) {}
            if (!seen.isEmpty()) break; // serve first reachable
        }
        if (seen.isEmpty()) throw new UnavailableException("All replicas down");
        return resolver.merge(seen).value();
    }
}

Complexity

CP write (healthy): 1 cross-node RTT + majority acks
CP write (partitioned minority): error, O(1)
AP write: 1 RTT to any reachable replica
AP read (conflict): N RTTs + merge cost

Key design decisions & trade-offs

CP vs AP default — Chosen: CP for systems of record, AP for user-facing reads. Billing and inventory must refuse uncertain writes; feeds and timelines must stay responsive even during partition.
Quorum size — Chosen: Majority (N/2 + 1). Smallest quorum that guarantees no two disjoint majorities exist simultaneously.
Conflict resolution — Chosen: CRDT over LWW when possible. LWW silently drops concurrent writes; CRDTs preserve both by design at the cost of larger payloads.

Common pitfalls

Assuming a single datacenter is partition-free: rack-level splits happen routinely
Using LWW with unsynchronized clocks — a wrong clock rewrites history
Confusing 'eventual consistency' with 'strong eventual consistency'; only the latter guarantees convergence
Forgetting that read-your-writes requires session stickiness or read quorums in AP systems

Interview follow-ups

Tune read/write quorums (R + W > N) for stronger consistency on AP stores
Add hinted handoff for transient partitions
Model PACELC explicitly: latency-vs-consistency in the non-partition case
Introduce read-your-writes via session tokens