Replication Anomalies

By Rahul Kumar · Senior Software Engineer · Updated · Category: Kleppmann · Designing Data-Intensive Applications

Read-your-writes, monotonic reads, and consistent prefix reads under replication lag (Chapter 5).

Overview

Replication gives a distributed database durability and read throughput, but once clients may read from any replica you inherit a set of anomalies that do not exist on a single node. All of them stem from stale reads: a follower that has not yet applied recent writes returns data the leader has already superseded. Kleppmann names three guarantees that replication lag can break: read-your-writes (you write, immediately read, and see the old value), monotonic reads (successive reads go backwards in time as you hop between replicas), and consistent prefix reads (you see an effect before its cause). These anomalies are not exotic: they can occur on any asynchronously replicated system, from MySQL primary-replica to geo-distributed Cassandra. The bug rarely shows up in testing because replication lag on a local setup is too small to notice; it shows up in production when a user posts a comment, refreshes the page, and finds the comment missing. The usual fixes for the first two are forms of session consistency: serve a session's reads from a replica that has already seen what that session wrote or read. Consistent prefix additionally requires that causally related writes become visible in order.

How it works

In a typical leader-follower setup, writes land on the leader and propagate asynchronously to N followers. Replication lag varies per follower: one might be 10 ms behind, another 5 seconds behind after a GC pause. A client whose requests are load-balanced round-robin across followers will hit different lags on successive reads, producing non-monotonic behaviour.

Monotonic reads says: once a session has seen the data as of timestamp T, it must never again see an earlier state. The standard fix is session stickiness: route all reads from one session to one replica, usually by hashing the user ID to a follower. The guarantee lapses if that replica fails and the session is rerouted to one that is further behind.

Read-your-writes is a separate guarantee, and stickiness alone does not provide it: after a successful write, reads in that session must observe that write, but the pinned follower may not have applied it yet. Solutions include routing reads to the leader for a short window after a write, passing a write token (the last LSN, or log sequence number, the session observed) with the read and having the follower wait until it has replicated past it, or always reading from the leader for data the user can modify, such as their own timeline.

Consistent prefix matters when writes are causally ordered (comment A replies to comment B): if replicas apply writes in different orders, which typically happens when the writes go to partitions that replicate independently, readers can see A without B. Fixes include keeping causally related writes in the same partition, or attaching vector clocks or logical timestamps to writes so that followers apply them in a valid causal order. None of these fixes gives linearizability; they give session-scoped guarantees at a fraction of the cost.
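The causal-ordering idea for consistent prefix can be sketched as a follower that buffers a write until the write it depends on has been applied. This is a minimal illustration, not how any particular database does it: the `CausalFollower` class, the `Write` record, and the single `dependsOn` field are all hypothetical, and a real system would track a set of dependencies (for example with version vectors). It assumes Java 16 or later for records.

```java
import java.util.*;

/** Hypothetical follower that delays a write until the write it depends on is applied. */
public class CausalFollower {
    /** A replicated write; dependsOn is the id of one causally earlier write, or null. */
    record Write(String id, String dependsOn, String value) {}

    private final Map<String, String> applied = new LinkedHashMap<>();  // id -> value, in apply order
    private final Map<String, List<Write>> waiting = new HashMap<>();   // missing dependency id -> blocked writes

    /** Called when a write arrives from the network, in any order. */
    public void receive(Write w) {
        if (w.dependsOn() != null && !applied.containsKey(w.dependsOn())) {
            // The cause has not been applied yet: hold the effect back.
            waiting.computeIfAbsent(w.dependsOn(), k -> new ArrayList<>()).add(w);
            return;
        }
        apply(w);
    }

    private void apply(Write w) {
        applied.put(w.id(), w.value());
        // Applying w may unblock writes that were waiting on it.
        List<Write> unblocked = waiting.remove(w.id());
        if (unblocked != null) {
            for (Write next : unblocked) apply(next);
        }
    }

    /** Ids visible to readers, in the order they became visible. */
    public List<String> visibleIds() {
        return new ArrayList<>(applied.keySet());
    }

    public static void main(String[] args) {
        CausalFollower f = new CausalFollower();
        f.receive(new Write("reply", "question", "About ten seconds."));   // effect arrives first
        System.out.println(f.visibleIds());   // prints []
        f.receive(new Write("question", null, "How far ahead can you see?"));
        System.out.println(f.visibleIds());   // prints [question, reply]
    }
}
```

The trade-off is that a reader may briefly see nothing new rather than a reply without its question, which is the point of the guarantee.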

Implementation

ReplicaClient: demonstrates monotonic-reads breaking across replicas
import java.util.*;

/** Three replicas with different lags behind the leader. */
public class ReplicaClient {
    static final class Replica {
        final String name;
        final long lagMs;              // how far behind the leader
        final NavigableMap<Long, String> log = new TreeMap<>(); // applyTime -> value
        Replica(String name, long lagMs) { this.name = name; this.lagMs = lagMs; }

        /** Records the write as becoming visible lagMs after the leader accepted it. */
        void applyFromLeader(long leaderWriteTime, String value) {
            log.put(leaderWriteTime + lagMs, value);
        }

        /** Value visible to a reader at wallClockNow. */
        String read(long wallClockNow) {
            Map.Entry<Long, String> e = log.floorEntry(wallClockNow);
            return e == null ? null : e.getValue();
        }
    }

    private final Replica leader = new Replica("leader", 0);
    private final List<Replica> followers = List.of(
        new Replica("follower-a", 200),
        new Replica("follower-b", 1500),   // slow follower: big lag
        new Replica("follower-c", 400)
    );
    private int nextFollower = 0;          // round-robin cursor; not thread-safe, demo only

    public void write(String value, long now) {
        leader.log.put(now, value);
        for (Replica f : followers) f.applyFromLeader(now, value);
    }

    /** Bug: round-robin reads across followers with different lags. */
    public String readRoundRobin(long now) {
        Replica f = followers.get(nextFollower);
        nextFollower = (nextFollower + 1) % followers.size();
        return f.read(now);
    }

    public static void demoAnomaly() {
        ReplicaClient c = new ReplicaClient();
        long t0 = 0;
        c.write("v1", t0);
        c.write("v2", t0 + 1000);
        // At t=t0+1800, follower-a and follower-c have v2, but follower-b still shows v1.
        // Reads that land on different followers can return v2 then v1 then v2: non-monotonic.
        for (int i = 0; i < 5; i++) {
            System.out.println("read -> " + c.readRoundRobin(t0 + 1800));
        }
    }

    public static void main(String[] args) {
        demoAnomaly();
    }
}
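One way to guard against this anomaly without pinning a session to a replica is to have the client remember the highest version it has seen and refuse anything older. The sketch below is an illustration under assumptions: `MonotonicSession` and `Versioned` are hypothetical names, and it presumes every replica response carries a version that grows with each leader write. It needs Java 16 or later for records.

```java
import java.util.*;

/** Hypothetical client-side guard: never return a version older than one already seen. */
public class MonotonicSession {
    /** A versioned value as returned by some replica; version grows with each leader write. */
    record Versioned(long version, String value) {}

    private long highestSeen = -1;

    /**
     * Walks the candidate replica responses in order and returns the first one that is
     * not older than what this session has already observed.
     */
    public Optional<Versioned> read(List<Versioned> replicaResponses) {
        for (Versioned v : replicaResponses) {
            if (v.version() >= highestSeen) {
                highestSeen = v.version();
                return Optional.of(v);
            }
        }
        return Optional.empty();   // every replica tried is behind this session
    }

    public static void main(String[] args) {
        MonotonicSession s = new MonotonicSession();
        Versioned fresh = new Versioned(2, "v2");
        Versioned stale = new Versioned(1, "v1");
        System.out.println(s.read(List.of(fresh)).get().value());          // prints v2
        System.out.println(s.read(List.of(stale, fresh)).get().value());   // skips v1, prints v2
        System.out.println(s.read(List.of(stale)).isPresent());            // prints false
    }
}
```

An empty result means the caller has to retry, wait, or fall back to the leader; the guard trades availability for never showing the session an older state.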
Session stickiness: pin reads to one replica per user
import java.util.*;

/** Routes each session to a deterministic replica; reads stay monotonic while that replica stays up and the replica list is unchanged. */
public class StickyReplicaRouter {
    private final List<ReplicaClient.Replica> replicas;
    private final ReplicaClient.Replica leader;

    public StickyReplicaRouter(ReplicaClient.Replica leader,
                               List<ReplicaClient.Replica> followers) {
        this.leader = leader;
        this.replicas = List.copyOf(followers);
    }

    /** Modulo hash (not a consistent-hash ring): same userId -> same replica while the replica list is unchanged. */
    private ReplicaClient.Replica pick(String userId) {
        int h = Math.floorMod(userId.hashCode(), replicas.size());
        return replicas.get(h);
    }

    /** Monotonic reads: one user, one replica. */
    public String read(String userId, long now) {
        return pick(userId).read(now);
    }

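    /**
     * Read-your-writes: for a short window after a write, read from the leader.
     * The window must exceed the worst-case follower lag, or the later follower read can still be stale.
     */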
    public String readYourWrites(String userId, long lastWriteTime, long now) {
        long freshnessWindowMs = 2_000;
        if (now - lastWriteTime < freshnessWindowMs) {
            return leader.read(now);
        }
        return pick(userId).read(now);
    }

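    /** Write-token variant: serve from the pinned replica only if it has applied requiredLsn. */
    public String readAfterLsn(String userId, long requiredLsn, long now) {
        ReplicaClient.Replica r = pick(userId);
        // In real code we would poll/notify until r.appliedLsn >= requiredLsn;
        // if the replica is too far behind, fall back to the leader.
        // This simulation treats the leader write time as the LSN. Log keys are apply
        // times (write time + lag) and may lie in the future, so consider only entries
        // already applied at `now` and subtract the lag to recover the leader LSN.
        Long lastApplied = r.log.floorKey(now);
        long appliedLsn = lastApplied == null ? -1 : lastApplied - r.lagMs;
        return appliedLsn >= requiredLsn ? r.read(now) : leader.read(now);
    }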
}
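The "poll/notify until the replica has caught up" step that `readAfterLsn` leaves as a comment could look roughly like the following. It is a sketch under assumptions, not part of the router above: `AppliedLsnTracker`, `markApplied`, and `awaitLsn` are hypothetical names, and the timeout after which the caller falls back to the leader is an arbitrary choice.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical follower-side tracker that lets a read block until an LSN has been applied. */
public class AppliedLsnTracker {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition advanced = lock.newCondition();
    private long appliedLsn = -1;

    /** Called by the replication thread after applying a log entry. */
    public void markApplied(long lsn) {
        lock.lock();
        try {
            if (lsn > appliedLsn) {
                appliedLsn = lsn;
                advanced.signalAll();   // wake readers waiting for this or an earlier LSN
            }
        } finally {
            lock.unlock();
        }
    }

    /**
     * Blocks until appliedLsn >= requiredLsn or the timeout elapses.
     * Returns true if the replica caught up; false means the caller should fall back to the leader.
     */
    public boolean awaitLsn(long requiredLsn, long timeoutMs) throws InterruptedException {
        long remainingNanos = TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        lock.lock();
        try {
            while (appliedLsn < requiredLsn) {
                if (remainingNanos <= 0) return false;
                remainingNanos = advanced.awaitNanos(remainingNanos);
            }
            return true;
        } finally {
            lock.unlock();
        }
    }

    public static void main(String[] args) throws Exception {
        AppliedLsnTracker tracker = new AppliedLsnTracker();
        Thread replication = new Thread(() -> {
            try { Thread.sleep(50); } catch (InterruptedException e) { return; }
            tracker.markApplied(42);
        });
        replication.start();
        System.out.println(tracker.awaitLsn(42, 1_000));  // replica catches up well inside the timeout
        System.out.println(tracker.awaitLsn(43, 20));     // LSN 43 never arrives, so this times out
        replication.join();
    }
}
```

The loop around `awaitNanos` matters: condition waits can wake spuriously, so the LSN is re-checked each time and the remaining timeout is carried forward.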
