← System Design Simulator

Bully Leader Election

By Rahul Kumar · Senior Software Engineer · Updated · Category: Petrov · Distributed Systems (Part II)

Highest-id wins; ELECTION/OK/COORDINATOR. Ch 10.

This interactive explanation is built for system design interview prep: step through Bully Leader Election, watch the internal state change, and connect the concept to real distributed-system trade-offs.

Overview

The Bully algorithm is the simplest leader election protocol that actually works in an asynchronous, crash-prone cluster, and it is still the one most engineers reach for when they do not need the full machinery of Raft or Paxos. The idea is embarrassingly direct: every node has a fixed, globally-unique priority (usually its node id), and the highest-surviving priority wins. When a node notices the current coordinator is gone, it challenges every higher-id node with an ELECTION message. If any of them respond, it loses the round and waits. If none respond, it declares itself COORDINATOR and tells everyone. The protocol was introduced by Garcia-Molina in 1982 and still underlies many production systems: classic MySQL replica promotion, ZooKeeper-less setups, and countless ad-hoc service coordinators. It converges in at most O(N^2) messages and, crucially, always elects the live node with the highest id, which makes reasoning about split-brain resolution easy.

Bully Leader Election — Interactive Simulator

Runs fully client-side in your browser; no sign-up. Or open full screen →

Launch the interactive Bully Leader Election widget — step through the algorithm or protocol and observe the internal state updating in real time.

How it works

Every node is configured with the full membership list and each peer's priority. A failure detector (timeout-based or phi-accrual) marks the current coordinator as suspect. When node P decides the coordinator is down, P enters an election by sending an ELECTION message to every node with a higher id than itself. Three outcomes are possible. First, no higher node responds within the election timeout; P concludes it is the highest surviving node and broadcasts COORDINATOR to everyone, which becomes the new leader announcement. Second, at least one higher node responds with an OK message, which acknowledges receipt and implicitly says "stand down, I will take over." P then waits for a new COORDINATOR message within a secondary timeout. Third, that secondary timeout expires without a COORDINATOR announcement (the higher node crashed mid-election), in which case P restarts the election from scratch. When a previously-dead node with a high id rejoins, it must immediately start its own election; the protocol guarantees it will bully its way back into the leader slot, which is the behaviour that gives it the name. Message types are kept explicit and stateless: every handler inspects sender id against local id and either acknowledges, takes over, or ignores. Because the protocol relies entirely on timeouts, correct tuning of election timeout and heartbeat interval is load-bearing. Under network partition it can elect two leaders (one per side), so applications pair it with a quorum check or a fencing token before allowing the new leader to make externally-visible decisions.

Implementation

Bully election Node with explicit message handlers
public class Node {
    public enum MsgType { ELECTION, OK, COORDINATOR }
    public record Message(MsgType type, int fromId) {}

    private final int id;
    private final List<Integer> peerIds; // all other node ids, sorted
    private final Transport transport;
    private final ScheduledExecutorService timers;
    private volatile int currentLeader = -1;
    private volatile boolean waitingForOk = false;
    private volatile boolean waitingForCoordinator = false;
    private static final long ELECTION_TIMEOUT_MS = 500;

    public Node(int id, List<Integer> peerIds, Transport transport,
                ScheduledExecutorService timers) {
        this.id = id; this.peerIds = peerIds;
        this.transport = transport; this.timers = timers;
    }

    /** Triggered by failure detector when the current leader is suspect. */
    public synchronized void startElection() {
        List<Integer> higher = peerIds.stream()
                .filter(p -> p > id).toList();
        if (higher.isEmpty()) { becomeCoordinator(); return; }
        waitingForOk = true;
        higher.forEach(p -> transport.send(p, new Message(MsgType.ELECTION, id)));
        timers.schedule(this::onOkTimeout, ELECTION_TIMEOUT_MS, TimeUnit.MILLISECONDS);
    }

    public synchronized void onElection(Message m) {
        // Acknowledge and take over if we outrank the sender.
        if (m.fromId() < id) {
            transport.send(m.fromId(), new Message(MsgType.OK, id));
            if (!waitingForOk && !waitingForCoordinator) startElection();
        }
    }

    public synchronized void onOk(Message m) {
        waitingForOk = false;
        waitingForCoordinator = true;
        timers.schedule(this::onCoordinatorTimeout,
                2 * ELECTION_TIMEOUT_MS, TimeUnit.MILLISECONDS);
    }

    public synchronized void onCoordinator(Message m) {
        currentLeader = m.fromId();
        waitingForOk = false;
        waitingForCoordinator = false;
    }

    private synchronized void onOkTimeout() {
        if (waitingForOk) becomeCoordinator();
    }

    private synchronized void onCoordinatorTimeout() {
        if (waitingForCoordinator) { waitingForCoordinator = false; startElection(); }
    }

    private void becomeCoordinator() {
        currentLeader = id;
        waitingForOk = false;
        peerIds.forEach(p -> transport.send(p, new Message(MsgType.COORDINATOR, id)));
    }
}

Complexity

Key design decisions & trade-offs

Common pitfalls

Interview follow-ups

Recommended reading

Related