Consistent Hashing Visualization for Coding Interviews
Ring + virtual nodes; add/remove a node moves only 1/N keys.
Concept
Consistent hashing is the technique that makes distributed caches, sharded databases, and object stores survive the addition or removal of nodes without reshuffling the entire keyspace. Plain hash(key) % N works until N changes, at which point nearly every key maps to a different node and you effectively invalidate the entire cluster. Consistent hashing breaks this coupling by mapping both keys and nodes onto the same abstract ring, usually a 64-bit hash space, and assigning each key to the first node encountered while walking clockwise from the key's position. Add a new node and only keys that fall into that one arc move; remove a node and only its arc gets redistributed to its successor. Virtual nodes, where each physical node is placed at many positions on the ring, fix the load-balance issue that a small number of ring positions would otherwise produce, and together these two ideas are the foundation of Dynamo, Cassandra, memcached clients, and every cache-fleet hashing library worth using.
How it works
Represent the ring as a TreeMap<Long, Node> keyed by 64-bit hash values. When you addNode(node), hash the node's identifier concatenated with a replica index, once per virtual replica, and insert each hash into the map pointing at the same node. Typical replica counts are 100 to 500 per physical node, which smooths out the ring and bounds load skew. When you getNode(key), hash the key to a long and call TreeMap.ceilingEntry(hash). If no entry has a hash greater than or equal to the key's hash, wrap around by returning firstEntry(). When you removeNode(node), walk the ring and delete every entry whose value is that node; the successor of each deleted arc becomes responsible for those keys automatically. The hash function must be well-distributed, typically Murmur3 or SHA-1 truncated to 64 bits, otherwise the ring clumps and load becomes uneven regardless of replica count. The TreeMap gives you O(log V) lookup where V is the number of virtual nodes, which is effectively constant at cluster scale, and O(V/N) keys move on a node add or remove instead of O((N-1)/N) under modulo hashing.
Implementation
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;
public class ConsistentHashRing<N> {
private final int replicasPerNode;
private final TreeMap<Long, N> ring = new TreeMap<>();
public ConsistentHashRing(int replicasPerNode) {
if (replicasPerNode <= 0) throw new IllegalArgumentException("replicas > 0");
this.replicasPerNode = replicasPerNode;
}
/** Place replicasPerNode virtual copies of `node` on the ring. */
public void addNode(N node) {
String id = node.toString();
for (int i = 0; i < replicasPerNode; i++) {
long h = hash(id + "#" + i);
ring.put(h, node);
}
}
/** Remove every virtual copy of `node` from the ring. */
public void removeNode(N node) {
String id = node.toString();
for (int i = 0; i < replicasPerNode; i++) {
ring.remove(hash(id + "#" + i));
}
}
/** Return the node responsible for `key`, or null if the ring is empty. */
public N getNode(Object key) {
if (ring.isEmpty()) return null;
long h = hash(key.toString());
Map.Entry<Long, N> entry = ring.ceilingEntry(h);
if (entry == null) entry = ring.firstEntry(); // wrap clockwise
return entry.getValue();
}
public int ringSize() { return ring.size(); }
/** 64-bit hash derived from the first 8 bytes of SHA-1. Swap in Murmur3 for speed. */
private static long hash(String s) {
try {
MessageDigest md = MessageDigest.getInstance("SHA-1");
byte[] d = md.digest(s.getBytes(StandardCharsets.UTF_8));
long h = 0;
for (int i = 0; i < 8; i++) h = (h << 8) | (d[i] & 0xffL);
return h;
} catch (NoSuchAlgorithmException e) {
throw new IllegalStateException(e);
}
}
}
Complexity
- Time:
O(log V) per lookup, O(V) to add or remove a node where V is replicas per node - Space:
O(V * N) for the ring across N physical nodes
Common pitfalls
- Too few replicas per node makes load wildly uneven; under 50 replicas you see noticeable skew
- Using a weak hash function like Java's String.hashCode produces clumps on the ring
- Forgetting to wrap around when ceilingEntry returns null; the largest key is otherwise unassigned
- Recomputing the full ring on every request instead of caching the TreeMap
- Assuming hot keys are balanced; consistent hashing balances across keys, not across key popularity
- Storing the ring in a non-thread-safe TreeMap when read/write happens across threads
Key design decisions & trade-offs
- Placement — Chosen: Virtual nodes via multiple ring positions per physical node. Without them, load variance is high even for well-distributed hashes; 100-500 replicas is the sweet spot
- Data structure — Chosen: TreeMap (red-black tree). O(log V) ceilingEntry is fast enough and the API maps cleanly onto the ring operation
- Hot-key handling — Chosen: Still hashes to one node. Consistent hashing does not solve popularity skew; pair with replica reads or power-of-two choices
Practice problems
Interview follow-ups
- Jump consistent hashing for constant-memory alternative to virtual nodes
- Rendezvous (HRW) hashing as a probe-based alternative
- Bounded-load consistent hashing to cap per-node imbalance
- Replicate keys to successor arcs for fault tolerance (Dynamo-style)
- Integrate with a gossip membership protocol for dynamic clusters
Why it matters
Consistent Hashing is a core part of the Hashing toolkit. If you are preparing for a technical interview at a top-tier company or teaching a course, a working visualization is the fastest path from "I read about it" to "I understand it".