Hash Table Visualization for Coding Interviews
Chaining vs open addressing; animated collision handling, load factor, resize.
Concept
A hash table is the data structure that makes most other data structures feel slow. It promises average O(1) lookup, insert, and delete by converting a key into a bucket index via a hash function, then storing values in buckets. The problem is that two keys can hash to the same bucket, so every hash table has to pick a collision policy, a load factor threshold, and a growth strategy, and those choices determine whether it stays fast in the worst case. The two dominant implementations are separate chaining, where each bucket is a linked list or a small tree of entries, and open addressing, where collisions probe to neighboring slots. Java's HashMap is a chaining hybrid that upgrades a bucket to a red-black tree once it has eight entries. Python's dict is open addressing. Both ship the same contract: amortized O(1) with proper sizing, O(N) worst case if the hash is pathological.
How it works
Start with an array of buckets sized to a power of two so that index = hash(key) & (capacity - 1) is a cheap bitmask instead of a modulo. On put(key, value), compute a hash, find the bucket, and either append to the bucket's chain if the key is new, or overwrite the existing entry if the key matches. On get(key), hash and walk the chain comparing with equals(). On remove(key), hash, walk, and unlink. The load factor is the ratio of stored entries to bucket count; once it exceeds a threshold, classically 0.75, trigger a resize: allocate a new bucket array twice the size and rehash every entry into it. Rehashing is O(N) but happens only every N inserts on average, so amortized cost per insert is still O(1). With open addressing you instead store entries directly in the array and probe linearly, quadratically, or with double hashing on collision, using a tombstone sentinel to mark deleted slots so lookups do not stop at a hole. Both designs rely on a good hash function: if hashes cluster, chains grow long or probes degenerate, and O(1) becomes O(N). Java defends against this with tree-bucket upgrades; Robin Hood hashing addresses it by redistributing probe distances.
Implementation
import java.util.Objects;
public class SimpleHashMap<K, V> {
private static class Entry<K, V> {
final int hash; final K key; V val; Entry<K, V> next;
Entry(int h, K k, V v, Entry<K, V> n) { hash = h; key = k; val = v; next = n; }
}
private static final int DEFAULT_CAPACITY = 16;
private static final double LOAD_FACTOR = 0.75;
private Entry<K, V>[] table;
private int size;
@SuppressWarnings("unchecked")
public SimpleHashMap() { this.table = (Entry<K, V>[]) new Entry[DEFAULT_CAPACITY]; }
private static int spread(Object key) {
int h = (key == null) ? 0 : key.hashCode();
return h ^ (h >>> 16); // mix high bits into low, like java.util.HashMap
}
private int indexFor(int hash) { return hash & (table.length - 1); }
public V get(K key) {
int h = spread(key);
for (Entry<K, V> e = table[indexFor(h)]; e != null; e = e.next) {
if (e.hash == h && Objects.equals(e.key, key)) return e.val;
}
return null;
}
public V put(K key, V val) {
int h = spread(key);
int i = indexFor(h);
for (Entry<K, V> e = table[i]; e != null; e = e.next) {
if (e.hash == h && Objects.equals(e.key, key)) { V old = e.val; e.val = val; return old; }
}
table[i] = new Entry<>(h, key, val, table[i]);
if (++size > LOAD_FACTOR * table.length) rehash();
return null;
}
public V remove(K key) {
int h = spread(key);
int i = indexFor(h);
Entry<K, V> prev = null;
for (Entry<K, V> e = table[i]; e != null; prev = e, e = e.next) {
if (e.hash == h && Objects.equals(e.key, key)) {
if (prev == null) table[i] = e.next; else prev.next = e.next;
size--;
return e.val;
}
}
return null;
}
public int size() { return size; }
@SuppressWarnings("unchecked")
private void rehash() {
Entry<K, V>[] old = table;
table = (Entry<K, V>[]) new Entry[old.length << 1];
for (Entry<K, V> head : old) {
for (Entry<K, V> e = head; e != null; ) {
Entry<K, V> next = e.next;
int i = indexFor(e.hash);
e.next = table[i];
table[i] = e;
e = next;
}
}
}
}
Complexity
- Time:
O(1) average for get/put/remove, O(N) worst case under hash collisions - Space:
O(N) for entries plus bucket array
Common pitfalls
- Using a bad hashCode so everything collides into one chain, degrading to O(N)
- Forgetting to mix high bits into low; powers-of-two indexing then only sees the low bits
- Mutating a key after inserting it: its hash changes and it becomes unreachable
- Not rehashing on growth, causing load factor to climb and chains to lengthen
- In open addressing, failing to use tombstones on delete breaks future lookups
- Iterating and modifying simultaneously without fail-fast detection
- Assuming thread safety; java.util.HashMap corrupts under concurrent writes
Key design decisions & trade-offs
- Collision strategy — Chosen: Separate chaining. Tolerates high load factors and resists clustering better than linear probing; allows tree upgrade for pathological buckets
- Capacity — Chosen: Power-of-two bucket array. Index becomes a bitmask instead of modulo; doubling on rehash keeps capacity a power of two
- Load factor — Chosen: 0.75. Industry-standard balance between space and chain length; lower wastes memory, higher degrades lookups
Practice problems
Interview follow-ups
- Open-addressing variant with linear probing and tombstones
- Robin Hood hashing to balance probe distances
- Consistent hashing for distributed key assignment
- Concurrent hash map with segment locks or CAS
- Count-Min Sketch for approximate frequency with fixed memory
Why it matters
Hash Table is a core part of the Hashing toolkit. If you are preparing for a technical interview at a top-tier company or teaching a course, a working visualization is the fastest path from "I read about it" to "I understand it".