Bloom Filter Visualization for Coding Interviews

By Rahul Kumar · Senior Software Engineer · Updated May 2026 · Category: Hashing

k hashes → m bits; zero false negatives, tunable false-positive rate.

Concept

A Bloom filter is a probabilistic set that trades perfect membership for dramatic memory savings. It can tell you with certainty that a key is not in the set, and it can tell you with controlled probability that a key might be in the set, but it never produces false negatives. That one-sided error profile is exactly what you want for cache miss short-circuits, LSM-tree SSTable index reads, malicious URL filters, and distributed join pre-filters: a negative response is cheap and correct, a positive response requires the slower confirmation lookup. A typical Bloom filter uses a bit array of size m and k independent hash functions, storing no actual keys. For the right tuning of m and k relative to n (the expected number of inserted elements), you can hit a 1% false positive rate at around 10 bits per key, which is one to two orders of magnitude less memory than storing the keys themselves.

Bloom Filter — Interactive Simulator

Runs fully client-side in your browser; no sign-up. Or open full screen →

Launch the interactive Bloom Filter visualization — animated algorithm, step-through, and live data-structure updates.

How it works

Allocate a bit array of m bits, all zero. Pick k hash functions that map any key uniformly over [0, m). On add(key), hash the key k times and set all k resulting bit positions to 1. On mightContain(key), hash the key k times and return true only if all k bits are set; if any bit is zero, the key was definitely never added. False negatives are impossible because add can only set bits, never clear them. False positives occur when enough other keys collectively happen to set the k bits a given key hashes to, and the probability of that is approximately (1 - e^(-kn/m))^k for n inserted items. This has a minimum at k = (m/n) ln 2, which tells you the ideal number of hash functions for a given memory budget. In practice you rarely run k independent hash functions; the Kirsch-Mitzenmacher double-hashing trick derives k hashes from two 64-bit hashes as h1 + i*h2 for i = 0..k-1, which is indistinguishable from independent hashes at this granularity and far faster. Bloom filters do not support delete directly because clearing bits could create false negatives for other keys; counting Bloom filters or cuckoo filters are the usual answer when deletes are required.

Implementation

BloomFilter with m bits, k hashes, and falsePositiveRate

import java.util.BitSet;

public class BloomFilter<T> {
    private final BitSet bits;
    private final int m;       // number of bits
    private final int k;       // number of hash functions
    private int insertedCount; // for fpr estimation

    /** Construct a filter sized for expectedInsertions at target falsePositiveRate. */
    public BloomFilter(int expectedInsertions, double fpr) {
        if (expectedInsertions <= 0 || fpr <= 0 || fpr >= 1)
            throw new IllegalArgumentException("bad params");
        this.m = Math.max(64, (int) Math.ceil(-expectedInsertions * Math.log(fpr) / (Math.log(2) * Math.log(2))));
        this.k = Math.max(1, (int) Math.round((double) m / expectedInsertions * Math.log(2)));
        this.bits = new BitSet(m);
    }

    public void add(T key) {
        long[] h = hash128(key);
        for (int i = 0; i < k; i++) {
            int idx = Math.floorMod(h[0] + (long) i * h[1], m);
            bits.set(idx);
        }
        insertedCount++;
    }

    public boolean mightContain(T key) {
        long[] h = hash128(key);
        for (int i = 0; i < k; i++) {
            int idx = Math.floorMod(h[0] + (long) i * h[1], m);
            if (!bits.get(idx)) return false;
        }
        return true;
    }

    /** Analytic false positive rate for n inserted items. */
    public double falsePositiveRate(int n) {
        return Math.pow(1 - Math.exp(-((double) k * n) / m), k);
    }

    public int bitSize() { return m; }
    public int hashCount() { return k; }
    public int insertedCount() { return insertedCount; }

    /**
     * Produce two independent 64-bit hashes from the object's hashCode and its
     * bit-reversed mix. For production use swap in Murmur3 or xxHash.
     */
    private static long[] hash128(Object key) {
        long h = (key == null) ? 0L : key.hashCode();
        long h1 = mix(h);
        long h2 = mix(h ^ 0x9E3779B97F4A7C15L);
        return new long[]{h1, h2 == 0 ? 1 : h2};
    }

    private static long mix(long x) {
        x ^= x >>> 33; x *= 0xff51afd7ed558ccdL;
        x ^= x >>> 33; x *= 0xc4ceb9fe1a85ec53L;
        x ^= x >>> 33; return x;
    }
}

Complexity

Time: O(k) per add and mightContain, independent of n
Space: O(m) bits, typically about 10 bits per key for 1% fpr

Common pitfalls

Using only one hash function: false positive rate explodes as n grows
Undersizing m for expected n: even the best k cannot hit the target fpr
Attempting delete by clearing bits: introduces false negatives for other keys
Using weak hash functions that correlate across k applications, creating clustering
Assuming serialized Bloom filters are portable across hash-function changes
Ignoring saturation: once most bits are set, mightContain approaches always-true

Key design decisions & trade-offs

Error profile — Chosen: False positives allowed, false negatives never. Fits cache/pre-filter use cases where a miss is authoritative and a hit is verified downstream
Hash derivation — Chosen: Double hashing via h1 + i*h2. Kirsch-Mitzenmacher shows this matches independent hashes statistically at a fraction of the CPU cost
Delete support — Chosen: Not supported. Use a counting Bloom filter or cuckoo filter if deletes are required; plain Bloom keeps the memory minimal

Practice problems

Bloom Filter (design discussion)

Interview follow-ups

Counting Bloom filter with small counters instead of bits to allow deletion
Scalable Bloom filter that grows dynamically as inserts accumulate
Cuckoo filter with better space efficiency and support for deletes
HyperLogLog for cardinality estimation (companion probabilistic structure)
Bloom filter cascade used in TLS CRLite for revocation checks

Why it matters

Bloom Filter is a core part of the Hashing toolkit. If you are preparing for a technical interview at a top-tier company or teaching a course, a working visualization is the fastest path from "I read about it" to "I understand it".