← System Design Simulator

Bloom Filter Visualization for Coding Interviews

By Rahul Kumar · Senior Software Engineer · Updated · Category: Hashing

k hashes → m bits; zero false negatives, tunable false-positive rate.

Concept

A Bloom filter is a probabilistic set that trades perfect membership for dramatic memory savings. It can tell you with certainty that a key is not in the set, and it can tell you with controlled probability that a key might be in the set, but it never produces false negatives. That one-sided error profile is exactly what you want for cache miss short-circuits, LSM-tree SSTable index reads, malicious URL filters, and distributed join pre-filters: a negative response is cheap and correct, a positive response requires the slower confirmation lookup. A typical Bloom filter uses a bit array of size m and k independent hash functions, storing no actual keys. For the right tuning of m and k relative to n (the expected number of inserted elements), you can hit a 1% false positive rate at around 10 bits per key, which is one to two orders of magnitude less memory than storing the keys themselves.

Bloom Filter — Interactive Simulator

Runs fully client-side in your browser; no sign-up. Or open full screen →

Launch the interactive Bloom Filter visualization — animated algorithm, step-through, and live data-structure updates.

How it works

Allocate a bit array of m bits, all zero. Pick k hash functions that map any key uniformly over [0, m). On add(key), hash the key k times and set all k resulting bit positions to 1. On mightContain(key), hash the key k times and return true only if all k bits are set; if any bit is zero, the key was definitely never added. False negatives are impossible because add can only set bits, never clear them. False positives occur when enough other keys collectively happen to set the k bits a given key hashes to, and the probability of that is approximately (1 - e^(-kn/m))^k for n inserted items. This has a minimum at k = (m/n) ln 2, which tells you the ideal number of hash functions for a given memory budget. In practice you rarely run k independent hash functions; the Kirsch-Mitzenmacher double-hashing trick derives k hashes from two 64-bit hashes as h1 + i*h2 for i = 0..k-1, which is indistinguishable from independent hashes at this granularity and far faster. Bloom filters do not support delete directly because clearing bits could create false negatives for other keys; counting Bloom filters or cuckoo filters are the usual answer when deletes are required.

Implementation

BloomFilter with m bits, k hashes, and falsePositiveRate
import java.util.BitSet;

public class BloomFilter<T> {
    private final BitSet bits;
    private final int m;       // number of bits
    private final int k;       // number of hash functions
    private int insertedCount; // for fpr estimation

    /** Construct a filter sized for expectedInsertions at target falsePositiveRate. */
    public BloomFilter(int expectedInsertions, double fpr) {
        if (expectedInsertions <= 0 || fpr <= 0 || fpr >= 1)
            throw new IllegalArgumentException("bad params");
        this.m = Math.max(64, (int) Math.ceil(-expectedInsertions * Math.log(fpr) / (Math.log(2) * Math.log(2))));
        this.k = Math.max(1, (int) Math.round((double) m / expectedInsertions * Math.log(2)));
        this.bits = new BitSet(m);
    }

    public void add(T key) {
        long[] h = hash128(key);
        for (int i = 0; i < k; i++) {
            int idx = Math.floorMod(h[0] + (long) i * h[1], m);
            bits.set(idx);
        }
        insertedCount++;
    }

    public boolean mightContain(T key) {
        long[] h = hash128(key);
        for (int i = 0; i < k; i++) {
            int idx = Math.floorMod(h[0] + (long) i * h[1], m);
            if (!bits.get(idx)) return false;
        }
        return true;
    }

    /** Analytic false positive rate for n inserted items. */
    public double falsePositiveRate(int n) {
        return Math.pow(1 - Math.exp(-((double) k * n) / m), k);
    }

    public int bitSize() { return m; }
    public int hashCount() { return k; }
    public int insertedCount() { return insertedCount; }

    /**
     * Produce two independent 64-bit hashes from the object's hashCode and its
     * bit-reversed mix. For production use swap in Murmur3 or xxHash.
     */
    private static long[] hash128(Object key) {
        long h = (key == null) ? 0L : key.hashCode();
        long h1 = mix(h);
        long h2 = mix(h ^ 0x9E3779B97F4A7C15L);
        return new long[]{h1, h2 == 0 ? 1 : h2};
    }

    private static long mix(long x) {
        x ^= x >>> 33; x *= 0xff51afd7ed558ccdL;
        x ^= x >>> 33; x *= 0xc4ceb9fe1a85ec53L;
        x ^= x >>> 33; return x;
    }
}

Complexity

Common pitfalls

Key design decisions & trade-offs

Practice problems

Interview follow-ups

Why it matters

Bloom Filter is a core part of the Hashing toolkit. If you are preparing for a technical interview at a top-tier company or teaching a course, a working visualization is the fastest path from "I read about it" to "I understand it".

Related