Rabin-Karp Visualization for Coding Interviews

By Rahul Kumar · Senior Software Engineer · Updated May 2026 · Category: Arrays & Strings

Rolling hash pattern search; hash collisions trigger char-by-char verify.

Concept

Rabin-Karp is the pattern-matching algorithm that trades determinism for elegance: it hashes the pattern once, then slides a rolling hash across the text so each length-m window's hash is computed in O(1) from the previous window. Windows whose hash differs cannot match, so you skip them outright; windows whose hash matches are verified character-by-character to guard against collisions. Expected time is O(n + m), worst case is O(nm) when adversarial input forces every window to collide. The real superpower is multi-pattern search: once you have a rolling hash, testing a text window against k patterns is a hashset lookup, giving O(n + total_m) expected, which is why plagiarism detectors and near-duplicate-document systems lean on it instead of KMP.

Rabin-Karp — Interactive Simulator

Runs fully client-side in your browser; no sign-up. Or open full screen →

Launch the interactive Rabin-Karp visualization — animated algorithm, step-through, and live data-structure updates.

How it works

Pick a base B (typically 31 or 131 for ASCII; a prime slightly above the alphabet size) and a modulus M (a large prime, e.g. 1_000_000_007) to keep the hash inside a 64-bit long. The pattern hash is h_p = (p[0] * B^(m-1) + p[1] * B^(m-2) + ... + p[m-1]) mod M, computed via Horner's rule. The first window's hash h_0 is computed the same way. To roll from window i to window i+1 you subtract the leading character's contribution (text[i] * B^(m-1)), multiply what remains by B, and add the new trailing character — all mod M. Because modular subtraction can go negative in Java, you add M before taking the final mod. When h_window == h_p you run a character-by-character equality check to confirm; on mismatch you keep rolling. Collision probability with one good prime is roughly m/M — small enough for most uses but not zero. For adversarial inputs (hash-flood DoS) use double hashing with two independent (B, M) pairs; the combined collision rate drops to m/(M1 * M2), which is negligible. The rolling step is the whole value proposition: without it you pay O(m) per window and you might as well run naive matching.

Implementation

RabinKarp.java

package dsa.rabinkarp;

import java.util.ArrayList;
import java.util.List;

public final class RabinKarp {
    private static final long BASE = 31L;
    private static final long MOD = 1_000_000_007L;

    // Returns all starting indices where pattern occurs in text.
    public static List<Integer> rabinKarpSearch(String text, String pattern) {
        List<Integer> hits = new ArrayList<>();
        int n = text.length(), m = pattern.length();
        if (m == 0 || m > n) return hits;

        long patternHash = 0L;
        long windowHash = 0L;
        long highestPow = 1L; // BASE^(m-1) mod MOD

        for (int i = 0; i < m - 1; i++) {
            highestPow = (highestPow * BASE) % MOD;
        }

        for (int i = 0; i < m; i++) {
            patternHash = (patternHash * BASE + pattern.charAt(i)) % MOD;
            windowHash = (windowHash * BASE + text.charAt(i)) % MOD;
        }

        for (int i = 0; i <= n - m; i++) {
            if (windowHash == patternHash && equalsAt(text, pattern, i)) {
                hits.add(i);
            }
            if (i < n - m) {
                long leading = (text.charAt(i) * highestPow) % MOD;
                windowHash = (windowHash - leading + MOD) % MOD;
                windowHash = (windowHash * BASE + text.charAt(i + m)) % MOD;
            }
        }
        return hits;
    }

    private static boolean equalsAt(String text, String pattern, int start) {
        for (int j = 0; j < pattern.length(); j++) {
            if (text.charAt(start + j) != pattern.charAt(j)) return false;
        }
        return true;
    }
}

Complexity

Time: O(n + m) expected, O(nm) worst case on full collision
Space: O(1) extra beyond the hit list

Common pitfalls

Forgetting to add MOD before the subtraction when rolling the hash, which lets the value go negative in Java
Using BASE smaller than the alphabet size — two distinct chars can produce the same partial hash
Picking a small MOD (e.g. Integer.MAX_VALUE) and then multiplying without casting to long — silent overflow
Skipping the equality verification on hash match — collisions do happen and will corrupt results
Recomputing BASE^(m-1) inside the loop instead of caching it; turns O(n) into O(n log m)

Key design decisions & trade-offs

Modulus choice — Chosen: Single 1e9+7 vs double hashing with two primes. Single prime is fast and fine for trusted input; double hashing eliminates collision worries against adversarial or very long texts at roughly 2x cost.
Base choice — Chosen: 31 for ASCII vs 131 or 257 for larger alphabets. The base should exceed the alphabet to avoid structural collisions. 31 is idiomatic for lowercase English (it's what Java String.hashCode uses).
When to use Rabin-Karp — Chosen: Multi-pattern search vs single-pattern. For a single pattern, KMP or Z-algorithm is strictly better worst-case. Rabin-Karp dominates when you're searching for many patterns in one text via a hashset of pattern hashes.

Practice problems

Repeated DNA Sequences

Interview follow-ups

Implement double hashing and show that the expected number of false positives drops from m/M to m/(M1*M2)
Extend to 2D Rabin-Karp to find a rectangular pattern inside a larger grid
Use Rabin-Karp to detect near-duplicate documents via shingled hashes — how is this related to MinHash?
Construct an adversarial input that forces worst-case O(nm) against a known (base, mod) pair
Compare Rabin-Karp's constant factor against KMP on modern CPUs — when does the branch-free rolling hash actually win?

Why it matters

Rabin-Karp is a core part of the Arrays & Strings toolkit. If you are preparing for a technical interview at a top-tier company or teaching a course, a working visualization is the fastest path from "I read about it" to "I understand it".