KMP String Search Visualization for Coding Interviews

By Rahul Kumar · Senior Software Engineer · Updated May 2026 · Category: Arrays & Strings

Failure function + match walk; never re-scans text.

Concept

KMP (Knuth-Morris-Pratt) is the first linear-time exact-string-match algorithm you learn that actually beats the naive O(nm) scan by never re-examining a text character. The insight is that when a mismatch happens after matching a prefix of the pattern, the prefix itself tells you exactly how far you can safely shift the pattern without missing a match — that information lives in the pattern's failure function (a.k.a. LPS, longest proper prefix that is also a suffix). Build the failure function in O(m), then sweep the text in O(n), and you get O(n + m) total with O(m) extra memory. It is the baseline for problems that ask for a single pattern inside a long text when you cannot preprocess the text, and it shines when the pattern has internal repetition that fools naive shifts.

KMP String Search — Interactive Simulator

Runs fully client-side in your browser; no sign-up. Or open full screen →

Launch the interactive KMP String Search visualization — animated algorithm, step-through, and live data-structure updates.

How it works

The failure function f[i] stores the length of the longest proper prefix of pattern[0..i] that is also a suffix of pattern[0..i]. It is built with a two-pointer loop: a slow index k tracks the current prefix length and a fast index i walks the pattern. If pattern[i] == pattern[k], we extend by one and record f[i] = k+1. If they differ and k > 0, we fall back to k = f[k-1] and retry without advancing i — because f[k-1] is the next-best prefix-suffix candidate. If k is already 0, we record f[i] = 0 and move on. Each iteration either advances i or shrinks k, and k never exceeds i, so the total work is O(m) amortized. The search loop mirrors that structure against the text: maintain j = 'characters of pattern matched so far', advance i through text, on match bump both, on mismatch fall back via j = f[j-1] without rewinding i. When j reaches m, you have a match starting at i - m; either return it (first match) or record it and set j = f[j-1] to continue finding overlapping matches. Because i is monotonic and the fallback chain per character is amortized O(1), the text scan is strict O(n).

Implementation

KMP.java

package dsa.kmp;

public final class KMP {

    // Builds failure function (LPS array) of pattern.
    public static int[] computeFailureFunction(String pattern) {
        int m = pattern.length();
        int[] f = new int[m];
        int k = 0;
        for (int i = 1; i < m; i++) {
            while (k > 0 && pattern.charAt(i) != pattern.charAt(k)) {
                k = f[k - 1];
            }
            if (pattern.charAt(i) == pattern.charAt(k)) {
                k++;
            }
            f[i] = k;
        }
        return f;
    }

    // Returns first index in text where pattern occurs, or -1.
    public static int kmpSearch(String text, String pattern) {
        if (pattern.isEmpty()) return 0;
        int n = text.length(), m = pattern.length();
        if (m > n) return -1;
        int[] f = computeFailureFunction(pattern);
        int j = 0;
        for (int i = 0; i < n; i++) {
            while (j > 0 && text.charAt(i) != pattern.charAt(j)) {
                j = f[j - 1];
            }
            if (text.charAt(i) == pattern.charAt(j)) {
                j++;
            }
            if (j == m) {
                return i - m + 1;
            }
        }
        return -1;
    }
}

Complexity

Time: O(n + m) for build + search
Space: O(m) for the failure array
Preprocessing: O(m) to build failure function once, reusable across texts

Common pitfalls

Confusing 'longest prefix that is also suffix' with 'longest proper prefix' — f[i] must exclude the full string itself
Falling back with k = f[k] instead of k = f[k-1], which double-counts and breaks the invariant
Rewinding the text index i on mismatch — KMP never rewinds text, only the pattern pointer
Off-by-one when reporting the match index: it is i - m + 1 after the final character match, not i
Forgetting to handle the empty-pattern case (by convention, return 0)

Key design decisions & trade-offs

Algorithm choice for single-pattern search — Chosen: KMP vs Boyer-Moore vs Rabin-Karp. KMP is deterministic O(n+m) with tiny constants; Boyer-Moore is faster in practice on large alphabets but worst-case O(nm); Rabin-Karp wins only for multi-pattern search via hashed sets.
Failure-function variant — Chosen: Plain LPS vs 'strong' failure with character-specific transitions. Strong failure (Aho-Corasick goto) removes the inner while by precomputing a DFA — faster at run time but O(m * alphabet) memory. Plain LPS is enough for single-pattern.
Match semantics — Chosen: First match vs all matches. Returning on first match short-circuits early; for all matches you set j = f[j-1] after recording and continue, which handles overlapping occurrences naturally.

Practice problems

Implement strStr()

Interview follow-ups

Extend to count all overlapping occurrences of pattern in text without restarting
Explain why the amortized fallback chain per text character is O(1) — give a potential-function argument
Generalize KMP to Aho-Corasick for multi-pattern search and describe the failure-link construction
Use KMP to find the smallest period of a string and prove it is m - f[m-1] when that divides m
How does KMP perform against Z-algorithm for the same problem, and which is easier to get right in an interview?

Why it matters

KMP String Search is a core part of the Arrays & Strings toolkit. If you are preparing for a technical interview at a top-tier company or teaching a course, a working visualization is the fastest path from "I read about it" to "I understand it".