KMP String Search Visualization for Coding Interviews
Failure function + match walk; never re-scans text.
Concept
KMP (Knuth-Morris-Pratt) is the first linear-time exact-string-match algorithm you learn that actually beats the naive O(nm) scan by never re-examining a text character. The insight is that when a mismatch happens after matching a prefix of the pattern, the prefix itself tells you exactly how far you can safely shift the pattern without missing a match — that information lives in the pattern's failure function (a.k.a. LPS, longest proper prefix that is also a suffix). Build the failure function in O(m), then sweep the text in O(n), and you get O(n + m) total with O(m) extra memory. It is the baseline for problems that ask for a single pattern inside a long text when you cannot preprocess the text, and it shines when the pattern has internal repetition that fools naive shifts.
How it works
The failure function f[i] stores the length of the longest proper prefix of pattern[0..i] that is also a suffix of pattern[0..i]. It is built with a two-pointer loop: a slow index k tracks the current prefix length and a fast index i walks the pattern. If pattern[i] == pattern[k], we extend by one and record f[i] = k+1. If they differ and k > 0, we fall back to k = f[k-1] and retry without advancing i — because f[k-1] is the next-best prefix-suffix candidate. If k is already 0, we record f[i] = 0 and move on. Each iteration either advances i or shrinks k, and k never exceeds i, so the total work is O(m) amortized. The search loop mirrors that structure against the text: maintain j = 'characters of pattern matched so far', advance i through text, on match bump both, on mismatch fall back via j = f[j-1] without rewinding i. When j reaches m, you have a match starting at i - m; either return it (first match) or record it and set j = f[j-1] to continue finding overlapping matches. Because i is monotonic and the fallback chain per character is amortized O(1), the text scan is strict O(n).
Implementation
package dsa.kmp;
public final class KMP {
// Builds failure function (LPS array) of pattern.
public static int[] computeFailureFunction(String pattern) {
int m = pattern.length();
int[] f = new int[m];
int k = 0;
for (int i = 1; i < m; i++) {
while (k > 0 && pattern.charAt(i) != pattern.charAt(k)) {
k = f[k - 1];
}
if (pattern.charAt(i) == pattern.charAt(k)) {
k++;
}
f[i] = k;
}
return f;
}
// Returns first index in text where pattern occurs, or -1.
public static int kmpSearch(String text, String pattern) {
if (pattern.isEmpty()) return 0;
int n = text.length(), m = pattern.length();
if (m > n) return -1;
int[] f = computeFailureFunction(pattern);
int j = 0;
for (int i = 0; i < n; i++) {
while (j > 0 && text.charAt(i) != pattern.charAt(j)) {
j = f[j - 1];
}
if (text.charAt(i) == pattern.charAt(j)) {
j++;
}
if (j == m) {
return i - m + 1;
}
}
return -1;
}
}
Complexity
- Time:
O(n + m) for build + search - Space:
O(m) for the failure array - Preprocessing:
O(m) to build failure function once, reusable across texts
Common pitfalls
- Confusing 'longest prefix that is also suffix' with 'longest proper prefix' — f[i] must exclude the full string itself
- Falling back with k = f[k] instead of k = f[k-1], which double-counts and breaks the invariant
- Rewinding the text index i on mismatch — KMP never rewinds text, only the pattern pointer
- Off-by-one when reporting the match index: it is i - m + 1 after the final character match, not i
- Forgetting to handle the empty-pattern case (by convention, return 0)
Key design decisions & trade-offs
- Algorithm choice for single-pattern search — Chosen: KMP vs Boyer-Moore vs Rabin-Karp. KMP is deterministic O(n+m) with tiny constants; Boyer-Moore is faster in practice on large alphabets but worst-case O(nm); Rabin-Karp wins only for multi-pattern search via hashed sets.
- Failure-function variant — Chosen: Plain LPS vs 'strong' failure with character-specific transitions. Strong failure (Aho-Corasick goto) removes the inner while by precomputing a DFA — faster at run time but O(m * alphabet) memory. Plain LPS is enough for single-pattern.
- Match semantics — Chosen: First match vs all matches. Returning on first match short-circuits early; for all matches you set j = f[j-1] after recording and continue, which handles overlapping occurrences naturally.
Practice problems
Interview follow-ups
- Extend to count all overlapping occurrences of pattern in text without restarting
- Explain why the amortized fallback chain per text character is O(1) — give a potential-function argument
- Generalize KMP to Aho-Corasick for multi-pattern search and describe the failure-link construction
- Use KMP to find the smallest period of a string and prove it is m - f[m-1] when that divides m
- How does KMP perform against Z-algorithm for the same problem, and which is easier to get right in an interview?
Why it matters
KMP String Search is a core part of the Arrays & Strings toolkit. If you are preparing for a technical interview at a top-tier company or teaching a course, a working visualization is the fastest path from "I read about it" to "I understand it".