Z-Algorithm Visualization for Coding Interviews
Z-array construction; linear-time multi-pattern matching base.
Concept
The Z-algorithm computes, for a string s of length n, an array Z where Z[i] is the length of the longest substring starting at i that matches a prefix of s. Build it in strict O(n) time with O(n) extra space, and you get a Swiss-army primitive for exact matching (search for pattern P inside text T by building Z on P + '$' + T and scanning for Z[i] == |P|), for finding all borders and periods of a string, for counting distinct substrings in combination with suffix structures, and for string compression problems. Compared to KMP, Z-array is often easier to derive from scratch under interview pressure because the recurrence is just 'how far can I extend a prefix-match at this index', with no failure-function bookkeeping.
How it works
Maintain a sliding Z-box [l, r] — the rightmost interval such that s[l..r] is known to equal s[0..r-l], i.e. a previously matched prefix. Initialize l = r = 0 and Z[0] = n (by convention, or leave it 0). For each i from 1 to n-1, if i < r you already know s[i..r] equals s[i-l..r-l], so Z[i] starts at min(r - i, Z[i - l]); if i >= r you have no prior info and start at 0. Then, while i + Z[i] < n and s[Z[i]] == s[i + Z[i]], extend Z[i] one character at a time — this is the only place you do real character comparisons. After extending, if i + Z[i] - 1 > r, slide the Z-box to [i, i + Z[i] - 1]. The critical invariant is that r is monotonically non-decreasing, so the total number of extending character comparisons across the whole loop is bounded by n — every comparison either pushes r forward or terminates the inner while. That's why the algorithm is O(n) despite the inner loop. For pattern matching, build Z on P + '$' + T where '$' is a sentinel not in either string; any Z[i] equal to |P| marks a match of P at position i - |P| - 1 in T.
Implementation
package dsa.zalgo;
public final class ZAlgorithm {
// Z[i] = length of the longest substring starting at i that matches a prefix of s.
// Z[0] is set to s.length() by convention.
public static int[] buildZArray(String s) {
int n = s.length();
int[] z = new int[n];
if (n == 0) return z;
z[0] = n;
int l = 0, r = 0;
for (int i = 1; i < n; i++) {
if (i < r) {
z[i] = Math.min(r - i, z[i - l]);
}
while (i + z[i] < n && s.charAt(z[i]) == s.charAt(i + z[i])) {
z[i]++;
}
if (i + z[i] > r) {
l = i;
r = i + z[i];
}
}
return z;
}
// Pattern search via Z on P + '$' + T. Returns first match index in text, or -1.
public static int search(String text, String pattern) {
if (pattern.isEmpty()) return 0;
String combined = pattern + '\u0001' + text; // sentinel below printable
int[] z = buildZArray(combined);
int m = pattern.length();
for (int i = m + 1; i < z.length; i++) {
if (z[i] == m) {
return i - m - 1;
}
}
return -1;
}
}
Complexity
- Time:
O(n) to build the Z-array; O(n + m) for pattern search - Space:
O(n) for the Z-array plus O(n + m) for the combined string in search
Common pitfalls
- Forgetting the i < r branch and always starting Z[i] from 0 — collapses to O(n^2) on inputs like 'aaaa...'
- Picking a sentinel character that actually appears in the text or pattern — false matches bleed across the boundary
- Updating the Z-box when Z[i] did not strictly push r forward, which breaks the monotonic invariant
- Off-by-one with Z[i - l] — it indexes into the prefix, so i - l must be >= 1
- Returning Z[0] = n and then accidentally treating position 0 as a match in pattern search (skip index 0 of the combined string)
Key design decisions & trade-offs
- Z-array vs failure function — Chosen: Z-array for derivation clarity, KMP failure for streaming. Z-array is easier to prove correct and reuse for border/period questions, but it requires the whole string in memory. KMP can process the text in a single forward pass with O(m) state.
- Sentinel choice — Chosen: Unicode code point outside input alphabet vs explicit terminator. A guaranteed-absent sentinel keeps the combined-string trick simple. If no safe sentinel exists, you can split into two passes or bound the scan to i > m.
- Memory — Chosen: Full Z-array vs on-the-fly computation. Storing Z[] is O(n) but lets you answer multiple queries (borders, periods, distinct substrings). If you only need one match, you can discard entries as you go.
Practice problems
Interview follow-ups
- Prove that the total number of inner-while character comparisons across the whole loop is O(n)
- Use the Z-array to enumerate all periods of a string in O(n) and explain the connection to borders
- Count the number of distinct substrings of s using Z-functions of its suffixes
- Compare Z-array against KMP's failure function — translate one to the other in O(n)
- Extend to compute the longest common prefix of every suffix of s with a different string t
Why it matters
Z-Algorithm is a core part of the Arrays & Strings toolkit. If you are preparing for a technical interview at a top-tier company or teaching a course, a working visualization is the fastest path from "I read about it" to "I understand it".