Edit Distance Visualization for Coding Interviews
Levenshtein: insert/delete/replace costs; DP grid + path.
Concept
Edit distance (Levenshtein distance) measures the minimum number of single-character edits — insertions, deletions, or substitutions — required to transform one string into another. It powers spell checkers, fuzzy search, DNA alignment, and 'did you mean?' suggestions in search UIs. The vanilla DP is a close cousin of LCS: same two-dimensional table, subtly different recurrence. Edit distance is also the classic example used to teach rolling-row space optimization because reconstruction is often unnecessary — a spell checker only needs the distance number, not the edit script. Variants include Damerau-Levenshtein (which adds transpositions), weighted edit distance (where insertions, deletions, and substitutions have different costs), and restricted forms used in compiler error recovery.
How it works
Let dp[i][j] be the minimum edits to convert the first i characters of A into the first j characters of B. Base cases: dp[0][j] = j (j insertions to build B from empty) and dp[i][0] = i (i deletions to empty A). Recurrence: if A[i-1] == B[j-1], then no edit is needed for this pair, so dp[i][j] = dp[i-1][j-1]. Otherwise dp[i][j] = 1 + min of three options — dp[i-1][j] representing delete A[i-1], dp[i][j-1] representing insert B[j-1], and dp[i-1][j-1] representing substitute A[i-1] with B[j-1]. The answer is dp[n][m]. The table is filled in row-major order, and each cell only depends on the cell above, to the left, and diagonally up-left. This structure lets you keep just two rows (previous and current) for O(m) space. To reconstruct the edit script, backtrack from (n, m): equal characters move diagonally with no edit, otherwise pick the direction whose neighbor is one less than the current cell and emit the corresponding operation. Asymmetric costs or bounded edit distance (only care if within k) give rise to diagonal-band DP that runs in O(n * k).
Implementation
public final class EditDistance {
private EditDistance() {}
public static int minDistance(String a, String b) {
int n = a.length(), m = b.length();
int[][] dp = new int[n + 1][m + 1];
for (int i = 0; i <= n; i++) dp[i][0] = i;
for (int j = 0; j <= m; j++) dp[0][j] = j;
for (int i = 1; i <= n; i++) {
for (int j = 1; j <= m; j++) {
if (a.charAt(i - 1) == b.charAt(j - 1)) {
dp[i][j] = dp[i - 1][j - 1];
} else {
int del = dp[i - 1][j];
int ins = dp[i][j - 1];
int sub = dp[i - 1][j - 1];
dp[i][j] = 1 + Math.min(sub, Math.min(del, ins));
}
}
}
return dp[n][m];
}
public static int minDistanceRolling(String a, String b) {
int n = a.length(), m = b.length();
int[] prev = new int[m + 1];
int[] curr = new int[m + 1];
for (int j = 0; j <= m; j++) prev[j] = j;
for (int i = 1; i <= n; i++) {
curr[0] = i;
for (int j = 1; j <= m; j++) {
if (a.charAt(i - 1) == b.charAt(j - 1)) curr[j] = prev[j - 1];
else curr[j] = 1 + Math.min(prev[j - 1], Math.min(prev[j], curr[j - 1]));
}
int[] tmp = prev; prev = curr; curr = tmp;
}
return prev[m];
}
}
Complexity
- Time:
O(n * m) - Space (full):
O(n * m) - Space (rolling):
O(min(n, m))
Common pitfalls
- Forgetting to seed base cases — dp[i][0] = i and dp[0][j] = j are not zero.
- Mixing up insert and delete directions in the recurrence; both are legal but must be consistent.
- Applying classic Levenshtein when Damerau (with transposition) is what's needed.
- Using int arithmetic without guarding against very long strings overflowing intermediate sums.
- Rolling-row optimization that forgets to copy prev from curr at end of row.
Key design decisions & trade-offs
- Uniform vs weighted costs — Chosen: Weighted when substitutions cost differently from inserts. DNA and OCR often prefer specific error types; recurrence generalizes cleanly.
- Full table vs rolling rows — Chosen: Rolling when reconstruction not needed. Cuts memory from O(nm) to O(m); keep full table if you need the edit script.
- Exact vs bounded-k DP — Chosen: Bounded-k diagonal band when k is small. Spell check typically caps at k=2; diagonal DP runs in O(n * k) instead of O(n^2).
Practice problems
Interview follow-ups
- Extend to Damerau-Levenshtein by adding a transposition case.
- Reconstruct the explicit edit script (sequence of ins/del/sub operations).
- Implement bounded-k edit distance with early exit when dp row minimum exceeds k.
- Use edit distance for fuzzy autocomplete over a dictionary of 100k words.
Why it matters
Edit Distance is a core part of the Dynamic Programming toolkit. If you are preparing for a technical interview at a top-tier company or teaching a course, a working visualization is the fastest path from "I read about it" to "I understand it".