LCS Visualization for Coding Interviews
Longest common subsequence; 2D DP table with traceback.
Concept
Longest Common Subsequence is the string-similarity problem hiding inside diff tools, DNA alignment, spell checkers, and merge algorithms. Given two sequences, it finds the longest sequence that appears in both as a subsequence (not necessarily contiguous). Unix diff, git's patience algorithm, and bioinformatics pairwise alignment are all LCS derivatives. The textbook DP is O(nm) time and space, which is fine when both strings are a few thousand characters but becomes painful on megabyte files — real tools use Hirschberg's O(m) space trick, Myers's O(ND) diff algorithm, or heuristic patience diffs. Still, understanding the base DP is non-negotiable: it is the clearest illustration of how two-dimensional tables, recurrences, and traceback work together to turn an exponential brute force into a polynomial solution with a reconstructable answer.
How it works
Let dp[i][j] be the length of the LCS of the first i characters of A and the first j characters of B. The recurrence is: if A[i-1] == B[j-1] then dp[i][j] = dp[i-1][j-1] + 1, meaning we extend the common subsequence by matching the current pair; otherwise dp[i][j] = max(dp[i-1][j], dp[i][j-1]), meaning we drop either the current char of A or the current char of B and keep whichever side yields the longer LCS. The base cases dp[0][j] = dp[i][0] = 0 encode that any LCS with an empty string is empty. Filling the table takes O(nm) time. To recover the actual subsequence, walk backwards from dp[n][m]: if A[i-1] == B[j-1] prepend that character and move diagonally to (i-1, j-1); otherwise move in the direction of the larger neighbor (up if dp[i-1][j] > dp[i][j-1], else left). The path from (n, m) to (0, 0) emits the LCS in reverse. Space can be dropped to O(min(n, m)) by only keeping the previous row, at the cost of losing traceback unless you run Hirschberg's divide-and-conquer variant.
Implementation
public final class LCS {
private LCS() {}
public static int lcsLength(String a, String b) {
int n = a.length(), m = b.length();
int[][] dp = new int[n + 1][m + 1];
for (int i = 1; i <= n; i++) {
for (int j = 1; j <= m; j++) {
if (a.charAt(i - 1) == b.charAt(j - 1)) dp[i][j] = dp[i - 1][j - 1] + 1;
else dp[i][j] = Math.max(dp[i - 1][j], dp[i][j - 1]);
}
}
return dp[n][m];
}
public static String lcsTraceback(String a, String b) {
int n = a.length(), m = b.length();
int[][] dp = new int[n + 1][m + 1];
for (int i = 1; i <= n; i++) {
for (int j = 1; j <= m; j++) {
if (a.charAt(i - 1) == b.charAt(j - 1)) dp[i][j] = dp[i - 1][j - 1] + 1;
else dp[i][j] = Math.max(dp[i - 1][j], dp[i][j - 1]);
}
}
StringBuilder sb = new StringBuilder();
int i = n, j = m;
while (i > 0 && j > 0) {
if (a.charAt(i - 1) == b.charAt(j - 1)) { sb.append(a.charAt(i - 1)); i--; j--; }
else if (dp[i - 1][j] >= dp[i][j - 1]) i--;
else j--;
}
return sb.reverse().toString();
}
}
Complexity
- Time:
O(n * m) - Space:
O(n * m) - Space (length only):
O(min(n, m))
Common pitfalls
- Confusing subsequence with substring — LCS allows gaps, LCSubstring does not.
- Storing characters in the DP table instead of lengths — breaks the max recurrence.
- Traceback loop that moves diagonally on a tie — only do so when characters actually match.
- Using O(nm) space on megabyte inputs; switch to rolling rows or Hirschberg.
- Off-by-one between 0-indexed strings and 1-indexed DP table.
Key design decisions & trade-offs
- Full table vs rolling rows — Chosen: Rolling rows when traceback not needed. Cuts memory from O(nm) to O(m) at no time cost.
- LCS vs Levenshtein edit distance — Chosen: LCS when only insert/delete matter. Edit distance allows substitutions; LCS does not, which changes the recurrence.
- Classic DP vs Myers diff — Chosen: Myers for large files with few differences. Myers runs in O(ND) where D is the edit count — linear when files barely differ.
Practice problems
Interview follow-ups
- Implement Hirschberg's algorithm to recover LCS in O(n) space.
- Adapt the DP to compute the Shortest Common Supersequence.
- Use LCS to build a simple diff tool for two text files.
- Solve Longest Palindromic Subsequence via LCS(s, reverse(s)).
Why it matters
LCS is a core part of the Dynamic Programming toolkit. If you are preparing for a technical interview at a top-tier company or teaching a course, a working visualization is the fastest path from "I read about it" to "I understand it".