Union-Find Visualization for Coding Interviews
Path compression + union by rank; near-O(1) per op.
Concept
Union-Find, also known as a Disjoint Set Union (DSU), is a data structure that tracks a partition of elements into non-overlapping sets and supports two near-constant-time operations: find, which returns a representative for the set containing an element, and union, which merges two sets. It is the backbone of Kruskal's minimum spanning tree algorithm, dynamic connectivity queries on an evolving graph, cycle detection in undirected graphs, percolation models, image segmentation via connected components, and the classic offline LCA of Tarjan. The power of the structure comes from two deceptively small optimizations: path compression during find, and union by rank (or size) during union. Together they yield an amortized runtime of O(alpha(n)) per operation, where alpha is the inverse Ackermann function. For any conceivable input size alpha(n) is at most four, so union-find is effectively constant time in practice.
How it works
Internally we represent each set as a rooted tree stored implicitly in a parent[] array, where parent[i] is the index of i's parent and parent[root] == root. The find operation walks up the parent chain from a given node until it reaches the root, which names the set. Path compression is applied on the way back: every visited node has its parent rewired directly to the root, flattening the tree so future queries are cheaper. Union first finds both roots; if they are equal the elements were already in the same set. Otherwise we merge the two trees, but only by attaching the shallower tree under the deeper one. We track tree depth via a rank[] array that we only increment when two trees of equal rank are merged. The combination of path compression and union by rank guarantees amortized inverse-Ackermann time per operation, essentially O(1). Variants swap rank for size (good when you need subset cardinalities), store a delta for weighted unions (for DSU with potentials), or bolt on rollback stacks to support offline dynamic connectivity. The structure uses a single contiguous int[] of length n and is extremely cache-friendly.
Implementation
public class UnionFind {
private final int[] parent;
private final int[] rank;
private int components;
public UnionFind(int n) {
parent = new int[n];
rank = new int[n];
components = n;
for (int i = 0; i < n; i++) parent[i] = i;
}
/** Returns the root of x, compressing the path. */
public int find(int x) {
int root = x;
while (parent[root] != root) root = parent[root];
while (parent[x] != root) {
int next = parent[x];
parent[x] = root;
x = next;
}
return root;
}
/** Union by rank. Returns true if a merge happened. */
public boolean union(int a, int b) {
int ra = find(a), rb = find(b);
if (ra == rb) return false;
if (rank[ra] < rank[rb]) {
parent[ra] = rb;
} else if (rank[ra] > rank[rb]) {
parent[rb] = ra;
} else {
parent[rb] = ra;
rank[ra]++;
}
components--;
return true;
}
public boolean connected(int a, int b) {
return find(a) == find(b);
}
public int componentCount() {
return components;
}
public static void main(String[] args) {
UnionFind uf = new UnionFind(5);
uf.union(0, 1);
uf.union(2, 3);
uf.union(1, 3);
System.out.println(uf.connected(0, 3)); // true
System.out.println(uf.componentCount()); // 2
}
}
Complexity
- Time:
O(alpha(n)) per op - Space:
O(n)
Common pitfalls
- Recursing inside find: deep chains can overflow the JVM stack when path compression has not yet kicked in.
- Dropping union by rank and keeping only path compression: the amortized bound then degrades to O(log n).
- Forgetting to return a boolean from union, so callers cannot distinguish a successful merge from a no-op.
- Using the structure with non-integer keys without first building an index map; DSU requires dense integer IDs.
Key design decisions & trade-offs
- Union by rank vs union by size — Chosen: Rank (tree depth approximation). Rank avoids the extra bookkeeping when you do not need component sizes and ties the bound directly to tree height.
- Iterative vs recursive find — Chosen: Two-pass iterative find. Avoids stack overflow on pathological inputs and still achieves full path compression.
Practice problems
Interview follow-ups
- Add rollback support to answer offline dynamic connectivity queries in O(alpha(n) log q).
- Swap rank[] for size[] to answer subset-size queries in O(1).
- Extend to weighted DSU (with a potential[] array) to support relational constraints like 'a is k greater than b'.
Why it matters
Union-Find is a core part of the Graphs toolkit. If you are preparing for a technical interview at a top-tier company or teaching a course, a working visualization is the fastest path from "I read about it" to "I understand it".