Kruskal MST Visualization for Coding Interviews
Sort edges + union-find; picks lightest cross-cut edge.
Concept
Kruskal's algorithm computes a minimum spanning tree (MST) of a connected, weighted, undirected graph by processing edges from lightest to heaviest and greedily accepting any edge that does not create a cycle. The MST is the subset of edges that connects every vertex with the smallest possible total weight and contains exactly V-1 edges. Kruskal's runs in O(E log E) time dominated by sorting the edge list; after that every cycle check is an amortized O(alpha(V)) union-find query. It is a textbook application of the cut property: for any partition of the vertices, the lightest edge crossing the cut must belong to some MST. The algorithm shines on sparse graphs and when edges are already in sorted order (for example, from streaming input), where it becomes near-linear. Classic use cases include designing cable/road networks, clustering via single-linkage agglomeration, image segmentation, and approximation algorithms for TSP.
How it works
We start by materializing the full edge list as triples (u, v, weight). We sort the list by weight ascending. We initialize a union-find structure over V vertices so every vertex starts as its own component. We then iterate through the sorted edges. For each edge (u, v, w) we ask union-find whether u and v are already connected. If they are, adding this edge would create a cycle and we skip it. Otherwise we call union(u, v), append the edge to our MST edge list, and accumulate w into a running total. We can stop early as soon as we have accepted V-1 edges, because any connected graph's MST has exactly V-1 edges. If after processing every edge we still have fewer than V-1 accepted edges, the input graph was disconnected and only a minimum spanning forest exists. The correctness relies on the cut property: when we consider the lightest remaining edge, the partition formed by its endpoint's current components is a valid cut, so that edge is safe to add. The union-find guarantees O(alpha(V)) per query, so sorting dominates.
Implementation
import java.util.*;
public class Kruskal {
public static class Result {
public final List<int[]> picked;
public final long totalWeight;
public Result(List<int[]> picked, long totalWeight) {
this.picked = picked;
this.totalWeight = totalWeight;
}
}
private final int[] parent, rank;
public Kruskal(int n) {
parent = new int[n];
rank = new int[n];
for (int i = 0; i < n; i++) parent[i] = i;
}
private int find(int x) {
while (parent[x] != x) {
parent[x] = parent[parent[x]];
x = parent[x];
}
return x;
}
private boolean union(int a, int b) {
int ra = find(a), rb = find(b);
if (ra == rb) return false;
if (rank[ra] < rank[rb]) parent[ra] = rb;
else if (rank[ra] > rank[rb]) parent[rb] = ra;
else { parent[rb] = ra; rank[ra]++; }
return true;
}
/**
* edges[i] = {u, v, w}. Returns picked edges and total MST weight.
*/
public static Result kruskalMST(int[][] edges, int n) {
int[][] sorted = edges.clone();
Arrays.sort(sorted, Comparator.comparingInt(e -> e[2]));
Kruskal uf = new Kruskal(n);
List<int[]> picked = new ArrayList<>();
long total = 0;
for (int[] e : sorted) {
if (uf.union(e[0], e[1])) {
picked.add(e);
total += e[2];
if (picked.size() == n - 1) break;
}
}
return new Result(picked, total);
}
public static void main(String[] args) {
int[][] edges = {{0,1,4},{0,2,3},{1,2,1},{1,3,2},{2,3,4},{3,4,2}};
Result r = kruskalMST(edges, 5);
System.out.println("weight=" + r.totalWeight);
}
}
Complexity
- Time:
O(E log E) - Space:
O(V+E)
Common pitfalls
- Using int for cumulative weight when E is large or weights are high: overflow is easy, use long.
- Mutating the input edge array by sorting in place, which surprises callers that expect immutability.
- Not handling disconnected graphs: the returned edge list will have fewer than V-1 edges, and callers must check.
- Forgetting that Kruskal requires an undirected graph; passing directed edges silently produces nonsense.
Key design decisions & trade-offs
- Kruskal vs Prim — Chosen: Kruskal with Union-Find. Kruskal is simpler to parallelize on sorted streaming edges and works well on sparse graphs given as edge lists.
- Sort entire edge list vs lazy heap — Chosen: Full sort upfront. Simpler and cache-friendly; a heap only pays off when you can stop very early relative to E.
Practice problems
Interview follow-ups
- Support minimum spanning forests by returning whatever edges were accepted and exposing the component count.
- Extend to second-best MST by removing each tree edge and recomputing via a modified Kruskal run.
- Parallelize on Boruvka steps to process disjoint cheapest edges concurrently.
Why it matters
Kruskal MST is a core part of the Graphs toolkit. If you are preparing for a technical interview at a top-tier company or teaching a course, a working visualization is the fastest path from "I read about it" to "I understand it".