← System Design Simulator

B-Tree

By Rahul Kumar · Senior Software Engineer · Updated · Category: Petrov · Storage Engines (Part I)

Insert, search, delete with animated splits. Ch 2 + 4.

This interactive explanation is built for system design interview prep: step through B-Tree, watch the internal state change, and connect the concept to real distributed-system trade-offs.

Overview

The B-tree is the workhorse index of every major relational database — Postgres, MySQL's InnoDB, Oracle, SQL Server, SQLite — and understanding it is non-negotiable if you want to reason about query plans, locking, or why a particular DDL change locks up production. Unlike the binary search trees taught in algorithms class, a B-tree is a wide, shallow, balanced tree where each node holds hundreds of keys so that the height stays under five even for billion-row tables. Every node corresponds to a fixed-size disk page (usually 8 KiB or 16 KiB), so a point lookup is four or five page I/Os worst-case, and a range scan is a linked-list walk across leaves. This page covers the classical B+ tree: insert with split-on-overflow cascading up the root, search via per-node binary search, and the structural invariants that guarantee logarithmic height no matter the insertion order.

B-Tree — Interactive Simulator

Runs fully client-side in your browser; no sign-up. Or open full screen →

Launch the interactive B-Tree widget — step through the algorithm or protocol and observe the internal state updating in real time.

How it works

A B+ tree of order m keeps between ceil(m/2) and m keys per node and routes based on separator keys. SEARCH(key) enters the root, binary-searches its keys to pick a child pointer, and repeats until it hits a leaf; there the same binary search yields either the tuple id or a not-found. INSERT(key, tid) first walks the same path to the correct leaf; if the leaf has room the key is simply inserted in sorted order (state LEAF_INSERT). If the leaf is full the tree enters SPLIT: the leaf's keys are partitioned into two new leaves, a separator (the smallest key of the right half) is promoted to the parent, and sibling leaf pointers are rewired so range scans still work. That promotion can itself overflow the parent, triggering the same split one level up — the CASCADE state — and in the worst case it propagates all the way to the root, at which point a new root is allocated and the tree grows one level taller (the only way a B-tree's height ever changes). Deletes mirror this with merge/redistribute, though production engines often skip merging and just tombstone + rebuild. Because every path from root to leaf has the same length, lookups and inserts are both O(log_m n), and with m around 200 even a trillion-row index is at most five levels deep.

Implementation

BTreeNode.java
package com.example.btree;

import java.util.Arrays;

public class BTreeNode {
    static final int ORDER = 4;              // max children; keys <= ORDER-1
    int numKeys;
    final int[] keys = new int[ORDER];       // one slot of slack for easier split math
    final BTreeNode[] children = new BTreeNode[ORDER + 1];
    boolean leaf;

    BTreeNode(boolean leaf) { this.leaf = leaf; }

    /** Binary search for key; returns the child index to descend into. */
    int childIndex(int key) {
        int lo = 0, hi = numKeys;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (keys[mid] <= key) lo = mid + 1; else hi = mid;
        }
        return lo;
    }

    /** True if key exists in the subtree rooted here. */
    boolean search(int key) {
        int i = 0;
        while (i < numKeys && keys[i] < key) i++;
        if (i < numKeys && keys[i] == key) return true;
        return !leaf && children[i].search(key);
    }

    boolean isFull() { return numKeys == ORDER - 1; }
}
BTree.java
package com.example.btree;

public class BTree {
    private BTreeNode root = new BTreeNode(true);

    public boolean contains(int key) { return root.search(key); }

    public void insert(int key) {
        if (root.isFull()) {
            // root split -> tree grows one level
            BTreeNode newRoot = new BTreeNode(false);
            newRoot.children[0] = root;
            splitChild(newRoot, 0);
            root = newRoot;
        }
        insertNonFull(root, key);
    }

    private void insertNonFull(BTreeNode n, int key) {
        int i = n.numKeys - 1;
        if (n.leaf) {
            while (i >= 0 && n.keys[i] > key) { n.keys[i + 1] = n.keys[i]; i--; }
            n.keys[i + 1] = key;
            n.numKeys++;
            return;
        }
        while (i >= 0 && n.keys[i] > key) i--;
        i++;
        if (n.children[i].isFull()) {
            splitChild(n, i);
            if (n.keys[i] < key) i++;
        }
        insertNonFull(n.children[i], key);
    }

    private void splitChild(BTreeNode parent, int idx) {
        BTreeNode full = parent.children[idx];
        BTreeNode right = new BTreeNode(full.leaf);
        int mid = (BTreeNode.ORDER - 1) / 2;
        right.numKeys = full.numKeys - mid - 1;
        System.arraycopy(full.keys, mid + 1, right.keys, 0, right.numKeys);
        if (!full.leaf) {
            System.arraycopy(full.children, mid + 1, right.children, 0, right.numKeys + 1);
        }
        // shift parent to make room for the promoted separator
        for (int j = parent.numKeys; j > idx; j--) {
            parent.keys[j] = parent.keys[j - 1];
            parent.children[j + 1] = parent.children[j];
        }
        parent.keys[idx] = full.keys[mid];
        parent.children[idx + 1] = right;
        parent.numKeys++;
        full.numKeys = mid;
    }
}

Complexity

Key design decisions & trade-offs

Common pitfalls

Interview follow-ups

Recommended reading

Related