← System Design Simulator

Slotted Page

By Rahul Kumar · Senior Software Engineer · Updated · Category: Petrov · Storage Engines (Part I)

Variable-size cells, fragmentation, compaction. Ch 3.

This interactive explanation is built for system design interview prep: step through Slotted Page, watch the internal state change, and connect the concept to real distributed-system trade-offs.

Overview

The slotted page is the unit of storage inside nearly every row-oriented database ever written — Postgres, MySQL's InnoDB, Oracle, SQL Server, and SQLite all use a variant of the same layout on every 8 KiB or 16 KiB page. The idea is surgically simple: put fixed-size bookkeeping at the start of the page, let variable-length tuples grow from the end of the page toward the middle, and stitch the two together with a slot array (also called a line pointer array) that maps each row's logical slot number to its physical byte offset on the page. This indirection is the secret sauce — a row id (page_id, slot_id) survives even when the tuple is shuffled within the page during compaction, which means indexes pointing at those row ids never have to be rewritten when a page reorganizes. Understanding the slotted page is the prerequisite for understanding fragmentation, VACUUM, HOT updates, and half the locking story in MVCC databases.

Slotted Page — Interactive Simulator

Runs fully client-side in your browser; no sign-up. Or open full screen →

Launch the interactive Slotted Page widget — step through the algorithm or protocol and observe the internal state updating in real time.

How it works

A fresh page is split into three regions: a fixed header (page lsn, checksum, free-space hi/lo pointers), a slot array that grows from low addresses upward, and tuple data that grows from high addresses downward. INSERT(bytes): first check that slot_end + sizeof(slot) + len(bytes) <= data_start; if not the page is full. If it fits, decrement data_start by len(bytes), memcpy the tuple there, append a slot record (offset = data_start, length = len) at slot_end, increment slot_end, and return the slot index as the in-page row id. DELETE(slot): mark the slot's length as -1 (tombstone); the bytes stay on the page but are no longer reachable. UPDATE(slot, bytes): if new length <= old length, overwrite in place and shrink the slot length; otherwise insert the new version elsewhere on the page and have the old slot forward to the new slot id (the 'forward pointer' state, which is how Oracle's row chaining and Postgres' HOT updates work). Over time tombstones and forwarding accumulate, so the engine runs COMPACT(): walk the slot array in order, copy live tuples into a contiguous block at the end of the page, rewrite each slot's offset, and reset data_start. Because the slot index never changes during compaction, all external pointers (index entries, TIDs) stay valid without any rewriting.

Implementation

SlottedPage.java
package com.example.page;

import java.nio.ByteBuffer;

/**
 * 8 KiB page:
 *   [header][slot[0]..slot[n-1]] -> grows right
 *                                   <- [tuple[n-1]..tuple[0]] grows left
 * A slot is (short offset, short length). length = -1 marks a tombstone.
 */
public class SlottedPage {
    private static final int PAGE_SIZE = 8192;
    private static final int HEADER_BYTES = 8;      // numSlots(2) + freeStart(2) + freeEnd(2) + flags(2)
    private static final int SLOT_BYTES = 4;

    private final ByteBuffer buf = ByteBuffer.allocate(PAGE_SIZE);
    private short numSlots = 0;
    private short freeStart = HEADER_BYTES;          // where the next slot will be written
    private short freeEnd   = (short) PAGE_SIZE;     // where the last tuple was written (exclusive top)

    /** @return slot id, or -1 if the page is full */
    public int insert(byte[] tuple) {
        int need = SLOT_BYTES + tuple.length;
        if (freeEnd - freeStart < need) return -1;
        freeEnd -= tuple.length;
        System.arraycopy(tuple, 0, buf.array(), freeEnd, tuple.length);
        buf.putShort(freeStart,      (short) freeEnd);
        buf.putShort(freeStart + 2,  (short) tuple.length);
        freeStart += SLOT_BYTES;
        return numSlots++;
    }

    public byte[] read(int slot) {
        int base = HEADER_BYTES + slot * SLOT_BYTES;
        short off = buf.getShort(base);
        short len = buf.getShort(base + 2);
        if (len < 0) return null;                    // tombstone
        byte[] out = new byte[len];
        System.arraycopy(buf.array(), off, out, 0, len);
        return out;
    }

    public void delete(int slot) {
        int base = HEADER_BYTES + slot * SLOT_BYTES;
        buf.putShort(base + 2, (short) -1);          // tombstone; compact reclaims bytes
    }

    /** Reclaim space from tombstones and gaps by rewriting live tuples contiguously. */
    public void compact() {
        byte[] scratch = new byte[PAGE_SIZE];
        short cursor = (short) PAGE_SIZE;
        for (int s = 0; s < numSlots; s++) {
            int base = HEADER_BYTES + s * SLOT_BYTES;
            short len = buf.getShort(base + 2);
            if (len < 0) continue;                   // skip tombstones, slot id preserved
            short off = buf.getShort(base);
            cursor -= len;
            System.arraycopy(buf.array(), off, scratch, cursor, len);
            buf.putShort(base, cursor);              // rewrite offset; length unchanged
        }
        System.arraycopy(scratch, cursor, buf.array(), cursor, PAGE_SIZE - cursor);
        freeEnd = cursor;
        // freeStart and numSlots are intentionally unchanged — external TIDs remain valid.
    }

    public int freeSpace() { return freeEnd - freeStart; }
    public int slots()     { return numSlots; }
}

Complexity

Key design decisions & trade-offs

Common pitfalls

Interview follow-ups

Recommended reading

Related