← System Design Simulator

Row vs Column

By Rahul Kumar · Senior Software Engineer · Updated · Category: Petrov · Storage Engines (Part I)

Storage layout & scan cost. Ch 1.

This interactive explanation is built for system design interview prep: step through Row vs Column, watch the internal state change, and connect the concept to real distributed-system trade-offs.

Overview

Row-oriented and column-oriented storage lay out exactly the same logical table in physically opposite ways, and that single choice drives the personality of a database more than any other internal decision. Row stores (Postgres, MySQL, Oracle) write every attribute of a tuple contiguously on a page, so a point read of a whole record is one sequential disk touch, and OLTP workloads that insert or update full rows stay fast. Column stores (Vertica, Redshift, ClickHouse, Parquet) instead write each attribute into its own contiguous vector, so analytical scans that read only a handful of columns from billions of rows skip over the rest of the table and run an order of magnitude faster. The catch: column stores hate point writes because a single row update must touch N files, and row stores hate analytical scans because they drag useless columns through the memory hierarchy. Understanding when to pick each — and hybrids like PAX and Apache Arrow — is the crux of modern storage engine design.

Row vs Column — Interactive Simulator

Runs fully client-side in your browser; no sign-up. Or open full screen →

Launch the interactive Row vs Column widget — step through the algorithm or protocol and observe the internal state updating in real time.

How it works

A row store lays out tuples in a heap file one after another, typically inside a slotted page; the scan path is pread() the page into the buffer pool, walk the slot array, and materialize a Java object per slot. A columnar store instead writes each attribute as a separate array on disk, often with a lightweight header describing dictionary encoding, run-length encoding, or bit-packing. When a query like SELECT SUM(amount) FROM orders WHERE region = 'US' arrives, the columnar executor enters a state machine: OPEN_COLUMN (mmap the region vector), FILTER (SIMD-compare each lane against the constant 'US' to produce a selection bitmap), PROJECT (apply the bitmap to the amount vector), and AGGREGATE (vectorized sum over surviving lanes). The row engine must instead OPEN_PAGE, for each slot DECODE_TUPLE, CHECK_PREDICATE on region, and then ADD amount — paying per-tuple dispatch cost on every row, including rows it ends up rejecting. Writes invert the story: a row insert is one append; a columnar insert must split the tuple across N columns, which is why real systems buffer writes in a row-oriented write-optimized store (the 'delta') and merge periodically into the columnar base (the 'main').

Implementation

ColumnStore.java
package com.example.storage;

import java.util.*;

/** One column = one primitive array. Reads of a single column are pure sequential memory. */
public class ColumnStore {
    private final int[] region;     // dictionary-encoded: 0=US, 1=EU, 2=APAC
    private final long[] amount;
    private final int rows;

    public ColumnStore(int[] region, long[] amount) {
        if (region.length != amount.length) throw new IllegalArgumentException();
        this.region = region;
        this.amount = amount;
        this.rows = region.length;
    }

    /** SUM(amount) WHERE region = target. Vectorized, branch-free inner loop. */
    public long sumWhereRegion(int target) {
        long sum = 0;
        for (int i = 0; i < rows; i++) {
            // mask = -1 when match, 0 otherwise; avoids a hard-to-predict branch
            long mask = -(long) ((region[i] == target) ? 1 : 0);
            sum += amount[i] & mask;
        }
        return sum;
    }
}
RowStore.java
package com.example.storage;

import java.util.*;

/** One row = one object. Good for point lookups and full-tuple reads. */
public class RowStore {
    public static final class Order {
        public final int region;
        public final long amount;
        public final long customerId;
        public final long ts;
        public Order(int region, long amount, long customerId, long ts) {
            this.region = region; this.amount = amount;
            this.customerId = customerId; this.ts = ts;
        }
    }

    private final List<Order> heap = new ArrayList<>();

    public void insert(Order o) { heap.add(o); }
    public Order get(int rowId) { return heap.get(rowId); }

    /** Same query as ColumnStore but forced to drag every field through cache. */
    public long sumWhereRegion(int target) {
        long sum = 0;
        for (Order o : heap) {
            if (o.region == target) sum += o.amount;
        }
        return sum;
    }
}
ScanBenchmark.java
package com.example.storage;

import java.util.concurrent.ThreadLocalRandom;

public class ScanBenchmark {
    public static void main(String[] args) {
        final int N = 10_000_000;
        int[] region = new int[N];
        long[] amount = new long[N];
        RowStore rows = new RowStore();
        ThreadLocalRandom r = ThreadLocalRandom.current();
        for (int i = 0; i < N; i++) {
            region[i] = r.nextInt(3);
            amount[i] = r.nextLong(1_000);
            rows.insert(new RowStore.Order(region[i], amount[i], r.nextLong(), r.nextLong()));
        }
        ColumnStore cols = new ColumnStore(region, amount);

        long t1 = System.nanoTime(); long sC = cols.sumWhereRegion(0);
        long t2 = System.nanoTime(); long sR = rows.sumWhereRegion(0);
        long t3 = System.nanoTime();

        System.out.printf("col: %d ms  sum=%d%n", (t2 - t1) / 1_000_000, sC);
        System.out.printf("row: %d ms  sum=%d%n", (t3 - t2) / 1_000_000, sR);
        // Typical result: column store is 3-10x faster because it scans
        // 8 bytes/row instead of a ~32-byte Order object.
    }
}

Complexity

Key design decisions & trade-offs

Common pitfalls

Interview follow-ups

Recommended reading

Related