← System Design Simulator

Data Models

By Rahul Kumar · Senior Software Engineer · Updated · Category: Kleppmann · Designing Data-Intensive Applications

Relational / document / graph side-by-side with queries. Ch 2.

This interactive explanation is built for system design interview prep: step through Data Models, watch the internal state change, and connect the concept to real distributed-system trade-offs.

Overview

Relational, document, and graph data models are three different answers to the same question: how should the shape of your data match the shape of your queries? Relational tables enforce a schema up front, split data across normalized rows, and lean on joins to reassemble it. Document stores embed related data as a single self-describing blob, optimizing locality for one-shot reads at the cost of update anomalies and awkward cross-document joins. Graph databases model entities and the relationships between them as first-class citizens, so queries that would be N-way joins in SQL become constant-time traversals regardless of how many hops they span. Kleppmann frames the choice as query-shape versus data-shape impedance: pick the model whose access pattern already matches your reads, because fighting the mismatch later is what produces brittle data layers. Most real systems end up polyglot, carrying one workload per store.

Data Models — Interactive Simulator

Runs fully client-side in your browser; no sign-up. Or open full screen →

Launch the interactive Data Models widget — step through the algorithm or protocol and observe the internal state updating in real time.

How it works

Relational stores break an entity into normalized rows across many tables and use foreign keys to preserve referential integrity; the database rebuilds the object at read time by joining rows on those keys. This is great when the same underlying facts are queried from multiple angles ("show me users of this org" and "show me orgs of this user"). Document stores flip the tradeoff: the whole entity lives in a single nested record, usually JSON or BSON, so one lookup returns everything the UI needs with no join cost. The price is denormalization: a user's name embedded in every order becomes stale the moment the user renames, and cross-document transactions are historically weak. Graph databases use an adjacency-list representation where each vertex stores pointers to its edges, so an edge traversal is a direct memory dereference. A three-hop friend-of-friend query that would require two self-joins over a billion-row edge table in SQL becomes a bounded walk in a graph engine. The crucial realization is that all three models can encode any data; the difference is how expensive your actual queries become. Pick the model that makes your hottest read path a single lookup or a short walk, not a multi-way join.

Implementation

Relational: JPA @Entity
import jakarta.persistence.*;
import java.time.Instant;
import java.util.*;

@Entity
@Table(name = "users")
public class UserRow {
    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private Long id;

    @Column(nullable = false, unique = true)
    private String email;

    @Column(nullable = false)
    private String name;

    @Column(name = "created_at", nullable = false)
    private Instant createdAt;

    @OneToMany(mappedBy = "user", fetch = FetchType.LAZY, cascade = CascadeType.ALL)
    private List<OrderRow> orders = new ArrayList<>();

    @ManyToMany
    @JoinTable(name = "user_roles",
        joinColumns = @JoinColumn(name = "user_id"),
        inverseJoinColumns = @JoinColumn(name = "role_id"))
    private Set<RoleRow> roles = new HashSet<>();

    protected UserRow() {}

    public UserRow(String email, String name) {
        this.email = email;
        this.name = name;
        this.createdAt = Instant.now();
    }

    public Long id() { return id; }
    public String email() { return email; }
    public String name() { return name; }
    public List<OrderRow> orders() { return orders; }
    public Set<RoleRow> roles() { return roles; }
}
Document: MongoDB POJO
import org.bson.codecs.pojo.annotations.BsonId;
import org.bson.codecs.pojo.annotations.BsonProperty;
import java.time.Instant;
import java.util.*;

public final class UserDoc {
    @BsonId
    private String id;                       // usually ObjectId string

    @BsonProperty("email")
    private String email;

    @BsonProperty("name")
    private String name;

    @BsonProperty("created_at")
    private Instant createdAt;

    // Embedded subdocuments: one read returns the user and their recent orders.
    @BsonProperty("recent_orders")
    private List<EmbeddedOrder> recentOrders = new ArrayList<>();

    @BsonProperty("roles")
    private List<String> roles = new ArrayList<>();  // denormalized role names

    public UserDoc() {}

    public UserDoc(String email, String name) {
        this.email = email;
        this.name = name;
        this.createdAt = Instant.now();
    }

    public static final class EmbeddedOrder {
        @BsonProperty("order_id") private String orderId;
        @BsonProperty("total_cents") private long totalCents;
        @BsonProperty("placed_at") private Instant placedAt;

        public EmbeddedOrder() {}
        public EmbeddedOrder(String id, long cents, Instant at) {
            this.orderId = id; this.totalCents = cents; this.placedAt = at;
        }
    }

    public String id() { return id; }
    public String email() { return email; }
    public List<EmbeddedOrder> recentOrders() { return recentOrders; }
}
Graph: Neo4j Cypher query wrapper
import org.neo4j.driver.*;
import java.util.*;
import static org.neo4j.driver.Values.parameters;

public final class SocialGraphClient implements AutoCloseable {
    private final Driver driver;

    public SocialGraphClient(String uri, String user, String pass) {
        this.driver = GraphDatabase.driver(uri, AuthTokens.basic(user, pass));
    }

    /** Create a User vertex. */
    public void upsertUser(String userId, String name) {
        try (Session s = driver.session()) {
            s.executeWrite(tx -> tx.run(
                "MERGE (u:User {id: $id}) SET u.name = $name RETURN u",
                parameters("id", userId, "name", name)).consume());
        }
    }

    /** Create a FRIEND edge between two users. */
    public void follow(String fromId, String toId) {
        try (Session s = driver.session()) {
            s.executeWrite(tx -> tx.run(
                "MATCH (a:User {id: $a}), (b:User {id: $b}) " +
                "MERGE (a)-[:FRIEND]->(b)",
                parameters("a", fromId, "b", toId)).consume());
        }
    }

    /** Friends-of-friends up to depth 3 — trivial traversal, not a 3-way join. */
    public List<String> friendsOfFriends(String userId, int depth) {
        try (Session s = driver.session()) {
            return s.executeRead(tx -> tx.run(
                "MATCH (u:User {id: $id})-[:FRIEND*2.." + depth + "]->(f:User) " +
                "WHERE f.id <> $id RETURN DISTINCT f.id AS id",
                parameters("id", userId))
                .list(r -> r.get("id").asString()));
        }
    }

    @Override public void close() { driver.close(); }
}

Complexity

Key design decisions & trade-offs

Common pitfalls

Interview follow-ups

Recommended reading

Related