Scale from Zero to Millions of Users System Design Interview Question

By Rahul Kumar · Senior Software Engineer · 11 components · 3 operations · Source: Alex Xu, System Design Interview Vol 1, Chapter 1

Problem: Evolve a single-server web app into a multi-tier, multi-region architecture that serves millions of users with low latency and high availability.

Overview

Scaling from zero to millions of users is less a single decision than a sequence of forced moves: each bottleneck the traffic exposes pushes you one step further down a well-worn path. You start with a single box running web, app, and database on the same machine, and every subsequent iteration peels a responsibility off that box. The first split is usually web tier from DB tier; next the web tier goes stateless by moving sessions into a shared store so autoscaling can add and drop instances freely; then caching arrives to keep hot reads off the primary; then read replicas, a CDN, a message queue, and finally sharding by user id and geo-replication for disaster recovery. The useful mental model is not a blueprint but a ladder: each rung solves the problem the previous rung created. This page walks the ladder end-to-end and shows the Java glue you need at each step so the evolution feels concrete instead of hand-wavy.

Summary

The target is a canonical read-heavy web application (about 10:1 read/write). We take the mature end state that Alex Xu's chapter converges on: geo-DNS routes users to the nearest data center; an L7 load balancer terminates TLS and spreads traffic across a stateless web tier; session state moves out of the web tier into a shared NoSQL store so any server can handle any request, which is what makes autoscaling possible; a Redis cache absorbs the hot read set; the SQL store is split into a master for writes and slave replicas for reads; a CDN fronts static assets; a message queue hands slow work (photo processing, emails) to asynchronous worker pools; the data tier ultimately shards by user id; and logging, metrics, and automation run across everything. The dominant trade-off is consistency versus latency: reads from replicas and CDN edges can be seconds stale, so write paths must invalidate cache entries, and reads that follow a write must tolerate replica lag.
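
The queue hand-off named above can be illustrated in plain Java. This is a minimal sketch, not the chapter's implementation: an in-process `BlockingQueue` stands in for a real message broker, and `PhotoJobPipeline`, `publish`, and `shutdownAndCollect` are hypothetical names introduced for illustration. What it shows is that the publisher returns as soon as the job is enqueued, while a separately sized worker pool does the slow part.

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

/** In-process stand-in for "web tier publishes, worker pool drains". */
final class PhotoJobPipeline {
    private static final String STOP = "__stop__"; // poison pill, one per worker

    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
    private final List<String> processed = new CopyOnWriteArrayList<>();
    private final ExecutorService workers;
    private final int workerCount;

    PhotoJobPipeline(int workerCount) {
        this.workerCount = workerCount;
        this.workers = Executors.newFixedThreadPool(workerCount);
        for (int i = 0; i < workerCount; i++) {
            workers.execute(this::drain);
        }
    }

    /** Called on the request path: enqueue the job and return immediately. */
    void publish(String photoId) {
        queue.add(photoId);
    }

    /** Worker loop: block until a job arrives, process it, repeat until STOP. */
    private void drain() {
        try {
            while (true) {
                String photoId = queue.take();
                if (STOP.equals(photoId)) return;
                processed.add("thumbnail:" + photoId); // placeholder for slow work
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Lets queued jobs finish, stops the workers, and returns what they produced. */
    List<String> shutdownAndCollect() {
        for (int i = 0; i < workerCount; i++) {
            queue.add(STOP); // FIFO order: pills sit behind all published jobs
        }
        workers.shutdown();
        try {
            workers.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return List.copyOf(processed);
    }
}
```

In a real deployment the queue would be a durable broker outside the web process, so the worker count can follow queue depth independently of the web tier.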

Requirements

Functional

Non-functional

Capacity Assumptions

Back-of-Envelope Estimates

High-level architecture

Traffic enters through geo-DNS, which steers each user to the nearest regional edge. A layer-7 load balancer terminates TLS and fans requests across a stateless web tier running in multiple availability zones; any instance can serve any request because sessions live in a shared NoSQL store rather than local memory. The web tier talks to a Redis cluster for hot reads using a cache-aside pattern: miss, load from the primary, populate, set a TTL. Writes go to the SQL master, which asynchronously replicates to read replicas; the data-source router in the web app directs SELECTs to replicas and INSERT/UPDATE/DELETE to the master, accepting a small window of replica lag. Static assets and user media sit behind a CDN that pulls from object storage, so the application servers never touch image bytes. Slow or bursty work — photo processing, webhooks, transactional email — is published to a message queue and drained by a separate worker fleet that can scale independently of the web tier. As data grows, the SQL store is sharded by user id; as the business goes global, the entire stack is cloned into additional regions with cross-region replication providing disaster recovery. Observability is a first-class citizen from day one: structured logs to a central pipeline, RED-style metrics scraped per instance, and a health endpoint the load balancer polls to eject bad pods within seconds.
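
Sharding by user id, as described above, comes down to a deterministic function from id to shard. The sketch below assumes the simplest scheme, user id modulo a fixed shard count; the source does not specify the hash function, and `UserShardRouter` and the shard names are hypothetical.

```java
import java.util.List;

/** Maps a user id to one of a fixed list of shards (names are illustrative). */
final class UserShardRouter {
    private final List<String> shards;

    UserShardRouter(List<String> shards) {
        if (shards.isEmpty()) {
            throw new IllegalArgumentException("need at least one shard");
        }
        this.shards = List.copyOf(shards);
    }

    /** floorMod keeps the index non-negative even if an id is negative. */
    int shardIndex(long userId) {
        return (int) Math.floorMod(userId, (long) shards.size());
    }

    /** Every query for this user goes to the returned shard. */
    String shardFor(long userId) {
        return shards.get(shardIndex(userId));
    }
}
```

With four shards, `shardIndex(42L)` is 2. Plain modulo remaps most keys when the shard count changes, which is why consistent hashing is often preferred when shards are added or removed regularly.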

Architecture Components (11)

Operations Walked Through (3)

Implementation

Spring Boot application skeleton
package com.example.app;

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.cache.annotation.EnableCaching;
import org.springframework.scheduling.annotation.EnableAsync;

@SpringBootApplication
@EnableCaching
@EnableAsync
public class WebApp {
    public static void main(String[] args) {
        // -Dspring.profiles.active=prod selects the multi-region config
        SpringApplication.run(WebApp.class, args);
    }
}

HealthController with liveness and readiness
package com.example.app.health;

import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.*;
import java.util.concurrent.atomic.AtomicBoolean;

@RestController
@RequestMapping("/health")
public class HealthController {
    // liveness flips to false only on an unrecoverable internal error
    private final AtomicBoolean alive = new AtomicBoolean(true);
    // readiness flips to false during warm-up or while draining on shutdown
    private final AtomicBoolean ready = new AtomicBoolean(false);

    @GetMapping("/live")
    public ResponseEntity<String> live() {
        return alive.get() ? ResponseEntity.ok("OK") : ResponseEntity.status(500).body("DEAD");
    }

    @GetMapping("/ready")
    public ResponseEntity<String> ready() {
        return ready.get() ? ResponseEntity.ok("READY") : ResponseEntity.status(503).body("WARMING");
    }

    public void markReady()   { ready.set(true); }
    public void beginDrain()  { ready.set(false); } // LB stops sending new traffic
    public void fatalError()  { alive.set(false); } // orchestrator restarts the pod
}

Cache-aside service with Redis
package com.example.app.user;

import com.fasterxml.jackson.databind.ObjectMapper;
import org.springframework.data.redis.core.StringRedisTemplate;
import org.springframework.stereotype.Service;
import java.time.Duration;

@Service
public class UserProfileService {
    private static final Duration TTL = Duration.ofMinutes(10);
    private final StringRedisTemplate redis;
    private final UserRepository repo;
    private final ObjectMapper json = new ObjectMapper();

    public UserProfileService(StringRedisTemplate redis, UserRepository repo) {
        this.redis = redis;
        this.repo = repo;
    }

    public UserProfile get(long userId) throws Exception {
        String key = "user:" + userId;
        String cached = redis.opsForValue().get(key);
        if (cached != null) return json.readValue(cached, UserProfile.class); // hit

        UserProfile fresh = repo.findById(userId);              // miss -> DB
        if (fresh != null) {
            redis.opsForValue().set(key, json.writeValueAsString(fresh), TTL);
        }
        return fresh;
    }

    public void update(UserProfile p) {
        repo.save(p);
        redis.delete("user:" + p.getId()); // invalidate; next read repopulates
    }
}

Read-replica DataSource routing
package com.example.app.db;

import org.springframework.jdbc.datasource.lookup.AbstractRoutingDataSource;
import javax.sql.DataSource;
import java.util.Map;

public class ReadWriteRoutingDataSource extends AbstractRoutingDataSource {
    public enum Route { MASTER, REPLICA }
    private static final ThreadLocal<Route> CTX = ThreadLocal.withInitial(() -> Route.MASTER);

    public static void useReplica() { CTX.set(Route.REPLICA); }
    public static void useMaster()  { CTX.set(Route.MASTER); }
    public static void clear()      { CTX.remove(); }

    @Override
    protected Object determineCurrentLookupKey() { return CTX.get(); }

    public static ReadWriteRoutingDataSource build(DataSource master, DataSource replica) {
        ReadWriteRoutingDataSource ds = new ReadWriteRoutingDataSource();
        ds.setTargetDataSources(Map.of(Route.MASTER, master, Route.REPLICA, replica));
        ds.setDefaultTargetDataSource(master);
        ds.afterPropertiesSet();
        return ds;
    }
}

// @Transactional(readOnly=true) methods call useReplica() in an aspect;
// writes always hit MASTER. Replica lag is accepted on the read path.
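
The comment above leaves the aspect itself out. One detail it would need, sketched here under assumptions, is restoring the route after each read: application servers reuse pooled threads, so a `ThreadLocal` left at REPLICA could leak into a later write on the same thread. `RouteContext` below is a hypothetical, Spring-free stand-in for the static methods of `ReadWriteRoutingDataSource`, and `onReplica` is an illustrative helper name.

```java
import java.util.function.Supplier;

/** Stand-in for the routing context; mirrors useReplica()/clear() above. */
final class RouteContext {
    enum Route { MASTER, REPLICA }

    private static final ThreadLocal<Route> CTX =
            ThreadLocal.withInitial(() -> Route.MASTER);

    static Route current() {
        return CTX.get();
    }

    /** Runs the read on the replica route, then always restores the default. */
    static <T> T onReplica(Supplier<T> read) {
        CTX.set(Route.REPLICA);
        try {
            return read.get();
        } finally {
            CTX.remove(); // the next use on this thread falls back to MASTER
        }
    }
}
```

An aspect around `@Transactional(readOnly = true)` methods could wrap the invocation the same way, calling `useReplica()` before proceeding and `clear()` in a `finally` block.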

Key design decisions & trade-offs

Interview follow-ups