← System Design Simulator

Search Autocomplete (Type-Ahead) System Design Interview Question

By Rahul Kumar · Senior Software Engineer · Updated · 11 components · 5 operations ·Source: Alex Xu, System Design Interview Vol 1, Chapter 13

Problem: Design a top-k search autocomplete system that returns the 5 most popular queries for a typed prefix within 100ms.

Overview

Search autocomplete (type-ahead) is the feature that turns every keystroke into a backend request and expects the top-5 completions to arrive within roughly 100 milliseconds. The workload is brutally read-heavy: at 10 million daily actives typing 20-character queries with one request per keystroke, the serving tier must sustain around 24K QPS average and roughly 48K QPS at peak, while still looking instant from the user's perspective. The design we describe here accepts a one-week-old view of the world in exchange for drastically simpler serving. A trie with the top-K completions precomputed at every node lets the query tier answer any prefix in O(P) where P is the prefix length, which easily fits a 100ms budget even on commodity hardware. A separate offline pipeline aggregates query logs, re-ranks by frequency, and rebuilds the trie on a cadence the business can tolerate. The end result is a system where writes are large and slow but infrequent, and reads are small and fast and constant.

Search Autocomplete (Type-Ahead) — Interactive Simulator

Runs fully client-side in your browser; no sign-up. Or open full screen →

Launch the interactive walkthrough for Search Autocomplete (Type-Ahead) — animated architecture diagram, step-by-step flow with real payloads, component swap, and a discrete-event stress simulator.

Summary

A read-dominated system for 10M DAU that returns top-5 completions for each keystroke under 100ms (book: Facebook typeahead SLO). The dominant design choice is a trie with top-k precomputed at every node (serving is O(1) after the prefix walk), populated by an offline batch pipeline (analytics logs → aggregator → worker builds trie → Trie Cache + Trie DB) that rebuilds weekly. Online path: Client → LB → API → Trie Cache (with fallback to Trie DB). The main tradeoff is freshness vs serving simplicity: an in-memory trie is blazing fast to read but rebuild cadence (weekly in the book's base design) means trending queries lag until the next cycle — the book's 'how to support real-time' follow-up outlines stream processing as the fix. A filter layer in front of the cache removes hateful/violent/NSFW suggestions; a shard-map manager handles prefix-shard imbalance.

Requirements

Functional

Non-functional

Capacity Assumptions

Back-of-Envelope Estimates

High-level architecture

The serving path is intentionally short. A client fires an AJAX GET on every keystroke, hits an L7 load balancer, and lands on a stateless API replica. The API reads from an in-memory Trie Cache that stores, at every node, the precomputed top-K completions for the prefix ending at that node. That caching is the whole trick: walking the prefix is O(P), and once we hit the target node the answer is already materialized, so no sorting or scanning happens on the hot path. If the cache is cold, the API falls back to a durable Trie DB that holds the same structure serialized. A Filter Layer strips disallowed phrases before the suggestions leave the building, and a Shard Map Manager keeps prefix shards balanced as popular prefixes like "how to" drift hotter over time. The build path is completely separate. Analytics logs stream into aggregators that compute per-query frequency over the rebuild window. A pool of Trie Build Workers consumes that aggregated data and produces a new trie, top-K annotated at every node, which is then atomically swapped into the Trie DB and warmed into the Trie Cache. The rebuild can run weekly in the base design; moving to minute-level freshness requires a streaming pipeline, covered as a follow-up. The clean split between read-optimized in-memory serving and write-batched offline builds is what lets a single team operate this system without 24/7 incidents.

Architecture Components (11)

Operations Walked Through (5)

Implementation

Trie with top-K cached at every node
public class AutocompleteTrie {
    private final int topK;
    private final Node root = new Node();

    public AutocompleteTrie(int topK) { this.topK = topK; }

    static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        // Top-K completions that pass through this node, sorted desc by frequency.
        List<Suggestion> topK = Collections.emptyList();
    }

    static final class Suggestion implements Comparable<Suggestion> {
        final String query; final long freq;
        Suggestion(String q, long f) { this.query = q; this.freq = f; }
        public int compareTo(Suggestion o) { return Long.compare(o.freq, this.freq); }
    }

    /** Offline: insert every known query and update top-K on every node along the path. */
    public void insert(String query, long freq) {
        Node cur = root;
        List<Node> path = new ArrayList<>(query.length());
        for (char c : query.toCharArray()) {
            cur = cur.children.computeIfAbsent(c, k -> new Node());
            path.add(cur);
        }
        for (Node n : path) n.topK = mergeTopK(n.topK, new Suggestion(query, freq));
    }

    /** Online: O(P) prefix walk, then O(1) read of the precomputed top-K. */
    public List<Suggestion> suggest(String prefix) {
        Node cur = root;
        for (char c : prefix.toCharArray()) {
            cur = cur.children.get(c);
            if (cur == null) return Collections.emptyList();
        }
        return cur.topK;
    }

    private List<Suggestion> mergeTopK(List<Suggestion> existing, Suggestion candidate) {
        List<Suggestion> merged = new ArrayList<>(existing);
        merged.removeIf(s -> s.query.equals(candidate.query));
        merged.add(candidate);
        Collections.sort(merged);
        return merged.subList(0, Math.min(topK, merged.size()));
    }
}
PrefixController with 100ms SLO
@RestController
public class PrefixController {
    private static final Duration SLO = Duration.ofMillis(100);
    private final AutocompleteTrie cache;
    private final TrieDb fallback;
    private final PhraseFilter filter;

    public PrefixController(AutocompleteTrie cache, TrieDb db, PhraseFilter filter) {
        this.cache = cache; this.fallback = db; this.filter = filter;
    }

    @GetMapping("/search")
    public ResponseEntity<List<String>> suggest(@RequestParam("q") String raw) {
        long start = System.nanoTime();
        String prefix = raw == null ? "" : raw.trim().toLowerCase(Locale.ROOT);
        if (prefix.isEmpty() || prefix.length() > 50) return ResponseEntity.ok(List.of());

        List<AutocompleteTrie.Suggestion> hit = cache.suggest(prefix);
        if (hit.isEmpty() && elapsed(start).compareTo(SLO.dividedBy(2)) < 0) {
            hit = fallback.suggest(prefix); // only try the DB if we still have budget
        }

        List<String> out = hit.stream()
                .map(s -> s.query)
                .filter(filter::allow)
                .limit(5)
                .toList();

        HttpHeaders h = new HttpHeaders();
        h.setCacheControl("private, max-age=3600");
        h.add("X-Elapsed-Ms", Long.toString(elapsed(start).toMillis()));
        return new ResponseEntity<>(out, h, HttpStatus.OK);
    }

    private Duration elapsed(long startNanos) {
        return Duration.ofNanos(System.nanoTime() - startNanos);
    }
}
Daily batch rebuild from query logs
public class TrieRebuildJob {
    private final QueryLogReader logs;
    private final TrieDb trieDb;
    private final TrieCacheWarmer warmer;

    public TrieRebuildJob(QueryLogReader logs, TrieDb trieDb, TrieCacheWarmer warmer) {
        this.logs = logs; this.trieDb = trieDb; this.warmer = warmer;
    }

    /** Nightly: aggregate -> build -> swap -> warm. */
    public void runFor(LocalDate day) {
        // 1. aggregate frequencies across the trailing window
        Map<String, Long> freq = new HashMap<>();
        for (QueryLogReader.Row r : logs.rowsFor(day.minusDays(7), day)) {
            freq.merge(r.normalizedQuery(), r.count(), Long::sum);
        }

        // 2. build the new trie with top-K precomputed at every node
        AutocompleteTrie next = new AutocompleteTrie(5);
        freq.entrySet().stream()
            .filter(e -> e.getValue() >= 3) // drop long-tail noise
            .forEach(e -> next.insert(e.getKey(), e.getValue()));

        // 3. atomic swap in the durable store and warm the read-tier cache
        String version = "v-" + day;
        trieDb.publish(version, next);
        warmer.warmAllReplicas(version);
    }
}

Key design decisions & trade-offs

Interview follow-ups

Related