
Distributed Email Service (Gmail-style) System Design Interview Question

By Rahul Kumar · Senior Software Engineer · Updated · 11 components · 3 operations · Source: Alex Xu, System Design Interview Vol 2, Chapter 8

Problem: Design a distributed email service like Gmail supporting SMTP send/receive, spam filtering, full-text search, and attachments.

Overview

A Gmail-scale email service has two personalities. On the write side it is an SMTP system that accepts RFC 5321 envelopes, runs spam and malware filters, and persists messages durably. On the read side it is a metadata-heavy database with a search index: users scroll headers roughly one hundred times more often than they open bodies. The classic mistake is to treat an email as one blob and put it in a single store. The right split is three-way: compact metadata rows in a sharded SQL or wide-column database for inbox views, opaque message bodies in a KV store keyed by message ID for lazy loading, and attachments in a content-addressed blob store so that the same PDF shared in a thread is stored once. The interesting tradeoffs are how to keep the search index within a few seconds of the source of truth, whether to filter spam synchronously at SMTP time or asynchronously, and how to partition mailboxes so that hot users do not overload a single shard.

Summary

A hybrid write + read system where email is ingested over SMTP (inbound MX + outbound MTA), filtered for spam/malware, split into metadata (indexed, searchable) and body/attachment blobs (cheap storage), and surfaced to clients over HTTP/IMAP. The dominant design choice is separating metadata from message bodies: metadata lives in a sharded SQL/NoSQL store with a secondary search index, while bodies land in a KV message store and attachments in a blob store (content-addressed to dedupe). The main tradeoff is eventual consistency between the metadata DB and the search index — users might send a mail that isn't searchable for a few seconds. Sized for ~1B users and ~200B messages/day (Gmail scale).
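
To put the sizing in perspective, the two figures in the summary can be turned into average rates with simple arithmetic. The sketch below uses only those two numbers; peak multipliers and message sizes are not stated here, so they are left out, and the class name `CapacityEstimate` is purely illustrative.

```java
// Average-rate arithmetic derived from the two figures in the summary.
final class CapacityEstimate {
  static final long USERS = 1_000_000_000L;               // ~1B users
  static final long MESSAGES_PER_DAY = 200_000_000_000L;  // ~200B messages/day
  static final long SECONDS_PER_DAY = 24L * 60 * 60;      // 86,400

  // Average messages per second across the whole service (integer division).
  static long avgMessagesPerSecond() {
    return MESSAGES_PER_DAY / SECONDS_PER_DAY;
  }

  // Average messages handled per user per day.
  static long avgMessagesPerUserPerDay() {
    return MESSAGES_PER_DAY / USERS;
  }

  public static void main(String[] args) {
    System.out.println("avg msgs/sec: " + avgMessagesPerSecond());          // 2314814
    System.out.println("avg msgs/user/day: " + avgMessagesPerUserPerDay()); // 200
  }
}
```

That is roughly 2.3 million messages per second on average, before any peak factor is applied.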

Requirements

Functional

Non-functional

Capacity Assumptions

Back-of-Envelope Estimates

High-level architecture

The write path starts at the Inbound MX Receiver, an SMTP server that terminates the RCPT TO / DATA dialog, performs SPF, DKIM, and DMARC checks, and hands the accepted message to a Spam Filter. The filter runs as an interceptor in the ingestion pipeline rather than as a gate that rejects mail at SMTP time: spam classifiers occasionally mispredict, so it is safer to accept a message and hide it under the SPAM label than to bounce it and lose it. Accepted messages are split three ways: metadata (sender, recipient, subject, flags, snippet) is written to the Inbox Metadata DB, sharded by user partition; the body goes to the Message Body KV, keyed by message-id; and attachments are uploaded to the Attachment Blob Store, keyed by SHA-256. A CDC pipeline tails the metadata DB and pushes updates to the Search Index and to the Push Notification Service, which delivers to APNs, FCM, or a WebSocket fanout.

Outbound mail from first-party clients hits the Mail Submission API, is filtered for outbound spam and malware, is queued in the Outbound SMTP Relay, and is delivered to peer MTAs with retries.

The read path is dominated by the inbox-list call, which serves metadata-only responses (headers plus a 150-character snippet) from a read replica of the metadata shard; bodies are fetched only when the user opens a message. Partitioning the mailbox by user_id is the single most important decision: no cross-partition query is on the hot path, and the blast radius of a hot user is limited to that user's shard.
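
The CDC step described above is where the eventual consistency between the metadata DB and the search index comes from. The following is a minimal in-memory sketch of that behaviour, not the actual pipeline: `CdcIndexSketch`, its change log, and its tokenizer are all hypothetical stand-ins, and a real deployment would tail a database log and write to a dedicated search cluster.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// In-memory stand-in for a metadata change log that a search indexer tails.
final class CdcIndexSketch {
  // One change event: a message row written to the metadata store.
  record Change(String userId, String messageId, String subject) {}

  private final List<Change> changeLog = new ArrayList<>();        // append-only log
  private final Map<String, Set<String>> index = new HashMap<>();  // "userId|term" -> message ids
  private int indexedOffset = 0;                                   // next log position to index

  // Source-of-truth write: the row counts as committed once it is in the log.
  void writeMetadata(String userId, String messageId, String subject) {
    changeLog.add(new Change(userId, messageId, subject));
  }

  // Indexer step: applies up to maxEvents pending changes and returns how many it applied.
  int drain(int maxEvents) {
    int applied = 0;
    while (indexedOffset < changeLog.size() && applied < maxEvents) {
      Change c = changeLog.get(indexedOffset++);
      for (String term : c.subject().toLowerCase(Locale.ROOT).split("\\W+")) {
        if (term.isEmpty()) continue;
        index.computeIfAbsent(c.userId() + "|" + term, k -> new HashSet<>()).add(c.messageId());
      }
      applied++;
    }
    return applied;
  }

  // Search is scoped to one user, so it never crosses a mailbox partition.
  Set<String> search(String userId, String term) {
    return index.getOrDefault(userId + "|" + term.toLowerCase(Locale.ROOT), Set.of());
  }

  // Number of committed writes the index has not seen yet.
  int lag() { return changeLog.size() - indexedOffset; }
}
```

Between `writeMetadata` and the next `drain`, a message exists in the mailbox but does not appear in search results; `lag()` is the kind of number one would alert on to keep that window within a few seconds.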

Architecture Components (11)

Operations Walked Through (3)

Implementation

Message model
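import java.time.Instant;
import java.util.List;
import java.util.Set;

// Immutable metadata row for one message in one user's mailbox.
// Body and attachment bytes live in separate stores and are referenced by key.
public final class Message {
  public final String messageId;          // RFC 5322 Message-ID
  public final String userId;             // mailbox owner (partition key)
  public final String from;
  public final List<String> to;
  public final List<String> cc;
  public final String subject;
  public final Instant receivedAt;
  public final long sizeBytes;
  public final String bodyRef;            // KV key -> Message Body store
  public final List<AttachmentRef> attachments;
  public final Set<String> labels;        // INBOX, STARRED, SPAM, TRASH
  public final int spamScore;             // 0..100

  public Message(String messageId, String userId, String from, List<String> to,
                 List<String> cc, String subject, Instant receivedAt, long sizeBytes,
                 String bodyRef, List<AttachmentRef> attachments,
                 Set<String> labels, int spamScore) {
    this.messageId = messageId;
    this.userId = userId;
    this.from = from;
    this.to = List.copyOf(to);            // defensive, unmodifiable copies
    this.cc = List.copyOf(cc);
    this.subject = subject;
    this.receivedAt = receivedAt;
    this.sizeBytes = sizeBytes;
    this.bodyRef = bodyRef;
    this.attachments = List.copyOf(attachments);
    this.labels = Set.copyOf(labels);
    this.spamScore = spamScore;
  }

  // Copy helpers called by the delivery and spam-scoring snippets.
  public Message withBodyRef(String newBodyRef) {
    return new Message(messageId, userId, from, to, cc, subject, receivedAt, sizeBytes,
                       newBodyRef, attachments, labels, spamScore);
  }

  public Message withLabelsAndSpamScore(Set<String> newLabels, int newSpamScore) {
    return new Message(messageId, userId, from, to, cc, subject, receivedAt, sizeBytes,
                       bodyRef, attachments, newLabels, newSpamScore);
  }

  // Not shown: parse(...), body(), snippet(), senderIp(), and extractLinks(),
  // which the ingestion snippets call on the raw message.

  public record AttachmentRef(String sha256, String filename, String mime, long bytes) {}
}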
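
The `sha256` field in `AttachmentRef` is what lets the blob store keep a single copy of an attachment that appears in many messages. As a minimal sketch of that idea (assuming Java 17 or later for `HexFormat`, and with the in-memory `ContentAddressedStore` class being hypothetical rather than part of the design above), the key can be derived from the bytes themselves:

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.HexFormat;
import java.util.Map;

// Toy content-addressed store: the key is the SHA-256 of the stored bytes.
final class ContentAddressedStore {
  private final Map<String, byte[]> blobs = new HashMap<>();

  // Hex-encoded SHA-256 digest of the data; this doubles as the storage key.
  static String sha256Hex(byte[] data) {
    try {
      MessageDigest md = MessageDigest.getInstance("SHA-256");
      return HexFormat.of().formatHex(md.digest(data));
    } catch (NoSuchAlgorithmException e) {
      // SHA-256 must be supported by every Java platform implementation.
      throw new IllegalStateException(e);
    }
  }

  // Stores the bytes only if the key is new, and returns the key either way.
  String put(byte[] data) {
    String key = sha256Hex(data);
    blobs.putIfAbsent(key, data.clone());
    return key;
  }

  // Number of distinct blobs actually held.
  int storedBlobCount() { return blobs.size(); }
}
```

Uploading the same bytes twice returns the same key and leaves `storedBlobCount()` at one. A production store would also need reference counting or garbage collection before a blob could be deleted safely.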
SMTP server stub (RCPT TO / DATA)
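import java.io.IOException;
import java.util.Locale;

public final class SmtpServer {
  private final MailBoxService mailbox;
  private final SpamInterceptor spam;
  private final DmarcVerifier dmarc;

  public SmtpServer(MailBoxService mailbox, SpamInterceptor spam, DmarcVerifier dmarc) {
    this.mailbox = mailbox; this.spam = spam; this.dmarc = dmarc;
  }

  // Invoked by the server loop on each new connection.
  public void handle(SmtpConnection c) throws IOException {
    c.write("220 mx.hldsim.org ESMTP ready");
    SmtpSession s = new SmtpSession();
    int recipients = 0;                               // accepted RCPT TO count in this transaction
    String line;
    while ((line = c.readLine()) != null) {
      // SMTP command verbs are case-insensitive, so match on an upper-cased copy.
      String cmd = line.trim().toUpperCase(Locale.ROOT);
      if (cmd.startsWith("HELO") || cmd.startsWith("EHLO")) {
        c.write("250 mx.hldsim.org");
      } else if (cmd.startsWith("MAIL FROM:")) {
        s = new SmtpSession();                        // MAIL FROM starts a fresh transaction
        recipients = 0;
        s.setFrom(parseAddress(line.trim().substring(10)));
        c.write("250 OK");
      } else if (cmd.startsWith("RCPT TO:")) {
        String rcpt = parseAddress(line.trim().substring(8));
        if (!mailbox.userExists(rcpt)) { c.write("550 no such user"); continue; }
        s.addRecipient(rcpt);
        recipients++;
        c.write("250 OK");
      } else if (cmd.equals("DATA")) {
        if (recipients == 0) { c.write("503 need RCPT TO before DATA"); continue; }
        c.write("354 end data with <CR><LF>.<CR><LF>");
        byte[] body = c.readUntilDotLine();
        if (dmarc.verify(s.from(), body)) {
          Message m = Message.parse(s, body);
          Message scored = spam.score(m);             // labels spam, never rejects
          mailbox.deliver(scored);
          c.write("250 2.0.0 queued as " + m.messageId);
        } else {
          c.write("550 DMARC fail");
        }
        s = new SmtpSession();                        // the transaction is over either way
        recipients = 0;
      } else if (cmd.equals("QUIT")) {
        c.write("221 bye"); return;
      } else {
        c.write("500 command not recognized");        // always reply, or the client will hang
      }
    }
  }

  // Extracts the mailbox from "<user@host>", ignoring any ESMTP parameters after it.
  private static String parseAddress(String in) {
    String t = in.trim();
    int lt = t.indexOf('<'), gt = t.indexOf('>');
    if (lt >= 0 && gt > lt) return t.substring(lt + 1, gt);
    return t.split("\\s+")[0];
  }
}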
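
The stub above delegates to `readUntilDotLine()` without showing it. One way such a reader could work is sketched below, following the transparency rule in RFC 5321 section 4.5.2: a line consisting of a single "." ends the message, and a leading "." on any other line is a stuffed dot that the receiver removes. The class `SmtpDataReader` and its `String` return type are assumptions made for illustration (the stub returns `byte[]`), and `BufferedReader.readLine` accepts bare LF or CR as a line ending, which a hardened server would want to reject.

```java
import java.io.BufferedReader;
import java.io.IOException;

// Hypothetical reader for the body that follows a 354 reply.
final class SmtpDataReader {
  // Reads lines up to the terminating "." line and undoes dot-stuffing.
  // Returns the message text with CRLF line endings; throws if the stream ends early.
  static String readUntilDotLine(BufferedReader in) throws IOException {
    StringBuilder out = new StringBuilder();
    String line;
    while ((line = in.readLine()) != null) {
      if (line.equals(".")) return out.toString();          // end-of-data marker
      if (line.startsWith(".")) line = line.substring(1);   // drop the stuffed dot
      out.append(line).append("\r\n");
    }
    throw new IOException("connection closed before end-of-data marker");
  }
}
```

The reader stops right after the marker line, so the next command on the connection (for example QUIT) is still available to the command loop.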
MailBox service partitioned by user
import java.util.List;

public final class MailBoxService {
  private final MetadataRepo metadata;   // sharded by userId
  private final BodyStore bodies;        // KV keyed by message-id
  private final BlobStore attachments;   // content-addressed blobs (SHA-256 key)
  private final Indexer indexer;         // CDC sink into search index

  public MailBoxService(MetadataRepo m, BodyStore b, BlobStore a, Indexer i) {
    this.metadata = m; this.bodies = b; this.attachments = a; this.indexer = i;
  }

  public void deliver(Message m) {
    // Write the body and attachments first so a visible metadata row never points at missing data.
    String bodyRef = bodies.put(m.messageId, m.body());      // dedup by message-id
    for (var att : m.attachments) attachments.put(att.sha256(), att);  // same hash is stored once
    Message stored = m.withBodyRef(bodyRef);
    metadata.insert(partitionKey(stored.userId), stored);     // per-user shard
    indexer.enqueue(stored);                                  // async -> search
  }

  // Metadata-only listing served from the owner's shard; bodies are not touched.
  public List<Message> listInbox(String userId, int limit, String cursor) {
    return metadata.listByLabel(partitionKey(userId), userId, "INBOX", limit, cursor);
  }

  public boolean userExists(String email) { return metadata.userExists(email); }

  // floorMod keeps the result non-negative even when hashCode() is negative.
  private int partitionKey(String userId) {
    return Math.floorMod(userId.hashCode(), metadata.shardCount());
  }
}
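
`listInbox` takes an opaque `cursor`, but the snippet does not say what is in it. A common choice is a keyset cursor that encodes the sort position of the last row returned, so the next page resumes from that position without offset scans. The sketch below is one possible encoding, not the repository's actual format: `InboxCursorSketch`, `Row`, and `Page` are hypothetical names, the list is sorted in memory where a real shard would use an index, and timestamps are assumed to be stored at millisecond precision.

```java
import java.nio.charset.StandardCharsets;
import java.time.Instant;
import java.util.ArrayList;
import java.util.Base64;
import java.util.Comparator;
import java.util.List;

final class InboxCursorSketch {
  record Row(String messageId, Instant receivedAt) {}
  record Page(List<Row> rows, String nextCursor) {}

  // Newest first; messageId breaks ties so the order is total.
  static final Comparator<Row> NEWEST_FIRST =
      Comparator.comparing(Row::receivedAt).thenComparing(Row::messageId).reversed();

  // Cursor = base64url("<epochMillis>|<messageId>") of the last row on the page.
  static String encode(Row last) {
    String raw = last.receivedAt().toEpochMilli() + "|" + last.messageId();
    return Base64.getUrlEncoder().withoutPadding()
        .encodeToString(raw.getBytes(StandardCharsets.UTF_8));
  }

  static Row decode(String cursor) {
    String raw = new String(Base64.getUrlDecoder().decode(cursor), StandardCharsets.UTF_8);
    int sep = raw.indexOf('|');
    long millis = Long.parseLong(raw.substring(0, sep));
    return new Row(raw.substring(sep + 1), Instant.ofEpochMilli(millis));
  }

  // Returns up to limit rows that sort strictly after the cursor position.
  static Page list(List<Row> mailbox, int limit, String cursor) {
    if (limit < 1) throw new IllegalArgumentException("limit must be positive");
    List<Row> sorted = new ArrayList<>(mailbox);
    sorted.sort(NEWEST_FIRST);
    Row after = cursor == null ? null : decode(cursor);
    List<Row> out = new ArrayList<>();
    for (Row r : sorted) {
      if (after != null && NEWEST_FIRST.compare(r, after) <= 0) continue; // at or before cursor
      out.add(r);
      if (out.size() > limit) break;            // one extra row means another page exists
    }
    String next = null;
    if (out.size() > limit) {
      out.remove(out.size() - 1);               // drop the look-ahead row
      next = encode(out.get(out.size() - 1));
    }
    return new Page(out, next);
  }
}
```

Because the cursor carries a position rather than an offset, mail that arrives while the user is scrolling does not shift or duplicate rows on later pages.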
Spam-score interceptor
import java.util.HashSet;
import java.util.Set;

public final class SpamInterceptor {
  private final BayesClassifier bayes;
  private final RbldnsClient rbl;
  private final UrlReputation urlRep;

  public SpamInterceptor(BayesClassifier b, RbldnsClient r, UrlReputation u) {
    this.bayes = b; this.rbl = r; this.urlRep = u;
  }

  // Scores the message and relabels it; it never rejects, so a false positive
  // ends up in the SPAM folder instead of being bounced.
  public Message score(Message m) {
    int s = 0;
    if (rbl.listed(m.senderIp())) s += 40;                 // sender IP is on a DNS blocklist
    if (urlRep.anyBad(m.extractLinks())) s += 30;          // at least one link has bad reputation
    s += bayes.score(m.subject + " \n " + m.snippet());    // content-based classifier score
    int clamped = Math.min(100, Math.max(0, s));           // keep the score in 0..100
    Set<String> labels = new HashSet<>(m.labels);
    if (clamped >= 80) { labels.remove("INBOX"); labels.add("SPAM"); }
    return m.withLabelsAndSpamScore(labels, clamped);
  }
}
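
To see how the weights and the threshold interact, the scoring rule can be isolated from its collaborators. The sketch below restates the same arithmetic as pure functions so it can be exercised without a classifier or DNS client; `SpamScoreSketch` and its method names are hypothetical, and the classifier score is simply passed in as a number.

```java
import java.util.HashSet;
import java.util.Set;

final class SpamScoreSketch {
  static final int SPAM_THRESHOLD = 80;

  // Same weights as the interceptor: +40 for a blocklisted IP, +30 for a bad URL,
  // plus the classifier's own score, clamped to 0..100.
  static int combine(boolean ipListed, boolean hasBadUrl, int bayesScore) {
    int s = (ipListed ? 40 : 0) + (hasBadUrl ? 30 : 0) + bayesScore;
    return Math.min(100, Math.max(0, s));
  }

  // Moves the message from INBOX to SPAM once the score reaches the threshold.
  static Set<String> relabel(Set<String> labels, int score) {
    Set<String> out = new HashSet<>(labels);
    if (score >= SPAM_THRESHOLD) { out.remove("INBOX"); out.add("SPAM"); }
    return out;
  }
}
```

With these weights the two reputation signals together contribute 70, so they cannot move a message to SPAM on their own: a blocklisted IP plus a bad link with a classifier score of 5 gives 75 and stays in INBOX, while a classifier score of 10 gives 80 and is relabeled.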

Key design decisions & trade-offs

Interview follow-ups

Related