Distributed Email Service (Gmail-style) System Design Interview Question
Problem: Design a distributed email service like Gmail supporting SMTP send/receive, spam filtering, full-text search, and attachments.
Overview
A Gmail-scale email service has two personalities. On the write side it is an SMTP system that accepts RFC 5321 envelopes, runs spam and malware filters, and persists messages durably. On the read side it is a metadata-heavy database with a search index: users scroll headers roughly one hundred times more often than they open bodies. The classic mistake is treating an email as one blob and shoving it into a single store. The right split is three-way: compact metadata rows in a sharded SQL or wide-column database for inbox views, opaque message bodies in a content-addressed KV store for lazy loading, and attachments in a blob store with content hashing so the same PDF shared in a thread is stored once. The interesting tradeoffs are how to keep the search index within a few seconds of the source of truth, whether to filter spam synchronously at SMTP time or asynchronously, and how to partition mailboxes so hot users do not melt a single shard.
Summary
A hybrid write + read system where email is ingested over SMTP (inbound MX + outbound MTA), filtered for spam/malware, split into metadata (indexed, searchable) and body/attachment blobs (cheap storage), and surfaced to clients over HTTP/IMAP. The dominant design choice is separating metadata from message bodies: metadata lives in a sharded SQL/NoSQL store with a secondary search index, while bodies land in a KV message store and attachments in a blob store (content-addressed to dedupe). The main tradeoff is eventual consistency between the metadata DB and the search index — users might send a mail that isn't searchable for a few seconds. Sized for ~1B users and ~200B messages/day (Gmail scale).
Requirements
Functional
- Accept inbound SMTP on port 25 and apply spam, virus, and DMARC checks
- Send outbound mail to peer MTAs with retry on 4xx temporary failures
- Store messages split by metadata, body, and attachments; lazy-load bodies
- Expose HTTP and IMAP APIs for list, fetch, search, label, and delete
- Full-text search across subject, body, and attachment text within ~5 s freshness
- Push new-mail notifications to connected clients in real time
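The outbound retry requirement can be sketched as a small policy class. SMTP reply classes (2yz success, 4yz transient failure, 5yz permanent failure) come from RFC 5321; the class name `OutboundRetryPolicy`, the backoff schedule, and the attempt cap are assumptions chosen for illustration, not values from this design.

```java
import java.time.Duration;

// Hypothetical helper: classifies the final SMTP reply of a delivery attempt
// and computes how long to wait before the next attempt.
final class OutboundRetryPolicy {
    enum Outcome { DELIVERED, RETRY_LATER, BOUNCE }

    // Assumed tuning values, for illustration only.
    static final Duration BASE_DELAY = Duration.ofMinutes(1);
    static final Duration MAX_DELAY = Duration.ofHours(4);
    static final int MAX_ATTEMPTS = 10;

    // attemptsSoFar counts delivery attempts already made, including this one.
    static Outcome classify(int replyCode, int attemptsSoFar) {
        if (replyCode >= 200 && replyCode < 300) return Outcome.DELIVERED;
        if (replyCode >= 400 && replyCode < 500 && attemptsSoFar < MAX_ATTEMPTS) {
            return Outcome.RETRY_LATER;
        }
        return Outcome.BOUNCE; // 5yz, or a 4yz that has exhausted its attempts
    }

    // Exponential backoff: 1 min, 2 min, 4 min, ... capped at MAX_DELAY.
    static Duration nextDelay(int attemptsSoFar) {
        long factor = 1L << Math.min(Math.max(attemptsSoFar, 0), 20); // bound the shift
        Duration delay = BASE_DELAY.multipliedBy(factor);
        return delay.compareTo(MAX_DELAY) > 0 ? MAX_DELAY : delay;
    }
}
```

A relay worker would call `classify` on the peer's reply and, on `RETRY_LATER`, re-enqueue the message with a not-before time of now plus `nextDelay`. Production relays usually add jitter so retries to a recovering peer do not arrive in a burst.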
Non-functional
- Support 1B DAU and 200B messages/day with ~115K write QPS and ~460K read QPS
- Inbox-list p95 latency under 50 ms
- Durability: no accepted message is ever lost; each message is stored on at least three replicas
- Attachment upload resumable up to 25 MB per file
- Search index staleness under 5 s at p99
Capacity Assumptions
- 1B DAU, 10 sends + 40 receives per user per day
- Average email 50 KB text, 10% carry attachments averaging 500 KB
- Spam rate ~50% at the MX edge (most dropped pre-filter, ~10% reach inbox filter)
- Retention: forever (free-tier 15 GB quota)
- Search-index freshness target: < 5 seconds p99
Back-of-Envelope Estimates
- Writes (sends): 1B * 10 / 86400 ≈ 115K QPS (peak ~350K)
- Reads (inbox fetch + search), approximated as one read per received message: 1B * 40 / 86400 ≈ 460K QPS (peak ~1.4M)
- Metadata storage: 200B msgs/day * 1KB meta * 365d ≈ 73 PB/year
- Body storage: 200B msgs/day * 50KB / 10 (dedupe + compress) ≈ 1 PB/day
- Attachment storage: 200B * 0.1 * 500KB ≈ 10 PB/day raw, ~3 PB after content-addressed dedupe
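The arithmetic above can be checked with a few lines of Java. This is a sketch that only replays the stated inputs; the class name `CapacityEstimate` is hypothetical, and decimal units (1 KB = 1,000 bytes, 1 PB = 10^15 bytes) are an assumption.

```java
// Replays the back-of-envelope estimates from the stated assumptions.
final class CapacityEstimate {
    static final double USERS = 1e9;
    static final double SENDS_PER_USER_PER_DAY = 10;
    static final double RECEIVES_PER_USER_PER_DAY = 40;
    static final double MESSAGES_PER_DAY = 200e9;
    static final double SECONDS_PER_DAY = 86_400;
    static final double KB = 1e3;   // assumed decimal kilobyte
    static final double PB = 1e15;  // assumed decimal petabyte

    // 1e10 / 86,400, about 115.7K per second.
    static double writeQps() { return USERS * SENDS_PER_USER_PER_DAY / SECONDS_PER_DAY; }

    // 4e10 / 86,400, about 463K per second.
    static double readQps() { return USERS * RECEIVES_PER_USER_PER_DAY / SECONDS_PER_DAY; }

    // 200B messages * 1 KB * 365 days, in PB per year.
    static double metadataPbPerYear() { return MESSAGES_PER_DAY * 1 * KB * 365 / PB; }

    // 200B messages * 50 KB, divided by the assumed 10x dedupe + compression factor.
    static double bodyPbPerDay() { return MESSAGES_PER_DAY * 50 * KB / 10 / PB; }

    // 10% of messages carry a 500 KB attachment, before dedupe.
    static double attachmentRawPbPerDay() { return MESSAGES_PER_DAY * 0.1 * 500 * KB / PB; }
}
```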
High-level architecture
The write path starts at the Inbound MX Receiver, an SMTP server that terminates the RCPT TO / DATA dialog, performs SPF, DKIM, and DMARC checks, and hands the accepted message to a Spam Filter. The filter is pipelined with the rest of ingestion as an interceptor rather than a synchronous gate, because spam classifiers occasionally mispredict and we want to accept-and-hide rather than bounce-and-lose. Accepted messages are split: metadata (sender, recipient, subject, flags, snippet) is written to the Inbox Metadata DB sharded by user partition, the body goes to the Message Body KV keyed by message-id, and attachments are uploaded to the Attachment Blob Store keyed by SHA-256. A CDC pipeline tails the metadata DB and pushes updates to the Search Index and to the Push Notification Service, which delivers to APNs, FCM, or a WebSocket fanout. Outbound mail from first-party clients hits the Mail Submission API, is filtered for outbound spam and malware, queued in the Outbound SMTP Relay, and retried against peer MTAs. The read path is dominated by the inbox-list call, which serves metadata-only responses (headers plus a 150-character snippet) from a read replica of the metadata shard; bodies are fetched only when the user opens a message. Partitioning the mailbox by user_id is the single most important decision: no cross-partition query is on the hot path, and hot users get isolated blast radius.
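Keying attachments by SHA-256 is what makes deduplication fall out of the write path. A minimal sketch follows, using an in-memory map as a stand-in for the Attachment Blob Store; the class name `ContentAddressedStore` and its methods are hypothetical, and a real store would be remote and replicated.

```java
import java.math.BigInteger;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// In-memory stand-in for a content-addressed blob store.
final class ContentAddressedStore {
    private final Map<String, byte[]> blobs = new ConcurrentHashMap<>();

    // Stores the bytes under their SHA-256 hex digest and returns that key.
    // Identical content maps to the same key, so it is stored only once.
    String put(byte[] content) {
        String key = sha256Hex(content);
        blobs.putIfAbsent(key, content.clone());
        return key;
    }

    // Returns a copy of the stored bytes, or null if the key is unknown.
    byte[] get(String key) {
        byte[] stored = blobs.get(key);
        return stored == null ? null : stored.clone();
    }

    int blobCount() { return blobs.size(); }

    static String sha256Hex(byte[] content) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(content);
            // Left-pad to 64 hex characters so leading zero bytes are preserved.
            return String.format("%064x", new BigInteger(1, digest));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 unavailable", e);
        }
    }
}
```

Because the key is derived from the bytes, two recipients of the same PDF write the same key and the second `put` is a no-op. The metadata row keeps only the key, as the `AttachmentRef.sha256` field in the Implementation section does.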
Architecture Components (11)
- Client (Web / Mobile / IMAP) (client) — Gmail web app, mobile app, or IMAP client that composes, lists, and searches mail.
- Load Balancer (lb) — L7 HTTPS load balancer for the mail submission API tier.
- Mail Submission API (api) — Stateless service handling send, list, read, search, and attachment upload coordination.
- Spam / Anti-abuse Filter (worker) — Pipeline that scores every inbound and outbound message for spam, phish, malware.
- Outbound SMTP Relay (MTA) (worker) — Opens SMTP connections to remote mail servers and delivers outbound messages.
- Inbound MX Receiver (api) — SMTP server advertised in public MX records that accepts mail from the internet.
- Inbox Metadata DB (nosql) — Per-user sharded store of message metadata: headers, labels, flags, references.
- Message Body KV (kv) — Content-addressed KV store holding the full MIME body of each message.
- Attachment Blob Store (blob) — Content-addressed S3-style store for attachments, deduplicated across users.
- Search Index (search) — Elasticsearch cluster indexing headers + body text for `search mail` queries.
- Push Notification Service (worker) — Fan-out service that wakes mobile/web clients when a new message lands.
Operations Walked Through (3)
- send — User clicks Send. API validates, attachments are stored, spam filter scans outbound, MTA delivers via SMTP to the remote domain.
- receive — Remote server delivers to our MX. Filter scores, metadata+body+attachments persist, recipient's inbox index updates, push notification fires.
- search — User types a query; API hits the search index filtered to their user id, hydrates hits with metadata, returns a ranked list.
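The search operation can be illustrated with a toy in-memory inverted index. This is a sketch only: the design uses a search cluster, and the class `MailSearchSketch`, its term-count ranking, and the subject map standing in for the metadata store are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Toy per-user inverted index: filter by user, rank hits, hydrate from metadata.
final class MailSearchSketch {
    // userId -> (term -> message ids); scoping by user keeps queries inside one mailbox.
    private final Map<String, Map<String, Set<String>>> postingsByUser = new HashMap<>();
    // message id -> subject, standing in for the metadata store.
    private final Map<String, String> subjectById = new HashMap<>();

    void index(String userId, String messageId, String subject, String body) {
        subjectById.put(messageId, subject);
        Map<String, Set<String>> postings =
                postingsByUser.computeIfAbsent(userId, u -> new HashMap<>());
        for (String term : tokenize(subject + " " + body)) {
            postings.computeIfAbsent(term, t -> new HashSet<>()).add(messageId);
        }
    }

    // Returns subjects of matching messages, most matched query terms first.
    List<String> search(String userId, String query) {
        Map<String, Set<String>> postings = postingsByUser.getOrDefault(userId, Map.of());
        Map<String, Integer> hits = new HashMap<>();
        for (String term : new HashSet<>(tokenize(query))) {
            for (String id : postings.getOrDefault(term, Set.of())) {
                hits.merge(id, 1, Integer::sum);
            }
        }
        List<String> ids = new ArrayList<>(hits.keySet());
        ids.sort((a, b) -> {
            int byScore = Integer.compare(hits.get(b), hits.get(a));
            return byScore != 0 ? byScore : a.compareTo(b); // deterministic tie-break
        });
        List<String> results = new ArrayList<>();
        for (String id : ids) results.add(subjectById.get(id)); // hydrate hits
        return results;
    }

    private static List<String> tokenize(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) terms.add(t);
        }
        return terms;
    }
}
```

The user filter is applied before any term lookup, which mirrors routing a query to the user's own index partition so that one mailbox can never surface another's messages.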
Implementation
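import java.time.Instant;
import java.util.List;
import java.util.Set;

// Immutable metadata row for one message in one user's mailbox.
// Ingest-only helpers referenced elsewhere (parse, body, snippet, senderIp,
// extractLinks) are not shown here.
public final class Message {
    public final String messageId;              // RFC 5322 Message-ID
    public final String userId;                 // mailbox owner (partition key)
    public final String from;
    public final List<String> to;
    public final List<String> cc;
    public final String subject;
    public final Instant receivedAt;
    public final long sizeBytes;
    public final String bodyRef;                // KV key -> Message Body store
    public final List<AttachmentRef> attachments;
    public final Set<String> labels;            // INBOX, STARRED, SPAM, TRASH
    public final int spamScore;                 // 0..100

    public Message(String messageId, String userId, String from, List<String> to,
                   List<String> cc, String subject, Instant receivedAt, long sizeBytes,
                   String bodyRef, List<AttachmentRef> attachments,
                   Set<String> labels, int spamScore) {
        this.messageId = messageId;
        this.userId = userId;
        this.from = from;
        this.to = List.copyOf(to);              // defensive, unmodifiable copies
        this.cc = List.copyOf(cc);
        this.subject = subject;
        this.receivedAt = receivedAt;
        this.sizeBytes = sizeBytes;
        this.bodyRef = bodyRef;
        this.attachments = List.copyOf(attachments);
        this.labels = Set.copyOf(labels);
        this.spamScore = spamScore;
    }

    // Copy with the body reference filled in once the body has been stored.
    public Message withBodyRef(String newBodyRef) {
        return new Message(messageId, userId, from, to, cc, subject, receivedAt,
                sizeBytes, newBodyRef, attachments, labels, spamScore);
    }

    // Copy with the spam verdict applied.
    public Message withLabelsAndSpamScore(Set<String> newLabels, int newSpamScore) {
        return new Message(messageId, userId, from, to, cc, subject, receivedAt,
                sizeBytes, bodyRef, attachments, newLabels, newSpamScore);
    }

    public record AttachmentRef(String sha256, String filename, String mime, long bytes) {}
}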
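import java.io.IOException;
import java.util.Locale;

public final class SmtpServer {
    private final MailBoxService mailbox;
    private final SpamInterceptor spam;
    private final DmarcVerifier dmarc;

    public SmtpServer(MailBoxService mailbox, SpamInterceptor spam, DmarcVerifier dmarc) {
        this.mailbox = mailbox;
        this.spam = spam;
        this.dmarc = dmarc;
    }

    // Invoked by the server loop on each new connection.
    public void handle(SmtpConnection c) throws IOException {
        c.write("220 mx.hldsim.org ESMTP ready");
        SmtpSession s = new SmtpSession();
        int recipients = 0;
        String line;
        while ((line = c.readLine()) != null) {
            String cmd = line.toUpperCase(Locale.ROOT);   // command verbs are case-insensitive
            if (cmd.startsWith("EHLO") || cmd.startsWith("HELO")) {
                c.write("250 mx.hldsim.org");
            } else if (cmd.startsWith("MAIL FROM:")) {
                s.setFrom(parseAddress(line.substring(10)));
                c.write("250 OK");
            } else if (cmd.startsWith("RCPT TO:")) {
                String rcpt = parseAddress(line.substring(8));
                if (!mailbox.userExists(rcpt)) { c.write("550 no such user"); continue; }
                s.addRecipient(rcpt);
                recipients++;
                c.write("250 OK");
            } else if (cmd.equals("DATA")) {
                if (recipients == 0) { c.write("503 need RCPT TO first"); continue; }
                c.write("354 end data with <CR><LF>.<CR><LF>");
                byte[] body = c.readUntilDotLine();
                if (!dmarc.verify(s.from(), body)) {
                    c.write("550 DMARC fail");
                } else {
                    Message m = Message.parse(s, body);
                    Message scored = spam.score(m);       // tag as spam, do not reject
                    mailbox.deliver(scored);              // persist before acknowledging
                    c.write("250 2.0.0 queued as " + m.messageId);
                }
                s = new SmtpSession();                    // each mail transaction starts clean
                recipients = 0;
            } else if (cmd.equals("QUIT")) {
                c.write("221 bye");
                return;
            } else {
                c.write("500 unrecognized command");      // never leave the client waiting
            }
        }
    }

    // Extracts the path between angle brackets, ignoring ESMTP parameters such as SIZE=.
    private static String parseAddress(String in) {
        int lt = in.indexOf('<');
        int gt = in.indexOf('>');
        String path = (lt >= 0 && gt > lt) ? in.substring(lt + 1, gt) : in;
        return path.trim();
    }
}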
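The `readUntilDotLine` call in the server loop hides a protocol detail worth spelling out: the DATA payload ends at a line containing only ".", and the sender prefixes an extra "." to any content line that starts with one (dot-stuffing, RFC 5321 section 4.5.2). A minimal sketch of the receiving side follows, assuming a `BufferedReader` over the connection; the class name `SmtpDataReader` is hypothetical, and a production reader would also enforce a maximum message size.

```java
import java.io.BufferedReader;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

final class SmtpDataReader {
    // Reads the DATA payload up to the terminating "." line and undoes dot-stuffing.
    // Assumes the reader decodes bytes as ISO-8859-1 so every byte round-trips.
    static byte[] readUntilDotLine(BufferedReader in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        String line;
        while ((line = in.readLine()) != null) {
            if (line.equals(".")) {
                return out.toByteArray();                 // end-of-data marker
            }
            if (line.startsWith(".")) {
                line = line.substring(1);                 // strip the stuffed dot
            }
            out.write(line.getBytes(StandardCharsets.ISO_8859_1));
            out.write('\r');                              // re-emit line endings as CRLF
            out.write('\n');
        }
        throw new EOFException("connection closed before end-of-data marker");
    }
}
```

Note that `BufferedReader.readLine` also accepts bare LF or CR as a line terminator, which is more lenient than the CRLF the protocol requires; a stricter reader would scan for the exact byte sequence.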
import java.util.List;

public final class MailBoxService {
    private final MetadataRepo metadata;    // sharded by userId
    private final BodyStore bodies;         // message body KV
    private final BlobStore attachments;    // content-addressed blobs
    private final Indexer indexer;          // feeds the search index

    public MailBoxService(MetadataRepo m, BodyStore b, BlobStore a, Indexer i) {
        this.metadata = m;
        this.bodies = b;
        this.attachments = a;
        this.indexer = i;
    }

    // Body and attachments are written before the metadata row, so a row that
    // is visible in the inbox never points at data that does not exist yet.
    public void deliver(Message m) {
        String bodyRef = bodies.put(m.messageId, m.body());        // returns the KV key
        for (var att : m.attachments) {
            attachments.put(att.sha256(), att);                    // idempotent by hash
        }
        Message stored = m.withBodyRef(bodyRef);
        metadata.insert(partitionKey(stored.userId), stored);      // per-user shard
        indexer.enqueue(stored);                                   // async -> search
    }

    public List<Message> listInbox(String userId, int limit, String cursor) {
        return metadata.listByLabel(partitionKey(userId), userId, "INBOX", limit, cursor);
    }

    public boolean userExists(String email) {
        return metadata.userExists(email);
    }

    // floorMod keeps the result non-negative even when hashCode() is negative.
    private int partitionKey(String userId) {
        return Math.floorMod(userId.hashCode(), metadata.shardCount());
    }
}
import java.util.HashSet;
import java.util.Set;

public final class SpamInterceptor {
    private final BayesClassifier bayes;
    private final RbldnsClient rbl;
    private final UrlReputation urlRep;

    public SpamInterceptor(BayesClassifier b, RbldnsClient r, UrlReputation u) {
        this.bayes = b;
        this.rbl = r;
        this.urlRep = u;
    }

    // Combines three signals into a 0..100 score and relabels the message
    // as SPAM at 80 or above. The message is tagged, never rejected.
    public Message score(Message m) {
        int s = 0;
        if (rbl.listed(m.senderIp())) s += 40;            // sender IP on a blocklist
        if (urlRep.anyBad(m.extractLinks())) s += 30;     // links to known-bad URLs
        s += bayes.score(m.subject + " \n " + m.snippet());
        int clamped = Math.min(100, Math.max(0, s));
        Set<String> labels = new HashSet<>(m.labels);     // m.labels is unmodifiable
        if (clamped >= 80) {
            labels.remove("INBOX");
            labels.add("SPAM");
        }
        return m.withLabelsAndSpamScore(labels, clamped);
    }
}
Key design decisions & trade-offs
- Synchronous spam rejection at SMTP vs post-accept filtering — Chosen: Accept-and-tag with a spam score, move to SPAM label if over threshold. Rejecting at SMTP time sends a permanent failure that senders see as a bounce. False positives are catastrophic for legitimate mail; accepting and hiding is reversible and gives users a Spam folder to audit.
- Metadata and body in the same row vs split storage — Chosen: Split: metadata in a sharded SQL or wide-column store, body in content-addressed KV, attachments in blob store. Users scroll metadata ~100x more than they open bodies. Storing a 50 KB body alongside ~1 KB of metadata in the same row would multiply inbox-list read bandwidth by roughly 50, almost all of it for bodies nobody opens. Splitting keeps inbox-list p95 under 50 ms and lets attachments dedupe across recipients.
- Search index consistency with metadata — Chosen: Async CDC pipeline with ~5 s target staleness. A strongly consistent inline index would couple write latency to the slowest index shard and is not worth it for a feature where 5 s staleness is imperceptible. CDC also lets us rebuild the index from the source of truth on schema changes.
- Mailbox partition key — Chosen: Shard by user_id (hash) with per-user locality. All inbox operations are per-user, so a hash on user_id gives local reads and writes, no cross-shard queries on the hot path, and even distribution. Hot celebrity users can be isolated onto dedicated shards.
- Push delivery: persistent WebSocket vs mobile push gateway — Chosen: WebSocket for web, APNs/FCM for mobile. Mobile OSes suspend backgrounded apps and tear down their sockets, often within seconds to minutes, so a native push gateway is the only reliable way to reach a backgrounded app. Web apps stay with WebSockets because the tab is open, and usually foregrounded, when real-time delivery matters.
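The partition-key decision, including the escape hatch for hot mailboxes, can be sketched as a router that consults an override table before hashing. The class `ShardRouter`, the `pin` method, and the convention that dedicated shard ids sit above the hashed range are assumptions for illustration; the design above only states that hot users can be isolated onto dedicated shards.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical router: hash on user id by default, explicit override for hot users.
final class ShardRouter {
    private final int hashedShardCount;
    private final Map<String, Integer> dedicated = new ConcurrentHashMap<>();

    ShardRouter(int hashedShardCount) {
        if (hashedShardCount <= 0) {
            throw new IllegalArgumentException("need at least one shard");
        }
        this.hashedShardCount = hashedShardCount;
    }

    // Pins a hot mailbox to a dedicated shard whose id lies outside the hashed range.
    // This only changes routing; moving the existing rows is a separate migration.
    void pin(String userId, int dedicatedShardId) {
        if (dedicatedShardId < hashedShardCount) {
            throw new IllegalArgumentException("dedicated shard ids start at " + hashedShardCount);
        }
        dedicated.put(userId, dedicatedShardId);
    }

    int shardFor(String userId) {
        Integer pinned = dedicated.get(userId);
        if (pinned != null) return pinned;
        // floorMod keeps the result in [0, hashedShardCount) for negative hash codes.
        return Math.floorMod(userId.hashCode(), hashedShardCount);
    }
}
```

Plain modulo hashing remaps most users whenever the shard count changes, so a real deployment would more likely hash onto a fixed number of virtual partitions and map those to physical shards.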
Interview follow-ups
- How do you handle a celebrity mailbox that gets 10k messages an hour without melting its shard?
- How do you implement undo-send with a 30-second delay while still meeting the send SLA?
- How would you design E2E encryption and preserve server-side search?
- How do you garbage-collect orphaned attachments in the content-addressed blob store?
- How would you migrate a user's mailbox to a different shard with zero downtime?