WhatsApp-style Messaging System Design Interview Question

By Rahul Kumar · Senior Software Engineer · Updated · 12 components · 6 operations · Source: Alex Xu, System Design Interview Vol 1, Chapter 12

Problem: Design a 1-to-1 and small group chat system (up to 100 members) with online presence, multi-device sync, and push notifications.

Overview

WhatsApp-style messaging targets 50 billion messages per day with sub-second delivery, multi-device sync, and presence, all while respecting mobile battery and flaky networks. The problem is fundamentally different from request/response web apps: the server has to push to the client, not the other way around, so a persistent bidirectional channel is mandatory. The design is shaped by three constants the book fixes: 50M daily active users, groups capped at 100 members, and chat history retained forever. Those numbers drive roughly 10M concurrent WebSocket connections, 150 TB of message storage per year, and a fleet of about 200 chat servers each holding 50K live sockets. Getting the message path right matters because every design tradeoff — protocol choice, storage engine, fanout model, offline handling — cascades from the fact that users expect delivery to feel instant even when the recipient is offline, on airplane mode, or roaming through a carrier NAT.
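The fleet-size figure above follows from simple division, and the arithmetic can be sketched in a few lines. This is only a back-of-envelope helper using the numbers quoted in the overview (10M sockets, 50K sockets per server, 50M DAU, 150 TB per year); the class and method names are hypothetical.

```java
/** Back-of-envelope arithmetic using only the figures quoted in the overview. */
final class CapacityEstimate {
    private CapacityEstimate() {}

    /** Servers needed to hold the given number of sockets, rounded up. */
    static long chatServers(long concurrentSockets, long socketsPerServer) {
        return (concurrentSockets + socketsPerServer - 1) / socketsPerServer;
    }

    /** Concurrent sockets per daily active user. */
    static double socketsPerDailyUser(long concurrentSockets, long dailyActiveUsers) {
        return (double) concurrentSockets / dailyActiveUsers;
    }

    /** Average storage growth per day in gigabytes (decimal units: 1 TB = 1000 GB). */
    static double storageGbPerDay(double terabytesPerYear) {
        return terabytesPerYear * 1000.0 / 365.0;
    }

    public static void main(String[] args) {
        System.out.println("chat servers:     " + chatServers(10_000_000L, 50_000L));
        System.out.println("sockets per DAU:  " + socketsPerDailyUser(10_000_000L, 50_000_000L));
        System.out.printf("storage per day:  %.0f GB%n", storageGbPerDay(150));
    }
}
```

With the overview's inputs this gives 200 chat servers, 0.2 sockets per daily active user, and roughly 411 GB of new message storage per day.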

Summary

A real-time messaging system for 50M DAU, built around three tiers: (1) stateless API servers for signup, login, and profile; (2) stateful chat servers that hold long-lived WebSocket connections for real-time send and receive; and (3) a third-party push-notification integration for offline delivery. Service discovery (ZooKeeper) picks the best chat server for each client at login. Chat history lives in a key-value store (HBase or Cassandra) keyed by channel_id. The dominant design choice is WebSocket for bidirectional traffic rather than HTTP polling or long polling: of the three, it is the only one that lets the server push to the client cheaply over a single persistent connection. The main tradeoff is that connection affinity makes rolling deploys and failovers harder than they are for a stateless tier, which is why ZooKeeper coordinates chat-server health and capacity.
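One way to read "picks the best chat server" is "the healthy server with the fewest open sockets that still has headroom". The sketch below illustrates that policy with a plain in-memory map; it does not use the ZooKeeper client API, and the class name, method names, and the least-loaded rule itself are assumptions rather than anything the source specifies.

```java
import java.util.Comparator;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical least-loaded picker; a real deployment would read this state from service discovery. */
final class ChatServerPicker {
    private final int maxSocketsPerServer;
    private final Map<String, Integer> openSockets = new ConcurrentHashMap<>();

    ChatServerPicker(int maxSocketsPerServer) {
        this.maxSocketsPerServer = maxSocketsPerServer;
    }

    /** Called when a chat server reports its current socket count (for example on each health check). */
    void report(String serverId, int sockets) {
        openSockets.put(serverId, sockets);
    }

    /** Called when a chat server fails its health check or is drained for a deploy. */
    void markDown(String serverId) {
        openSockets.remove(serverId);
    }

    /** Returns the server with the fewest sockets that is below capacity; ties break on server id. */
    Optional<String> pick() {
        return openSockets.entrySet().stream()
                .filter(e -> e.getValue() < maxSocketsPerServer)
                .min(Comparator
                        .comparingInt((Map.Entry<String, Integer> e) -> e.getValue())
                        .thenComparing(e -> e.getKey()))
                .map(e -> e.getKey());
    }
}
```

Removing a server from the map on `markDown` is what makes a rolling deploy workable in this model: new logins stop landing on the draining server while its existing sockets redial elsewhere.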

Requirements

Functional

Non-functional

Capacity Assumptions

Back-of-Envelope Estimates

High-level architecture

Clients open a long-lived WebSocket (WSS) to a chat server chosen by ZooKeeper-backed service discovery at login; all short, stateless calls (signup, profile, friends, group CRUD) hit a separate stateless API tier fronted by the same load balancer. The split matters because the chat tier is sticky (once a socket lives on chat-server-42, breaking affinity means redialing and re-authenticating), whereas the API tier can round-robin freely and scale purely on CPU.

When user A sends a message, the chat server assigns a globally unique message_id from a Snowflake-style ID generator, appends the row to a Cassandra/HBase table keyed by channel_id (so a conversation's history is one contiguous partition), and then consults the message sync queue to find where each recipient device currently lives. For each online device, the fanout pushes over its WebSocket; for each offline device, the message is staged in that user's inbox and an APNs/FCM push is emitted through the notification service to wake the app.

Presence piggybacks on the heartbeat: clients send a keepalive every 5 s, and the presence service flips a user offline after a 30 s silence window so that a brief tunnel or NAT rebind does not spam friends with status churn. Consistent hashing on user_id keeps a user's connections, inbox queue, and message shards co-located, which is what makes fanout cheap even at groups of 100.
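The Snowflake-style ID generator mentioned above can be sketched as follows. The bit layout (41-bit millisecond timestamp, 10-bit node id, 12-bit per-millisecond sequence) follows the commonly described Twitter Snowflake scheme and is an assumption here, as are the custom epoch and the class name; the source does not fix any of them.

```java
/** Minimal Snowflake-style generator: timestamp | node id | sequence, packed into one long. */
final class SnowflakeIdGenerator {
    // Assumed custom epoch: 2020-01-01T00:00:00Z. Any fixed past instant works.
    private static final long EPOCH_MS = 1_577_836_800_000L;
    private static final int NODE_BITS = 10;
    private static final int SEQ_BITS = 12;
    private static final long MAX_NODE = (1L << NODE_BITS) - 1;
    private static final long MAX_SEQ = (1L << SEQ_BITS) - 1;

    private final long nodeId;
    private long lastMs = -1L;
    private long sequence = 0L;

    SnowflakeIdGenerator(long nodeId) {
        if (nodeId < 0 || nodeId > MAX_NODE) {
            throw new IllegalArgumentException("nodeId must be in [0, " + MAX_NODE + "]");
        }
        this.nodeId = nodeId;
    }

    /** Returns an id that is strictly larger than every id previously returned by this instance. */
    synchronized long nextId() {
        long now = System.currentTimeMillis();
        if (now < lastMs) {
            // Refuse to issue ids that could collide or sort before earlier ones.
            throw new IllegalStateException("clock moved backwards by " + (lastMs - now) + " ms");
        }
        if (now == lastMs) {
            sequence = (sequence + 1) & MAX_SEQ;
            if (sequence == 0) {
                // Sequence exhausted for this millisecond: wait for the clock to advance.
                while ((now = System.currentTimeMillis()) <= lastMs) {
                    Thread.onSpinWait();
                }
            }
        } else {
            sequence = 0;
        }
        lastMs = now;
        return ((now - EPOCH_MS) << (NODE_BITS + SEQ_BITS)) | (nodeId << SEQ_BITS) | sequence;
    }
}
```

Ids from one node are strictly increasing; ids from different nodes are unique because of the node bits, but they are only ordered as closely as the nodes' clocks agree.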

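The heartbeat rule described above (a user goes offline after 30 s without a keepalive) reduces to a timestamp comparison. The sketch below is a minimal in-memory version under that assumption; the class name and the choice to pass the current time explicitly (which keeps it easy to test) are illustrative, not taken from the source.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical presence tracker: a user is online while the latest heartbeat is younger than the window. */
final class PresenceTracker {
    private final Duration offlineAfter;
    private final Map<Long, Instant> lastHeartbeat = new ConcurrentHashMap<>();

    PresenceTracker(Duration offlineAfter) {
        this.offlineAfter = offlineAfter;
    }

    /** Records a keepalive; keeps the newest timestamp if several devices report out of order. */
    void heartbeat(long userId, Instant at) {
        lastHeartbeat.merge(userId, at, (old, fresh) -> fresh.isAfter(old) ? fresh : old);
    }

    /** True if the user has sent a heartbeat less than {@code offlineAfter} before {@code now}. */
    boolean isOnline(long userId, Instant now) {
        Instant last = lastHeartbeat.get(userId);
        return last != null && Duration.between(last, now).compareTo(offlineAfter) < 0;
    }
}
```

With a 5 s keepalive and a 30 s window, a client has to miss several consecutive heartbeats before its status flips, which is what absorbs a short tunnel or NAT rebind.
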
Architecture Components (12)

Operations Walked Through (6)

Implementation

ChatMessage domain model

package com.systemdesign.whatsapp.model;

import java.time.Instant;
import java.util.Objects;
import java.util.UUID;

/** Immutable chat message as stored in the history table and pushed to devices. */
public final class ChatMessage {
    private static final int MAX_BODY_CHARS = 100_000;

    private final long messageId;           // Snowflake-style id, sortable by creation time
    private final String channelId;         // identifies a 1:1 conversation or a group
    private final long senderId;
    private final String body;              // up to 100_000 chars; may be null
    private final Instant createdAt;
    private final String clientDedupeKey;   // UUID chosen by the client, reused on retries

    public ChatMessage(long messageId, String channelId, long senderId,
                       String body, Instant createdAt, String clientDedupeKey) {
        if (body != null && body.length() > MAX_BODY_CHARS) {
            throw new IllegalArgumentException("body exceeds 100k chars");
        }
        this.messageId = messageId;
        this.channelId = Objects.requireNonNull(channelId, "channelId");
        this.senderId = senderId;
        this.body = body;
        this.createdAt = Objects.requireNonNull(createdAt, "createdAt");
        // Fallback only: a key generated here differs on every retry, so it cannot
        // deduplicate client resends. Clients should always supply their own key.
        this.clientDedupeKey = clientDedupeKey == null
                ? UUID.randomUUID().toString()
                : clientDedupeKey;
    }

    public long getMessageId() { return messageId; }
    public String getChannelId() { return channelId; }
    public long getSenderId() { return senderId; }
    public String getBody() { return body; }
    public Instant getCreatedAt() { return createdAt; }
    public String getClientDedupeKey() { return clientDedupeKey; }
}

MessageDelivery: fanout to N recipient devices

package com.systemdesign.whatsapp.delivery;

import com.systemdesign.whatsapp.model.ChatMessage;
import com.systemdesign.whatsapp.ws.ChatWebSocketHandler;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;

import java.util.List;

// DeviceRegistry, Device, OfflineInboxQueue and PushNotificationClient live in this
// package and are not shown here.
@Service
public class MessageDelivery {
    private static final Logger log = LoggerFactory.getLogger(MessageDelivery.class);

    private final DeviceRegistry deviceRegistry;         // user_id -> List<Device>
    private final ChatWebSocketHandler wsHandler;        // pushes to sockets held by this server
    private final OfflineInboxQueue offlineInbox;        // durable queue per user and device
    private final PushNotificationClient pushClient;     // APNs/FCM

    @Autowired
    public MessageDelivery(DeviceRegistry deviceRegistry,
                           ChatWebSocketHandler wsHandler,
                           OfflineInboxQueue offlineInbox,
                           PushNotificationClient pushClient) {
        this.deviceRegistry = deviceRegistry;
        this.wsHandler = wsHandler;
        this.offlineInbox = offlineInbox;
        this.pushClient = pushClient;
    }

    public void deliver(ChatMessage msg, List<Long> recipientUserIds) {
        for (Long uid : recipientUserIds) {
            for (Device d : deviceRegistry.lookup(uid)) {
                // A device marked online can still fail the push if its socket dropped mid-flight.
                boolean pushed = d.isOnline() && wsHandler.pushToDevice(d.getDeviceId(), msg);
                if (!pushed) {
                    stageOffline(uid, d, msg);
                }
            }
        }
        log.info("fanned out messageId={} to {} recipients", msg.getMessageId(), recipientUserIds.size());
    }

    /** Offline path: persist the message first, then ask APNs/FCM to wake the app. */
    private void stageOffline(long uid, Device d, ChatMessage msg) {
        offlineInbox.enqueue(uid, d.getDeviceId(), msg);
        try {
            pushClient.wake(d.getPushToken(), msg.getChannelId());
        } catch (RuntimeException e) {
            // The message is already in the inbox; a failed wake-up must not abort the fanout.
            log.warn("wake-up failed for deviceId={} messageId={}", d.getDeviceId(), msg.getMessageId(), e);
        }
    }
}

ChatWebSocketHandler: Spring WebSocket push and offline queue drain

package com.systemdesign.whatsapp.ws;

import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.datatype.jsr310.JavaTimeModule;
import com.systemdesign.whatsapp.delivery.OfflineInboxQueue;
import com.systemdesign.whatsapp.model.ChatMessage;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.web.socket.CloseStatus;
import org.springframework.web.socket.TextMessage;
import org.springframework.web.socket.WebSocketSession;
import org.springframework.web.socket.handler.TextWebSocketHandler;
import org.springframework.stereotype.Component;

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

@Component
public class ChatWebSocketHandler extends TextWebSocketHandler {
    private static final Logger log = LoggerFactory.getLogger(ChatWebSocketHandler.class);

    private final ConcurrentMap<String, WebSocketSession> deviceIdToSession = new ConcurrentHashMap<>();
    // ChatMessage carries an Instant; Jackson 2.12+ refuses to serialize java.time types
    // unless the JSR-310 module (jackson-datatype-jsr310) is registered.
    private final ObjectMapper mapper = new ObjectMapper().registerModule(new JavaTimeModule());
    private final OfflineInboxQueue offlineInbox;

    public ChatWebSocketHandler(OfflineInboxQueue offlineInbox) {
        this.offlineInbox = offlineInbox;
    }

    @Override
    public void afterConnectionEstablished(WebSocketSession session) throws Exception {
        // Both attributes are expected to be set during the handshake (not shown here).
        String deviceId = (String) session.getAttributes().get("deviceId");
        long userId = (Long) session.getAttributes().get("userId");
        deviceIdToSession.put(deviceId, session);
        // drain anything staged while offline
        offlineInbox.drain(userId, deviceId, msg -> pushToDevice(deviceId, msg));
    }

    @Override
    public void afterConnectionClosed(WebSocketSession session, CloseStatus status) {
        String deviceId = (String) session.getAttributes().get("deviceId");
        // Remove only if this session is still the registered one, so a late close
        // callback for an old socket cannot evict a device that already reconnected.
        deviceIdToSession.remove(deviceId, session);
    }

    public boolean pushToDevice(String deviceId, ChatMessage msg) {
        WebSocketSession s = deviceIdToSession.get(deviceId);
        if (s == null || !s.isOpen()) return false;
        try {
            String json = mapper.writeValueAsString(msg);
            // WebSocketSession does not allow concurrent sends, so serialize writes per session.
            synchronized (s) {
                s.sendMessage(new TextMessage(json));
            }
            return true;
        } catch (Exception e) {
            log.warn("push failed for deviceId={} messageId={}", deviceId, msg.getMessageId(), e);
            return false;
        }
    }
}
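
The listings above call `OfflineInboxQueue.enqueue(...)` and `drain(...)` without showing the queue itself. As a rough illustration of the contract they appear to rely on, here is an in-memory stand-in. It is a sketch under assumptions: the real queue is described as durable, the class name and the generic message type are hypothetical, and the rule that draining stops at the first failed push (leaving the rest queued, in order) is one reasonable reading rather than something the source states.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Predicate;

/** Hypothetical in-memory inbox: one FIFO queue per (user, device). Not durable. */
final class InMemoryOfflineInbox<M> {
    private final Map<String, Deque<M>> queues = new ConcurrentHashMap<>();

    private static String key(long userId, String deviceId) {
        return userId + ":" + deviceId;
    }

    /** Stages a message for a device that could not be reached over its socket. */
    void enqueue(long userId, String deviceId, M msg) {
        Deque<M> q = queues.computeIfAbsent(key(userId, deviceId), k -> new ArrayDeque<>());
        synchronized (q) {
            q.addLast(msg);
        }
    }

    /**
     * Pushes staged messages oldest-first. A message is removed only after {@code push}
     * returns true; the first failure stops the drain so ordering is preserved.
     * Returns the number of messages delivered.
     */
    int drain(long userId, String deviceId, Predicate<M> push) {
        Deque<M> q = queues.get(key(userId, deviceId));
        if (q == null) return 0;
        int delivered = 0;
        synchronized (q) {
            while (!q.isEmpty() && push.test(q.peekFirst())) {
                q.pollFirst();
                delivered++;
            }
        }
        return delivered;
    }

    /** Number of messages still waiting for the device. */
    int pending(long userId, String deviceId) {
        Deque<M> q = queues.get(key(userId, deviceId));
        if (q == null) return 0;
        synchronized (q) {
            return q.size();
        }
    }
}
```

Because `pushToDevice` returns a boolean, the handler's lambda `msg -> pushToDevice(deviceId, msg)` fits the `Predicate` parameter directly. Removing a message only after a successful push gives at-least-once delivery, which is where the client dedupe key becomes useful on the receiving side.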

Key design decisions & trade-offs

Interview follow-ups

Related