Nearby Friends System Design Interview Question
Problem: Design a system like Facebook Nearby Friends / Snap Map that shows you which of your friends are nearby in near-real-time.
Overview
Nearby Friends is the real-time cousin of a proximity search. Instead of 'what restaurants are near me', it asks 'which of my friends are within a mile, right now, with their consent'. Facebook shipped this feature in 2014 and Snap Map, Apple Find My, and Life360 all run variations of the same design. The product is a continuous stream of location pings, one every few seconds per active user, fanned out only to that user's explicit list of friends, filtered by a privacy mask that can flip to invisible instantly. The system therefore marries three hard problems: a high-throughput write path (millions of pings per second), a pub/sub fanout that is cheap per subscriber, and a geo-aware filter that drops pings for friends who are too far away. This SEO supplement deepens the widget with Java sketches for the WebSocket ingestion handler, a privacy-aware fanout topic, and an in-memory quadtree that a single region server uses to answer 'who is within five kilometers' in constant time.
Summary
A location-sharing system where each client publishes its GPS coordinates every ~30 seconds over a long-lived WebSocket and receives updates for friends that have opted in. The dominant design choice is an asymmetric fan-out: writes are O(1) into Redis GEO, reads are a pub/sub fan-out driven by the friend graph, with Kafka as the durable backbone between location publishers and per-subscriber WebSocket senders. The main tradeoff is freshness vs cost — publishing every 30s is 6x cheaper than every 5s but shifts the UI toward 'approximate nearby' rather than 'live'. Sized for ~200M DAU and ~100M concurrent WebSocket connections.
Requirements
Functional
- Accept location pings from online users over a persistent WebSocket connection.
- Broadcast each ping to the user's friends who are (a) online, (b) within a configurable radius, (c) not hidden from this user.
- Let a user toggle invisible mode instantly; in-flight pings must not leak after toggle.
- Let friends subscribe to a user's live location for a bounded session (e.g. 1 hour).
- Expose a REST 'last known location' endpoint for offline friends with opt-in.
Non-functional
- End-to-end ping-to-friend latency under 2 s at p99.
- Support 100M monthly active users, 10M concurrent WebSockets per region.
- Location history retention is zero by default; only the latest ping is stored.
- Privacy toggles must propagate within 500 ms across all connected fanout nodes.
- Graceful degradation: if the fanout cluster is down, pings are dropped, not queued.
- Backpressure protects the server when a user has 5000 friends watching.
Capacity Assumptions
- 1B registered users, 200M DAU, 100M concurrent online
- Average friend list size = 200; nearby-friends opt-in = 10% of graph → ~20 subscribers per publisher
- Location publish cadence = every 30s per active client
- Location precision = ~10m (6 decimal places of lng/lat)
- 'Nearby' radius = 5 miles / 8 km
Back-of-Envelope Estimates
- Publish QPS: 100M / 30s ≈ 3.3M location updates/sec
- Fan-out QPS: 3.3M * 20 subscribers ≈ 66M deliveries/sec
- WebSocket servers: 100M / 50K per node ≈ 2000 WS nodes
- Redis GEO memory: 100M keys * ~100B ≈ 10 GB (sharded across 20 shards at 500MB each)
- Kafka throughput: 3.3M * 200B ≈ 660 MB/s sustained, 2 GB/s peak
High-level architecture
The ingestion layer is a farm of WebSocket servers, each sticky to a user via consistent hashing on user_id. When a client pings, the handler validates the coordinate, stamps the server time, writes a 'last-seen' record to Redis with a 30 s TTL, and publishes the ping to a per-user fanout topic on a Redis Pub/Sub or Kafka cluster. Every user who subscribes to my live location is registered on the fanout topic keyed by my user_id. A small co-resident service, the Fanout Router, reads the topic, pulls my friend-graph slice for just those subscribers, applies the privacy mask (block list, invisible flag, distance limit), and hands each delivery to the WebSocket server that owns that friend's session. Each region also maintains an in-memory quadtree of currently-online user points; this index backs the 'nearby stranger' features and lets the router shortcut the distance check in micro-seconds. A Presence service feeds session up or down events to the router so vanished users get pruned. The authoritative friend graph and privacy settings live in a sharded relational database and are hot-cached per connection at WebSocket upgrade time. The whole pipeline is regional; cross-region friends flow through a mesh gateway that respects per-region privacy overrides. Back-pressure is enforced by dropping pings, never queueing, so a misbehaving client can never pile work on the server.
Architecture Components (9)
- Client (iOS / Android) (client) — Mobile app holding a persistent WebSocket, publishing its location every 30s and rendering friends' positions on a map.
- Connection Load Balancer (lb) — L4 load balancer holding millions of long-lived WebSocket flows; consistent-hashes on user_id.
- WebSocket Server (api) — Stateful WebSocket server maintaining ~50K concurrent connections per node; terminal for both publish and receive frames.
- Location Publisher (worker) — Co-located with the WebSocket server; writes each location to Redis GEO and emits to Kafka for fan-out.
- Redis GEO (cache) — Sharded Redis cluster using GEO commands (sorted sets with geohash scores) to store every active user's current location.
- Location Stream (Kafka) (queue) — Partitioned log of all location updates; decouples publishers from subscription fan-out.
- Subscription Manager (stream-processor) — Consumes location stream, looks up each publisher's subscribers, and fans out to the WS server hosting each subscriber.
- Friends Service (nosql) — Stores the friend graph and the 'nearby sharing' opt-in flag per edge.
- Notification Service (worker) — Sends APNs/FCM pushes when a friend comes within N meters for the first time today ('Alice is now nearby').
Operations Walked Through (3)
- publish-location — Client sends a LOC frame; server writes Redis GEO and emits to Kafka for fan-out. No response needed beyond an ACK.
- query-nearby — User opens the Nearby Friends tab; client requests the current positions of friends within 5 miles.
- friend-online — When a user opens the app, a PRESENCE frame fans out to subscribers so their UIs can un-grey the friend's pin.
Implementation
@Component
public class LocationSocketHandler extends TextWebSocketHandler {
private final PrivacyService privacy;
private final FanoutPublisher fanout;
private final Quadtree regionIndex;
@Override
public void handleTextMessage(WebSocketSession session, TextMessage msg) throws IOException {
Ping ping = Json.read(msg.getPayload(), Ping.class);
long userId = (long) session.getAttributes().get("userId");
if (privacy.isInvisible(userId)) {
return; // silently drop
}
if (Math.abs(ping.lat()) > 90 || Math.abs(ping.lon()) > 180) {
session.sendMessage(new TextMessage("{\"err\":\"bad_coord\"}"));
return;
}
long ts = System.currentTimeMillis();
regionIndex.upsert(userId, ping.lat(), ping.lon(), ts);
fanout.publish(userId, new LocationEvent(userId, ping.lat(), ping.lon(), ts));
}
}
public class FanoutPublisher {
private final JedisCluster redis;
public void publish(long userId, LocationEvent ev) {
redis.publish("loc:" + userId, Json.write(ev));
}
}
public class FanoutRouter implements MessageListener {
private final FriendGraph graph;
private final PrivacyService privacy;
private final SessionRegistry sessions;
@Override
public void onMessage(Message m, byte[] pattern) {
LocationEvent ev = Json.read(m.getBody(), LocationEvent.class);
for (long friendId : graph.friendsOf(ev.userId())) {
if (!privacy.visibleTo(ev.userId(), friendId)) continue;
WebSocketSession s = sessions.find(friendId);
if (s == null || !s.isOpen()) continue;
try {
s.sendMessage(new TextMessage(Json.write(ev)));
} catch (IOException e) {
sessions.drop(friendId);
}
}
}
}
public class Quadtree {
private final Node root;
public Quadtree(BoundingBox regionBounds) {
this.root = new Node(regionBounds, 0);
}
public void upsert(long userId, double lat, double lon, long ts) {
root.insert(new Point(userId, lat, lon, ts));
}
public List<Point> within(double lat, double lon, double radiusM) {
BoundingBox box = BoundingBox.around(lat, lon, radiusM);
List<Point> out = new ArrayList<>();
root.query(box, out);
return out;
}
static class Node {
static final int CAPACITY = 16;
final BoundingBox bbox;
final int depth;
List<Point> points = new ArrayList<>();
Node[] children;
Node(BoundingBox b, int d) { this.bbox = b; this.depth = d; }
void insert(Point p) {
if (!bbox.contains(p)) return;
if (children == null && points.size() < CAPACITY) { points.add(p); return; }
if (children == null) subdivide();
for (Node c : children) c.insert(p);
}
void query(BoundingBox b, List<Point> out) {
if (!bbox.intersects(b)) return;
for (Point p : points) if (b.contains(p)) out.add(p);
if (children != null) for (Node c : children) c.query(b, out);
}
private void subdivide() {
children = bbox.split(depth + 1);
List<Point> old = points;
points = new ArrayList<>();
for (Point p : old) for (Node c : children) c.insert(p);
}
}
}
Key design decisions & trade-offs
- Transport — Chosen: WebSocket over long-poll or gRPC streaming. WebSocket is supported natively by every mobile OS and keeps a single connection per user so fanout is a map lookup. gRPC streaming works but adds a TLS plus HTTP2 dependency the mobile client would rather avoid.
- Fanout broker — Chosen: Redis Pub/Sub for hot path, Kafka for audit and replay. Redis Pub/Sub is sub-millisecond but drops messages with no subscriber, which matches the 'zero retention' privacy requirement. Kafka behind it gives us a trail for abuse investigations without putting Kafka on the hot path.
- Privacy enforcement point — Chosen: Check privacy at the router, not the ingestion server. Ingestion has no friend graph context and checking there would force every WebSocket server to cache the entire social graph. The router already fetches the subscriber list, so adding the privacy mask is one extra boolean per delivery.
- Location store — Chosen: Ephemeral in-memory quadtree plus Redis last-seen, no historical DB. Storing every ping would create a surveillance dataset we do not want and explode storage cost. Latest-only in memory is cheaper and matches the product promise that 'we do not track your history'.
- Backpressure policy — Chosen: Drop pings under load instead of queueing. A queued ping that arrives 10 s late is misleading and worse than no ping. Dropping preserves the invariant that every delivered location is recent.
Interview follow-ups
- How would you handle a user whose WebSocket flaps every few seconds on a flaky cellular link?
- Design the cross-region handoff when a user flies from SFO to NRT mid-session.
- How do you prevent a scraper from impersonating friends to harvest location data?
- Extend the system to support location-triggered notifications ('ping me when Alice is 1 km away').
- What changes if the product allows 10-minute location history breadcrumbs?