Proximity Service (Yelp-style 'Nearby') System Design Interview Question
Problem: Design a proximity / 'find businesses near me' service like Yelp, Google Maps nearby, or DoorDash restaurant search.
Overview
A proximity service answers the deceptively simple question, 'what is near me right now?'. Yelp, Google Maps nearby, DoorDash restaurant search, Uber pickup lists, and Tinder swipe stacks all share the same core primitive: given a latitude, longitude, and radius, return the top N points of interest ranked by distance and relevance in under one hundred milliseconds. The difficulty is not the math, it is the skew. Times Square, central Tokyo, and a few hundred other cells on Earth absorb almost half the global query volume, while vast oceans, deserts, and suburbs receive almost nothing. A good design treats the world as a key-value store keyed by geohash cells, caches the hot cells aggressively, and falls back to a PostGIS GiST index for the long tail. This SEO supplement expands on the widget with runnable Java sketches for geohash encoding, neighbour-cell expansion, and the REST endpoint so you can lift the ideas straight into an interview answer or a side project.
Summary
A read-heavy geospatial lookup service that must answer 'all restaurants within R meters of (lat, lng)' in under ~100ms p99. The dominant design choice is a pre-computed geohash (or quadtree) index: every business is tagged with its geohash cell, and a radius query becomes 'fetch businesses in this cell + 8 neighbor cells'. A Redis layer caches hot cells (Times Square, downtown SF) which absorb the vast majority of queries; PostGIS on Postgres is the authoritative store with GiST indexes for exact radius filtering. The main tradeoff is precision vs cacheability: coarse geohash cells (6 chars ≈ 1.2 km) cache beautifully but include far-away businesses that need post-filtering; fine cells (8 chars ≈ 38 m) need 8+ cell fetches per query. Sized for 500M DAU with ~5B nearby-search QPS/year.
Requirements
Functional
- Register, update, and delete business listings with (lat, lng, category, hours, metadata).
- Return all businesses within radius R meters of a user coordinate, ranked by distance.
- Support category filters (restaurant, pharmacy, ATM) and optional open-now filter.
- Paginate with a stable cursor so map panning does not re-render duplicates.
- Admin-only bulk import pipeline for partner data feeds (Yelp Fusion, OSM).
- Expose a /businesses/{id} detail endpoint for the map pin popover.
Non-functional
- p99 read latency under 100 ms for radius <= 5 km at 35K peak QPS.
- 85% or higher Redis cell cache hit ratio on the hot 1000 cells.
- Eventual consistency of 60 s is acceptable for listing edits; reads are not transactional.
- Regional isolation: a Tokyo region outage must not affect San Francisco traffic.
- Cost per million reads under $0.20 amortised across cache plus PostGIS.
- Horizontally scalable by adding PostGIS shards keyed on geohash prefix.
Capacity Assumptions
- 500M DAU, 2 nearby searches/user/day → 1B searches/day
- 200M registered businesses globally (restaurants, shops, services)
- Radius distribution: 500m (60%), 2km (30%), 5km+ (10%)
- Hot cell distribution: top 1000 geohash cells (cities + landmarks) handle ~40% of all queries
- Business listings update ~1M times/day (new, closed, edited hours)
Back-of-Envelope Estimates
- Nearby-search QPS: 1B / 86400 ≈ 11.5K QPS (peak 3x ≈ 35K QPS)
- Cache working set: 1000 hot cells * avg 100 KB payload ≈ 100 MB — trivially fits a single Redis node, replicated for HA
- PostGIS size: 200M rows * avg 1 KB = 200 GB + GiST index ≈ 250 GB — fits on a modest sharded Postgres
- Update QPS: 1M / 86400 ≈ 12 QPS peak — write path is not the bottleneck
- Cache hit ratio target: 85%+ on hot cells → only ~1.7K QPS reach PostGIS at steady state
High-level architecture
The hot path is a three-tier pipeline: a stateless Proximity API fleet behind an L7 load balancer, a Redis cluster keyed by geohash cell, and a sharded PostGIS cluster for authoritative data. When a /nearby request arrives the API computes an eight-character geohash of the query point (roughly 38 m cells), then trims characters until the cell width exceeds the requested radius; for a 500 m radius that lands on a six-character prefix covering about 1.2 km. The API fetches that prefix plus its eight neighbour cells from Redis in a single MGET. On a cache hit it post-filters each candidate by true haversine distance, sorts, and returns. On a miss it queries PostGIS with ST_DWithin against the same cells, writes the result back to Redis with a 60 s TTL, and returns. Writes go straight to PostGIS and emit a Kafka event that the cache invalidator consumes, evicting the affected cells within one second. A separate admin pipeline replicates partner feeds into PostGIS through a bulk COPY job every fifteen minutes. The service is deployed per region; each region holds a full PostGIS replica and its own Redis, so a cross-region outage only drops writes for its geographic slice. Observability is routed through the cache hit ratio, p99 latency per radius bucket, and PostGIS GiST index scan counts, all fanned into a Grafana board that operators watch during zoom-heavy events like Super Bowl Sunday.
Architecture Components (8)
- Client (Mobile App / Web) (client) — Mobile app or web page that asks the OS for (lat, lng) and calls /nearby with a radius.
- Load Balancer (lb) — L7 HTTPS load balancer distributing /nearby and /businesses traffic to stateless API replicas.
- Proximity API (api) — Stateless service that owns /nearby, /businesses CRUD, and orchestrates the cache→index→DB lookup chain.
- Location Service (api) — Thin helper service that encapsulates geohash / quadtree math and business-to-cell indexing.
- Geohash Index (kv) — In-memory map of `geohash_cell → [business_id,...]` held inside the Location Service; the first-pass candidate generator.
- Redis Cell Cache (cache) — Redis cluster caching precomputed nearby-results per (geohash_cell, category) combo.
- Businesses DB (Postgres + PostGIS) (sql) — Authoritative Postgres store with PostGIS extension; GiST spatial index on the geography column for exact radius filtering.
- User-Location Service (presence) — Tracks the last-reported location of each user (for 'deliver near me' style features and personalization); kept in Redis with short TTL.
Operations Walked Through (3)
- nearby-hit — User taps 'nearby restaurants' near Union Square, SF. The cell is hot in Redis; API returns in ~8ms p50.
- nearby-miss — User in a less-trafficked suburban area. The cell is cold; API falls through to PostGIS, backfills Redis, and returns. ~40ms p50.
- update-business — Owner edits the cafe's hours. API writes PostGIS, invalidates hot cells in Redis, and the change ripples through replicas within seconds.
Implementation
public final class GeoHash {
private static final String BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz";
public static String encode(double lat, double lon, int precision) {
double[] latRange = {-90.0, 90.0};
double[] lonRange = {-180.0, 180.0};
StringBuilder hash = new StringBuilder(precision);
boolean evenBit = true;
int bit = 0;
int ch = 0;
while (hash.length() < precision) {
double[] range = evenBit ? lonRange : latRange;
double value = evenBit ? lon : lat;
double mid = (range[0] + range[1]) / 2.0;
if (value >= mid) {
ch = (ch << 1) | 1;
range[0] = mid;
} else {
ch = ch << 1;
range[1] = mid;
}
evenBit = !evenBit;
if (++bit == 5) {
hash.append(BASE32.charAt(ch));
bit = 0;
ch = 0;
}
}
return hash.toString();
}
}
public class NearbyService {
private final JedisCluster redis;
private final BusinessRepository repo;
public List<Business> findNearby(double lat, double lon, int radiusMeters) {
int precision = precisionForRadius(radiusMeters);
String center = GeoHash.encode(lat, lon, precision);
List<String> cells = Neighbours.around(center);
List<Business> candidates = new ArrayList<>();
for (String cell : cells) {
String cached = redis.get("cell:" + cell);
if (cached != null) {
candidates.addAll(Json.readList(cached, Business.class));
} else {
List<Business> fromDb = repo.byCellPrefix(cell);
redis.setex("cell:" + cell, 60, Json.write(fromDb));
candidates.addAll(fromDb);
}
}
return candidates.stream()
.filter(b -> haversine(lat, lon, b.lat(), b.lon()) <= radiusMeters)
.sorted(Comparator.comparingDouble(b -> haversine(lat, lon, b.lat(), b.lon())))
.limit(20)
.toList();
}
private int precisionForRadius(int r) {
if (r <= 150) return 7;
if (r <= 1200) return 6;
return 5;
}
}
@RestController
@RequestMapping("/nearby")
public class NearbyController {
private final NearbyService service;
@GetMapping
public ResponseEntity<NearbyResponse> nearby(
@RequestParam double lat,
@RequestParam double lon,
@RequestParam(defaultValue = "500") int radius,
@RequestParam(required = false) String category) {
if (radius < 1 || radius > 50_000) {
return ResponseEntity.badRequest().build();
}
List<Business> results = service.findNearby(lat, lon, radius);
if (category != null) {
results = results.stream().filter(b -> b.category().equals(category)).toList();
}
return ResponseEntity.ok(new NearbyResponse(results, System.currentTimeMillis()));
}
}
Key design decisions & trade-offs
- Spatial index structure — Chosen: Geohash cells over quadtree or R-tree. Geohashes are opaque strings that slot directly into Redis keys and Postgres B-tree indexes, whereas quadtrees need a bespoke in-memory structure and R-trees lock you into PostGIS. Geohash precision trimming also maps naturally onto the radius-to-cell-size decision.
- Cache granularity — Chosen: Cache per geohash-6 cell, not per (lat,lng) query. Per-query caching has near-zero hit ratio because user coordinates are unique. Per-cell caching lets thousands of nearby users share the same Redis entry for Times Square, pushing the hit ratio north of 85 percent.
- Consistency model — Chosen: 60 second eventual consistency for listing edits. Strict consistency would force cache-through writes and kill read throughput. Sixty seconds is well below the user's perception of freshness for a business listing and lets the invalidator batch evictions.
- Write path — Chosen: PostGIS authoritative, Kafka-driven cache invalidation. Dual-writing to Redis from the API layer risks drift on partial failure. A single writer (PostGIS) plus a Kafka consumer that evicts cells guarantees the cache never lies for longer than the invalidation lag.
- Sharding key — Chosen: Shard PostGIS by geohash prefix, not by business_id. Most queries are spatial, so colocating rows with neighbouring cells keeps each query on one shard. Hashing by business_id would scatter every radius query across every shard.
Interview follow-ups
- How would you handle a radius query that straddles the antimeridian at 180 degrees longitude?
- Walk through an S2 cell or H3 hex migration and what it buys over geohash.
- Design the open-now filter so it does not force a full-row scan in PostGIS.
- How do you keep the cache warm after a Redis failover without a thundering herd?
- Sketch a personalized ranking layer that re-scores candidates by user history.