Distributed Unique ID Generator (Snowflake) System Design Interview Question
Problem: Design a system that generates globally unique, roughly time-sortable 64-bit IDs at very high throughput across many machines, without a central bottleneck on the write path.
Overview
Every persistent write to a distributed database eventually asks the same question: what primary key should this row have? For a single-node app you just use an auto-increment column, but the moment you shard the database or run writes from multiple application replicas, central auto-increment becomes either a bottleneck or a correctness hazard. Twitter's answer in 2010 was Snowflake: a 64-bit ID that packs a millisecond timestamp, a machine identifier, and a per-millisecond sequence counter into a single long, generated entirely in-process on each application host with no network hop per ID. Discord, Instagram, and most modern microservice platforms use a close variant. The design gives you roughly time-sortable, densely packed, index-friendly primary keys at tens of millions of IDs per second cluster-wide, with p99 latency measured in microseconds. The tradeoffs - wall-clock sensitivity, finite machine-ID space, opaque to clients - are well understood, which is why Snowflake is the go-to answer in system-design interviews and real production alike.
Summary
A Twitter Snowflake-style ID generator: each application host embeds an in-process generator that packs {timestamp | datacenter_id | machine_id | sequence} into a 64-bit long. Machine IDs are allocated once at startup from ZooKeeper/etcd and then cached. The dominant design choice is in-process generation (no network hop per ID — sub-microsecond latency) with ZooKeeper only on the cold path; the main tradeoff is that wall-clock skew between hosts can cause duplicate IDs, which the generator must detect and refuse. Sized for >1M IDs/sec per host and effectively unlimited horizontal scale.
Requirements
Functional
- Generate globally unique 64-bit numeric IDs across every host in every datacenter
- IDs are roughly time-sortable (newer IDs compare greater than older ones)
- Support at least 10K IDs/sec per host, with headroom to 1M/sec
- Allocate machine IDs automatically at startup via a coordination service
- Expose IDs through both an in-process SDK and a REST endpoint for non-Java callers
- Detect and refuse to generate on clock-moved-backwards conditions rather than emit duplicates
Non-functional
- Sub-millisecond p99 latency on the generation call
- No single network hop on the hot path
- Horizontally scalable with no central write bottleneck
- Tolerate coordination-service outage after startup (machine ID is cached)
- Operate for decades on a single 41-bit timestamp without overflow
Capacity Assumptions
- Book Ch 7 requirements: IDs unique, numeric, fit in 64 bits, ordered by date, ≥ 10,000 IDs/sec
- 64-bit ID layout (book Figure 7-5): 1 sign bit | 41-bit ms timestamp | 5-bit datacenter_id | 5-bit machine_id | 12-bit sequence
- 41-bit ms timestamp → 2^41 - 1 ms ≈ 69 years since custom epoch (Twitter default: 1288834974657 = 2010-11-04)
- 2^5 = 32 datacenters × 2^5 = 32 machines per dc → 1024 machines total
- 2^12 = 4096 sequence values per ms per machine → 4.096M IDs/sec/machine ceiling
- Target: any single app host generates up to ~1M IDs/sec sustained with p99 < 1ms
Back-of-Envelope Estimates
- Aggregate throughput ceiling: 1024 * 4.096M ≈ 4.2B IDs/sec cluster-wide
- Per-host steady-state: 100K-1M IDs/sec, almost entirely CPU-bound
- ZooKeeper load: 1 read + 1 ephemeral znode write per host startup → ~10 ops/sec even with aggressive rolling deploys
- Monitoring: 1 gauge per host (last_ts, sequence_exhaustion_count, clock_skew_ms) scraped every 15s
High-level architecture
The generator runs as a library inside each application server, so the 'request path' for an ID is just a method call. On process startup the host connects to ZooKeeper (or etcd), walks /snowflake/{datacenter} looking for a free machine-ID slot, and claims it via an ephemeral znode. That znode is the host's lease - if the process dies, ZooKeeper deletes it and the slot returns to the pool. The machine ID and datacenter ID are cached in memory and the coordinator is not touched again while the process runs. For each ID request, the generator reads the current wall-clock time in milliseconds, compares it to the last-seen timestamp, and takes one of three branches: if the clock moved forward, reset the sequence counter to zero and emit; if the clock is equal to the last timestamp, atomically increment the 12-bit sequence and spin-wait on overflow until the next millisecond; if the clock moved backwards by more than a small tolerance, throw ClockMovedBackwardsException and let the caller retry on a healthy host. The final long is assembled by left-shifting each field into place: the epoch-relative timestamp into the upper 41 bits, the datacenter into the next 5, the machine into the next 5, and the sequence into the low 12. A thin REST endpoint wraps next_id() for polyglot callers, but most services embed the generator directly to save the network hop and keep the latency budget.
Architecture Components (6)
- Client (Service or User Request) (client) — Any upstream caller that needs a unique ID — a mobile app, another microservice, or a batch job.
- Application Server (api) — Stateless service that serves business traffic and embeds the Snowflake generator as a library.
- Snowflake Generator (in-process) (id-generator) — Pure-CPU generator that packs timestamp, machine ID, and a per-ms sequence into a 64-bit long.
- Coordinator (ZooKeeper / etcd) (coordinator) — Allocates unique machine IDs at host startup via ephemeral znodes.
- ID Usage DB (downstream consumer) (sql) — Representative downstream store that indexes by the generated ID.
- Monitoring (Prometheus + Alerts) (worker) — Scrapes per-host generator metrics and alerts on clock skew or sequence exhaustion.
Operations Walked Through (3)
- generate-id — Normal request-time ID generation — zero network hops beyond the app server, because the Snowflake generator runs in-process.
- machine-startup — When an app host boots, it must atomically claim an unused machine_id from the coordinator before it can generate any IDs.
- clock-skew — The wall clock moves backwards (NTP correction, VM pause, leap second). The generator MUST refuse to generate IDs until the clock catches up.
Implementation
package com.hld.idgen;
/**
* Twitter Snowflake layout:
* sign(1) | timestamp_ms(41) | datacenter(5) | machine(5) | sequence(12)
*/
public final class SnowflakeIdGenerator {
private static final long EPOCH_MS = 1288834974657L; // 2010-11-04 Twitter custom epoch
private static final int SEQUENCE_BITS = 12;
private static final int MACHINE_BITS = 5;
private static final int DATACENTER_BITS = 5;
private static final long MAX_SEQUENCE = (1L << SEQUENCE_BITS) - 1;
private static final long MAX_MACHINE = (1L << MACHINE_BITS) - 1;
private static final long MAX_DATACENTER = (1L << DATACENTER_BITS) - 1;
private static final int MACHINE_SHIFT = SEQUENCE_BITS;
private static final int DATACENTER_SHIFT = SEQUENCE_BITS + MACHINE_BITS;
private static final int TIMESTAMP_SHIFT = SEQUENCE_BITS + MACHINE_BITS + DATACENTER_BITS;
private final long datacenterId;
private final long machineId;
private long lastTimestamp = -1L;
private long sequence = 0L;
public SnowflakeIdGenerator(long datacenterId, long machineId) {
if (datacenterId < 0 || datacenterId > MAX_DATACENTER) {
throw new IllegalArgumentException("datacenterId out of range [0, " + MAX_DATACENTER + "]");
}
if (machineId < 0 || machineId > MAX_MACHINE) {
throw new IllegalArgumentException("machineId out of range [0, " + MAX_MACHINE + "]");
}
this.datacenterId = datacenterId;
this.machineId = machineId;
}
public synchronized long nextId() {
long ts = System.currentTimeMillis();
if (ts < lastTimestamp) {
long drift = lastTimestamp - ts;
throw new ClockMovedBackwardsException(drift);
}
if (ts == lastTimestamp) {
sequence = (sequence + 1) & MAX_SEQUENCE;
if (sequence == 0) {
ts = spinUntilNextMillis(lastTimestamp);
}
} else {
sequence = 0L;
}
lastTimestamp = ts;
return ((ts - EPOCH_MS) << TIMESTAMP_SHIFT)
| (datacenterId << DATACENTER_SHIFT)
| (machineId << MACHINE_SHIFT)
| sequence;
}
private long spinUntilNextMillis(long reference) {
long ts = System.currentTimeMillis();
while (ts <= reference) {
ts = System.currentTimeMillis();
}
return ts;
}
public static final class ClockMovedBackwardsException extends RuntimeException {
public ClockMovedBackwardsException(long driftMs) {
super("Clock moved backwards by " + driftMs + "ms; refusing to generate");
}
}
}
package com.hld.idgen;
import java.nio.charset.StandardCharsets;
import java.util.List;
import org.apache.curator.framework.CuratorFramework;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
/** Claims a free 5-bit machine id under /snowflake/{dc}/ using an ephemeral znode. */
public final class MachineIdAllocator {
private static final int MAX_MACHINE = 31; // 2^5 - 1
private final CuratorFramework zk;
private final int datacenterId;
private final String hostName;
public MachineIdAllocator(CuratorFramework zk, int datacenterId, String hostName) {
this.zk = zk;
this.datacenterId = datacenterId;
this.hostName = hostName;
}
public int claim() throws Exception {
String dcPath = "/snowflake/" + datacenterId;
ensurePath(dcPath);
List<String> taken = zk.getChildren().forPath(dcPath);
boolean[] used = new boolean[MAX_MACHINE + 1];
for (String c : taken) {
try { used[Integer.parseInt(c)] = true; } catch (NumberFormatException ignored) { }
}
for (int candidate = 0; candidate <= MAX_MACHINE; candidate++) {
if (used[candidate]) continue;
String path = dcPath + "/" + candidate;
try {
zk.create()
.withMode(CreateMode.EPHEMERAL)
.forPath(path, hostName.getBytes(StandardCharsets.UTF_8));
return candidate;
} catch (KeeperException.NodeExistsException race) {
// another host grabbed it between getChildren and create; keep scanning
}
}
throw new IllegalStateException("No free machine id in dc " + datacenterId);
}
private void ensurePath(String path) throws Exception {
if (zk.checkExists().forPath(path) == null) {
zk.create().creatingParentsIfNeeded().forPath(path);
}
}
}
package com.hld.idgen;
import java.util.List;
import java.util.stream.LongStream;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;
@RestController
@RequestMapping("/v1/ids")
public class IdGeneratorController {
private final SnowflakeIdGenerator generator;
@Autowired
public IdGeneratorController(SnowflakeIdGenerator generator) {
this.generator = generator;
}
@PostMapping
public ResponseEntity<IdResponse> nextId() {
try {
long id = generator.nextId();
// Serialize as string to avoid 53-bit precision loss in JS clients
return ResponseEntity.ok(new IdResponse(Long.toString(id), id));
} catch (SnowflakeIdGenerator.ClockMovedBackwardsException e) {
return ResponseEntity.status(503).header("Retry-After", "1").build();
}
}
@GetMapping("/batch")
public ResponseEntity<List<String>> batch(@RequestParam(defaultValue = "100") int count) {
if (count <= 0 || count > 10_000) {
return ResponseEntity.badRequest().build();
}
List<String> ids = LongStream.range(0, count)
.mapToObj(i -> Long.toString(generator.nextId()))
.toList();
return ResponseEntity.ok(ids);
}
public record IdResponse(String id, long idNumeric) { }
}
package com.hld.idgen;
import java.net.InetAddress;
import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.retry.ExponentialBackoffRetry;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
@Configuration
public class SnowflakeBean {
@Bean(destroyMethod = "close")
public CuratorFramework curator(@Value("${zk.connect}") String connect) {
CuratorFramework cf = CuratorFrameworkFactory.newClient(connect,
new ExponentialBackoffRetry(500, 5));
cf.start();
return cf;
}
@Bean
public SnowflakeIdGenerator snowflakeGenerator(CuratorFramework zk,
@Value("${datacenter.id}") int datacenterId) throws Exception {
String host = InetAddress.getLocalHost().getHostName();
int machineId = new MachineIdAllocator(zk, datacenterId, host).claim();
return new SnowflakeIdGenerator(datacenterId, machineId);
}
}
Key design decisions & trade-offs
- Placement of the generator — Chosen: In-process library, not a dedicated service. Removes a network hop worth ~0.5ms per ID; at 1M IDs/sec that hop alone would blow the entire latency budget.
- Machine ID assignment — Chosen: ZooKeeper ephemeral znodes on startup. Centralized allocation prevents duplicate IDs across hosts; ephemeral means dead hosts auto-release the slot without operator action.
- Clock drift handling — Chosen: Refuse to generate (throw) on backwards clock movement. Emitting duplicate IDs silently corrupts downstream DB uniqueness; failing loudly lets a load balancer eject the bad host.
- Bit layout (41/5/5/12) — Chosen: Keep Twitter's default despite being tunable. Gives 69 years of timestamps, 1024 machines, and 4096 IDs/ms/machine - enough for most businesses forever; tuning it breaks any downstream code that parses bits.
- Wire format — Chosen: Serialize IDs as strings in JSON. JavaScript's Number.MAX_SAFE_INTEGER is 2^53-1, so 64-bit IDs lose precision on JSON.parse; Twitter hit this exact bug and added id_str.
Interview follow-ups
- K-sorted guarantee across datacenters for analytics range scans
- Reserved prefix bits per tenant for a multi-tenant SaaS sharding scheme
- Fallback to a separate non-monotonic generator when clock drift is detected
- Metrics on sequence-exhaustion rate to know when to shard hot hosts
- Opaque base62 encoding for public API surfaces so bit layout can evolve