Back-of-envelope Estimator
DAU → QPS / storage / bandwidth with the math shown inline.
Overview
Back-of-envelope (BOE) estimation is the skill that turns a system design interview answer into a plausible system. Before you draw a single box, you need to know roughly how many requests per second will hit the system, how much storage it will need a year from now, and how much network bandwidth it will push. BOE lets you catch the design-killing numbers early: a photo upload service ingesting 1 TB per day is a very different system from one ingesting 100 GB per day, and you only find out which one you are designing by multiplying a few numbers. The math is deliberately rough. You convert daily active users (DAU) into QPS by multiplying DAU by actions per user per day and dividing by the seconds in a day (86,400, usually rounded to 100k for speed). You multiply record size by records per day for daily ingest, and you multiply daily ingest by retention days for total storage. The goal is an order-of-magnitude answer, not a forecast.
How it works
Every BOE calculation follows the same three-step pattern. First, translate user activity into a rate. If you have 100 million DAU and each user performs 10 writes per day, that is 1 billion writes per day. Dividing by roughly 100k seconds per day gives an average of 10,000 writes per second. Peak traffic is typically 2-3x the average, so assume 30k QPS at peak. Second, translate that rate into a storage footprint. Multiply request size by requests per day, then multiply by retention days. A 1 KB tweet, 500 million per day, kept for 5 years is 500 GB per day and about 900 TB over the retention window. Third, translate rate and size into bandwidth. 30k QPS of 1 KB responses is 30 MB/s, which fits on a single machine's NIC; 30k QPS of 1 MB responses is 30 GB/s, which needs a fleet. The useful constants to memorize are: a day is about 100k seconds, a year is about 30 million seconds, an L1 cache reference is about 1 ns, a main memory read about 100 ns, an SSD read about 100 us, and a cross-datacenter round trip about 100 ms. Keep these in your head and you can sanity-check any design in under a minute.
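The three steps above can be replayed in a few lines of code. The following is a minimal sketch, not part of the original implementation: the class name ThreeStepEstimate and its method names are hypothetical, and it assumes the rounded 100k seconds per day, decimal units (1 KB = 1,000 bytes), and a 3x peak multiplier.

```java
/** Hypothetical helper that replays the three-step estimate from the text. */
final class ThreeStepEstimate {
    // Rounded constant from the text: ~100k seconds per day (the exact value is 86,400).
    static final long SECONDS_PER_DAY_ROUNDED = 100_000L;
    // Decimal units keep the mental math simple (assumption: 1 KB = 1,000 bytes).
    static final long KB = 1_000L;
    static final long MB = 1_000_000L;
    static final long GB = 1_000_000_000L;
    static final long TB = 1_000_000_000_000L;

    /** Step 1: daily user actions to average requests per second. */
    static long averageQps(long dau, long actionsPerUserPerDay) {
        return dau * actionsPerUserPerDay / SECONDS_PER_DAY_ROUNDED;
    }

    /** Step 2: bytes ingested per day. */
    static long dailyIngestBytes(long bytesPerItem, long itemsPerDay) {
        return bytesPerItem * itemsPerDay;
    }

    /** Step 3: bytes per second at a given request rate. */
    static long bandwidthBytesPerSec(long qps, long bytesPerResponse) {
        return qps * bytesPerResponse;
    }

    public static void main(String[] args) {
        long avg = averageQps(100_000_000L, 10);               // 10,000 writes/s
        long peak = avg * 3;                                    // 30,000 at an assumed 3x peak
        long daily = dailyIngestBytes(1 * KB, 500_000_000L);    // 500 GB/day
        long fiveYears = daily * 5 * 365;                       // ~912 TB
        System.out.println("avg qps = " + avg + ", peak qps = " + peak);
        System.out.println("daily ingest = " + daily / GB + " GB");
        System.out.println("5y storage = " + fiveYears / TB + " TB");
        System.out.println("1 KB responses = " + bandwidthBytesPerSec(peak, 1 * KB) / MB + " MB/s");
        System.out.println("1 MB responses = " + bandwidthBytesPerSec(peak, 1 * MB) / GB + " GB/s");
    }
}
```

Integer arithmetic is deliberate here: at this level of precision, truncating 912.5 TB to 912 TB changes nothing about the design.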
Implementation
public final class BackOfEnvelope {
    private static final long SECONDS_PER_DAY = 86_400L;
    private static final double PEAK_MULTIPLIER = 3.0; // peak vs average

    private BackOfEnvelope() {} // static utility class, not instantiable

    /** Average QPS from DAU and actions-per-user-per-day. */
    public static double qps(long dau, long actionsPerDay) {
        if (dau <= 0 || actionsPerDay <= 0) return 0.0;
        // Convert to double before multiplying so large inputs cannot overflow a long.
        return ((double) dau * actionsPerDay) / SECONDS_PER_DAY;
    }

    /** Peak QPS assuming a 3x peak-to-average ratio. */
    public static double peakQps(long dau, long actionsPerDay) {
        return qps(dau, actionsPerDay) * PEAK_MULTIPLIER;
    }

    /** Total storage in bytes for the retention window. */
    public static long storage(long bytesPerItem, long itemsPerDay, long retentionDays) {
        // multiplyExact throws ArithmeticException instead of silently wrapping on overflow.
        return Math.multiplyExact(Math.multiplyExact(bytesPerItem, itemsPerDay), retentionDays);
    }

    /** Bytes per second needed at average QPS. */
    public static double bandwidth(double qps, long bytesPerReq) {
        return qps * bytesPerReq;
    }

    /** Human-friendly string: auto-scale to KB/MB/GB/TB/PB (binary, 1024-based). */
    public static String humanBytes(double bytes) {
        String[] units = {"B", "KB", "MB", "GB", "TB", "PB"};
        int i = 0;
        while (bytes >= 1024 && i < units.length - 1) { bytes /= 1024; i++; }
        return String.format("%.1f %s", bytes, units[i]);
    }

    public static void main(String[] args) {
        long dau = 100_000_000L, writesPerDay = 10;
        double q = qps(dau, writesPerDay);
        // 1 KB per item, 1 billion items per day, retained for 5 years.
        long total = storage(1024, dau * writesPerDay, 5 * 365);
        System.out.println("avg qps=" + q + " peak=" + peakQps(dau, writesPerDay));
        System.out.println("5y storage=" + humanBytes(total));
    }
}
Complexity
- seconds/day: ~100,000
- year in seconds: ~30M
- main memory read: ~100 ns
- SSD read: ~100 us
- cross-DC RTT: ~100 ms
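One way to use these constants is to ask how many cheaper operations fit inside a single expensive one. The sketch below is illustrative only: the class name LatencyBudget and the method opsWithin are hypothetical, and the values are the rounded figures listed above, not measurements.

```java
/** Hypothetical sanity check built on the rounded latency constants. */
final class LatencyBudget {
    // All values in nanoseconds, taken from the rounded constants above.
    static final long MAIN_MEMORY_READ_NS = 100L;
    static final long SSD_READ_NS = 100_000L;          // ~100 us
    static final long CROSS_DC_RTT_NS = 100_000_000L;  // ~100 ms

    /** How many sequential operations of the given cost fit in a time budget. */
    static long opsWithin(long budgetNs, long costPerOpNs) {
        if (costPerOpNs <= 0) {
            throw new IllegalArgumentException("cost per operation must be positive");
        }
        return budgetNs / costPerOpNs;
    }

    public static void main(String[] args) {
        // One cross-DC round trip costs as much as 1,000 sequential SSD reads...
        System.out.println("SSD reads per cross-DC RTT: "
                + opsWithin(CROSS_DC_RTT_NS, SSD_READ_NS));
        // ...or 1,000,000 sequential main memory reads.
        System.out.println("memory reads per cross-DC RTT: "
                + opsWithin(CROSS_DC_RTT_NS, MAIN_MEMORY_READ_NS));
        // An SSD read is about 1,000 main memory reads.
        System.out.println("memory reads per SSD read: "
                + opsWithin(SSD_READ_NS, MAIN_MEMORY_READ_NS));
    }
}
```

Each tier is roughly three orders of magnitude apart, which is why a design that adds one cross-datacenter hop to a hot path usually matters more than any amount of local tuning.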
Key design decisions & trade-offs
- Rounding seconds/day — Chosen: Use 100k instead of 86,400. Trades 15% accuracy for mental math speed; BOE is an order-of-magnitude tool, not a forecast.
- Peak-to-average multiplier — Chosen: Assume 2-3x peak over average. Most consumer traffic follows a diurnal pattern that peaks at roughly this ratio; safer to over-provision than under-estimate.
- Bandwidth math — Chosen: Compute both ingress and egress separately. Read-heavy systems have 10-100x more egress than ingress; a single number hides the dominant cost.
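The ingress/egress split can be made concrete with a small calculation. This is a sketch under stated assumptions: the class IngressEgress, its methods, and the 100:1 read-to-write ratio are hypothetical illustrations chosen from the 10-100x range mentioned above, and sizes use decimal units.

```java
/** Hypothetical sketch: estimate ingress and egress separately. */
final class IngressEgress {
    /** Bytes per second written into the system. */
    static double ingressBytesPerSec(double writeQps, long bytesPerWrite) {
        return writeQps * bytesPerWrite;
    }

    /** Bytes per second served out, given an assumed number of reads per write. */
    static double egressBytesPerSec(double writeQps, double readsPerWrite, long bytesPerRead) {
        return writeQps * readsPerWrite * bytesPerRead;
    }

    public static void main(String[] args) {
        double writeQps = 10_000;      // average write rate from the worked example
        long itemBytes = 1_000;        // 1 KB per item (decimal)
        double readsPerWrite = 100;    // assumed read-heavy ratio, not a measured value

        double ingress = ingressBytesPerSec(writeQps, itemBytes);
        double egress = egressBytesPerSec(writeQps, readsPerWrite, itemBytes);
        System.out.printf("ingress = %.0f MB/s%n", ingress / 1e6);  // 10 MB/s
        System.out.printf("egress  = %.0f MB/s%n", egress / 1e6);   // 1000 MB/s
    }
}
```

With these inputs ingress fits comfortably on one NIC while egress does not, which is the point of keeping the two numbers apart.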
Common pitfalls
- Forgetting to multiply by replication factor for storage (3x for quorum systems)
- Confusing bits per second with bytes per second on a NIC
- Using DAU when the correct metric is concurrent users for connection counts
- Ignoring read amplification: one logical read can be 3-10 physical reads in an LSM
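Three of the pitfalls above are simple multipliers that are easy to encode once and reuse. The following is a rough sketch: the class PitfallChecks and its method names are hypothetical, the replication factor of 3 comes from the text, and the amplification factor of 5 is an assumed value inside the 3-10 range quoted above.

```java
/** Hypothetical helpers for the multiplier-style pitfalls. */
final class PitfallChecks {
    /** Raw storage after replication (the text's example uses a factor of 3). */
    static long replicatedBytes(long logicalBytes, int replicationFactor) {
        // multiplyExact fails loudly on overflow rather than wrapping silently.
        return Math.multiplyExact(logicalBytes, (long) replicationFactor);
    }

    /** NIC line rate in bits per second converted to bytes per second. */
    static long nicBytesPerSec(long bitsPerSec) {
        return bitsPerSec / 8;
    }

    /** Physical reads per second for a logical read rate and an amplification factor. */
    static long physicalReadsPerSec(long logicalReadsPerSec, int readAmplification) {
        return Math.multiplyExact(logicalReadsPerSec, (long) readAmplification);
    }

    public static void main(String[] args) {
        long tb = 1_000_000_000_000L;
        // 900 TB of logical data becomes 2,700 TB on disk at 3x replication.
        System.out.println("replicated = " + replicatedBytes(900 * tb, 3) / tb + " TB");
        // A 10 Gbit/s NIC moves 1,250 MB/s, not 10,000 MB/s.
        System.out.println("10 Gbit/s NIC = " + nicBytesPerSec(10_000_000_000L) / 1_000_000L + " MB/s");
        // 10k logical reads/s at an assumed 5x amplification is 50k physical reads/s.
        System.out.println("physical reads = " + physicalReadsPerSec(10_000L, 5) + " per second");
    }
}
```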
Interview follow-ups
- Model hot-key skew: if 1% of keys attract a disproportionate share of traffic, a 10k QPS workload can concentrate 1k QPS or more on a single hot shard
- Estimate cache hit rate impact on downstream QPS
- Size working set vs total dataset and pick memory vs SSD tier accordingly
- Account for cross-region replication bandwidth separately from client-facing bandwidth
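The cache and hot-key follow-ups above both reduce to one-line formulas. The sketch below is illustrative only: the class FollowUpEstimates, its methods, and the example inputs (90% hit rate, 100 shards, one key drawing 10% of traffic) are hypothetical assumptions, and the hot-shard model assumes all remaining traffic is spread evenly.

```java
/** Hypothetical estimators for two of the follow-up questions. */
final class FollowUpEstimates {
    /** QPS that misses the cache and reaches the backing store. */
    static double downstreamQps(double frontQps, double cacheHitRate) {
        if (cacheHitRate < 0.0 || cacheHitRate > 1.0) {
            throw new IllegalArgumentException("hit rate must be between 0 and 1");
        }
        return frontQps * (1.0 - cacheHitRate);
    }

    /**
     * Load on the shard that owns a hot key: its even share of the remaining
     * traffic plus everything addressed to the hot key.
     */
    static double hotShardQps(double totalQps, int shardCount, double hotKeyFraction) {
        if (shardCount <= 0) {
            throw new IllegalArgumentException("shard count must be positive");
        }
        double evenShare = totalQps * (1.0 - hotKeyFraction) / shardCount;
        return evenShare + totalQps * hotKeyFraction;
    }

    public static void main(String[] args) {
        // A 90% hit rate cuts 10k QPS to roughly 1k QPS at the database.
        System.out.println(String.format("downstream qps = %.0f", downstreamQps(10_000, 0.90)));
        // Evenly spread, 10k QPS over 100 shards is 100 QPS each; with one key
        // drawing 10% of traffic, its shard sees roughly 1,090 QPS.
        System.out.println(String.format("hot shard qps = %.0f", hotShardQps(10_000, 100, 0.10)));
    }
}
```

Note that adding shards shrinks only the even share: the hot key's traffic stays on one shard regardless of shard count.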