
Metrics Monitoring and Alerting (Prometheus-style) System Design Interview Question

By Rahul Kumar · Senior Software Engineer · Updated · 9 components · 3 operations · Source: Alex Xu, System Design Interview Vol 2, Chapter 5; Prometheus docs; Grafana docs

Problem: Design a metrics monitoring and alerting system for a fleet of ~10,000 servers and thousands of services, Prometheus/Grafana-style.

Overview

A metrics monitoring system is the nervous system of a production fleet: without it you are flying blind, and with a bad one you are flying blind while being paged every ninety seconds. The Prometheus-style design that has become the de facto standard answers three questions: how metrics get collected, how they are stored cheaply enough to keep for weeks, and how an alert engine evaluates thousands of rules over them without falling behind. Its big commitments are pull-based scraping (the collector discovers targets via service discovery and pulls /metrics on a fixed interval), a time-series database that leans on delta-of-delta timestamp encoding and XOR float compression to keep a typical sample under two bytes on disk, and a rule engine that re-evaluates PromQL expressions on a fixed interval, commonly 15 to 30 seconds, to fire alerts. This page walks through that pipeline end to end and shows the Java primitives you would build if you were instrumenting a service by hand.
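The sub-two-byte figure rests on two cheap transforms from Facebook's Gorilla time-series work. Regular scrapes make consecutive timestamp deltas nearly identical, so their delta-of-delta is usually 0; slowly changing values share most of their IEEE-754 bits, so XORing each value with the previous one leaves mostly zero bits. The sketch below only computes those two transforms under that assumption; it omits the variable-length bit packing that actually produces the compact encoding, and GorillaSketch is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

public class GorillaSketch {
    /** First entry is the raw first delta; the rest are delta_i - delta_(i-1). */
    static List<Long> deltaOfDelta(long[] timestampsMs) {
        List<Long> out = new ArrayList<>();
        long prevDelta = 0;
        for (int i = 1; i < timestampsMs.length; i++) {
            long delta = timestampsMs[i] - timestampsMs[i - 1];
            out.add(i == 1 ? delta : delta - prevDelta);
            prevDelta = delta;
        }
        return out;
    }

    /** XOR of each value's IEEE-754 bit pattern with the previous value's bit pattern. */
    static List<Long> xorWithPrevious(double[] values) {
        List<Long> out = new ArrayList<>();
        for (int i = 1; i < values.length; i++) {
            out.add(Double.doubleToRawLongBits(values[i]) ^ Double.doubleToRawLongBits(values[i - 1]));
        }
        return out;
    }

    public static void main(String[] args) {
        // A 15 s scrape interval with one millisecond of jitter on the last sample.
        long[] ts = {1_700_000_000_000L, 1_700_000_015_000L, 1_700_000_030_000L, 1_700_000_045_001L};
        double[] vs = {42.0, 42.0, 43.0, 43.0};
        System.out.println("delta-of-delta: " + deltaOfDelta(ts));   // after the first delta: 0 or tiny
        for (long x : xorWithPrevious(vs)) {
            // In Gorilla, a zero XOR (unchanged value) costs a single bit; otherwise only the
            // "meaningful" bits between the leading and trailing zero runs are stored.
            System.out.println(x == 0 ? "unchanged"
                    : "changed: " + Long.numberOfLeadingZeros(x) + " leading / "
                      + Long.numberOfTrailingZeros(x) + " trailing zero bits");
        }
    }
}
```

A value that changes from 42.0 to 43.0 differs in a single mantissa bit, so the XOR is one set bit surrounded by long zero runs, which is exactly the case the encoding is built to exploit.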


Summary

A pull-based metrics pipeline: instrumented apps expose /metrics, a collector scrapes them on a fixed interval, samples land in a time-series DB (TSDB) with delta-of-delta + XOR float compression, an alert engine continuously evaluates PromQL-style rules, and Grafana queries the TSDB for dashboards. The dominant design choice is pull-based scraping over push: pull makes service discovery the source of truth for the target list, and a target that cannot be scraped is itself a signal (up == 0). The main tradeoff is short-lived batch jobs, which can finish before the next scrape; those push to an optional push gateway instead. The design is sized for ~10M active series and ~1M samples/sec of ingest, which fits Prometheus on a handful of nodes, and spills to a remote long-term TSDB (Thanos / Mimir / VictoriaMetrics) for retention beyond 15 days, Prometheus's default local retention.
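A quick back-of-envelope check of those sizing numbers, assuming the ~1.3 bytes per compressed sample quoted in the architecture section. It counts compressed sample data only; the series index, the write-ahead log, and any replication come on top. CapacityEstimate is a hypothetical helper, not part of any real tool.

```java
public class CapacityEstimate {
    /** Each active series yields one sample per scrape, so series / ingest rate = scrape interval. */
    static long impliedScrapeIntervalSec(long activeSeries, long samplesPerSec) {
        return activeSeries / samplesPerSec;
    }

    /** Total samples kept for the retention window. */
    static long samplesRetained(long samplesPerSec, int retentionDays) {
        return samplesPerSec * 86_400L * retentionDays;
    }

    /** Compressed size of the sample data alone (no index, WAL, or replicas). */
    static double diskBytes(long samples, double bytesPerSample) {
        return samples * bytesPerSample;
    }

    public static void main(String[] args) {
        long series = 10_000_000L;      // ~10M active series
        long rate = 1_000_000L;         // ~1M samples/sec ingest
        int days = 15;                  // local retention before long-term storage takes over
        double bytesPerSample = 1.3;    // assumed compressed size per sample

        long samples = samplesRetained(rate, days);
        System.out.println("implied scrape interval: " + impliedScrapeIntervalSec(series, rate) + " s");
        System.out.println("samples retained: " + samples);
        System.out.printf("compressed sample data: ~%.2f TB%n", diskBytes(samples, bytesPerSample) / 1e12);
    }
}
```

With these inputs it reports an implied 10-second scrape interval and roughly 1.7 TB of compressed samples for 15 days, comfortably within reach of a handful of nodes.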

High-level architecture

Each service process embeds a lightweight client library that maintains in-memory counters, gauges, and histograms and exposes them on an HTTP /metrics endpoint in the Prometheus text format. A collector fleet discovers targets through a service-discovery plug-in (Kubernetes API, Consul, or a static file) and, on a fixed scrape interval, issues a GET against every target's /metrics URL.

Samples land in a local, append-only time-series database. The head block holds roughly the last two hours in memory, backed by a write-ahead log, and its chunks are already compressed with delta-of-delta timestamp encoding and Gorilla XOR float compression, typically about 1.3 bytes per sample. Roughly every two hours the oldest part of the head is persisted as an immutable on-disk block, and background compaction later merges small blocks into larger ones.

A rule manager loads alerting rules from config and re-evaluates them every 15 or 30 seconds against the local TSDB, which for short lookback windows means mostly the in-memory head. Firing alerts are pushed to an Alertmanager-style service that deduplicates them and handles grouping, silencing, and routing to PagerDuty or Slack.

For long-term retention, the collector ships samples to a horizontally scalable store such as Thanos, Mimir, or VictoriaMetrics (usually via remote write; Thanos is also commonly deployed with a sidecar that uploads completed blocks to object storage), which serves data from many collectors behind a single query API. Short-lived batch jobs that might finish before a scrape push their metrics to an optional push gateway, which the collector then scrapes like any other target. Grafana is a read-only client of the query API and performs no ingestion itself.
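The rule manager's evaluation loop is where expressions like up == 0 turn into pages. In Prometheus, an alerting rule can carry a for duration: when its expression first returns a result for a series the alert is pending, and it fires only once the condition has held on every evaluation for that long; a single clean evaluation resets it. Below is a minimal single-series sketch of that state machine, assuming the expression result has already been reduced to one boolean per evaluation tick. AlertRuleState is a hypothetical name, and a real rule engine also tracks each label set separately and sends resolved notifications.

```java
import java.time.Duration;
import java.time.Instant;

public class AlertRuleState {
    enum State { INACTIVE, PENDING, FIRING }

    private final Duration holdFor;   // plays the role of a rule's `for` duration
    private Instant activeSince;      // first evaluation in the current run of true results

    AlertRuleState(Duration holdFor) { this.holdFor = holdFor; }

    /** Called once per evaluation tick with whether the rule expression returned a result. */
    State evaluate(boolean conditionTrue, Instant now) {
        if (!conditionTrue) {
            activeSince = null;       // one clean evaluation resets the hold timer
            return State.INACTIVE;
        }
        if (activeSince == null) activeSince = now;
        // Fire once the condition has been continuously true for at least holdFor.
        return Duration.between(activeSince, now).compareTo(holdFor) >= 0 ? State.FIRING : State.PENDING;
    }

    public static void main(String[] args) {
        AlertRuleState rule = new AlertRuleState(Duration.ofMinutes(1));
        Instant t0 = Instant.parse("2024-01-01T00:00:00Z");
        boolean[] upIsZero = {true, true, true, false};      // e.g. up == 0 for one target
        for (int i = 0; i < upIsZero.length; i++) {
            Instant now = t0.plusSeconds(30L * i);           // 30 s evaluation interval
            System.out.println(now + " -> " + rule.evaluate(upIsZero[i], now));
        }
    }
}
```

With a 1-minute hold and a 30-second evaluation interval, the target must fail three consecutive evaluations (t = 0, 30, and 60 s) before the alert fires, which is the main defense against flapping pages.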

Implementation

Lock-free Gauge, Counter, and Histogram primitives
package com.example.metrics;

import java.util.concurrent.atomic.*;

public final class Metrics {

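    public static final class Counter {
        private final LongAdder value = new LongAdder();   // lock-free, striped adder
        public void inc()             { value.increment(); }
        public void add(long delta)   {
            // Counters are monotonic; a decrease would look like a counter reset to rate().
            if (delta < 0) throw new IllegalArgumentException("counter delta must be >= 0");
            value.add(delta);
        }
        public long get()             { return value.sum(); }
    }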

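    public static final class Gauge {
        // set() must be atomic with respect to inc(); DoubleAdder.reset() + add() is not safe
        // under concurrent updates, so keep the double's bits in an AtomicLong and CAS them.
        private final AtomicLong bits = new AtomicLong(Double.doubleToRawLongBits(0.0));
        public void set(double v)     { bits.set(Double.doubleToRawLongBits(v)); }
        public void inc(double v)     {
            bits.getAndUpdate(b -> Double.doubleToRawLongBits(Double.longBitsToDouble(b) + v));
        }
        public double get()           { return Double.longBitsToDouble(bits.get()); }
    }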

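    public static final class Histogram {
        private final double[] buckets;                    // strictly increasing upper bounds, +Inf at end
        private final LongAdder[] counts;                  // one adder per bucket (non-cumulative)
        private final DoubleAdder sum = new DoubleAdder();

        public Histogram(double[] upperBounds) {
            int n = upperBounds.length;
            // A trailing +Inf guarantees every observation lands in some bucket.
            if (n == 0 || upperBounds[n - 1] != Double.POSITIVE_INFINITY)
                throw new IllegalArgumentException("last bucket bound must be +Inf");
            for (int i = 1; i < n; i++)
                if (!(upperBounds[i - 1] < upperBounds[i]))
                    throw new IllegalArgumentException("bucket bounds must be strictly increasing");
            this.buckets = upperBounds.clone();
            this.counts = new LongAdder[n];
            for (int i = 0; i < n; i++) counts[i] = new LongAdder();
        }

        public void observe(double v) {
            sum.add(v);
            // Only the first bucket that covers v is incremented; bucketCount() sums upward.
            for (int i = 0; i < buckets.length; i++) {
                if (v <= buckets[i]) { counts[i].increment(); return; }
            }
        }

        // Prometheus `le` buckets are cumulative: the number of observations <= buckets[i].
        // Reads are not an atomic snapshot, so _sum may briefly lag _count under concurrent writes.
        public long bucketCount(int i) {
            long c = 0;
            for (int j = 0; j <= i; j++) c += counts[j].sum();
            return c;
        }
        // _count must equal the +Inf bucket, so derive it from the buckets.
        public long count()           { return bucketCount(buckets.length - 1); }
        public double sum()           { return sum.sum(); }
    }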
}
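Grafana panels rarely plot raw buckets; a p99 latency panel typically runs PromQL's histogram_quantile() over the cumulative _bucket series exported by a histogram like the one above. The function finds the bucket that contains the target rank and interpolates linearly inside it. This is a simplified sketch of that idea, not PromQL's actual implementation, and QuantileSketch is a hypothetical name. It follows two documented PromQL conventions: the lowest bucket is assumed to start at 0 when its upper bound is positive, and a rank that falls in the +Inf bucket returns the largest finite bound.

```java
public class QuantileSketch {
    /**
     * Estimates the q-quantile (0 < q <= 1) from cumulative bucket counts.
     * Assumes bounds are positive, strictly increasing, include at least one finite
     * bound, and end in +Inf; cumulative[i] counts observations <= bounds[i].
     */
    static double quantile(double q, double[] bounds, long[] cumulative) {
        int n = bounds.length;
        long total = cumulative[n - 1];                  // the +Inf bucket holds every observation
        if (total == 0) return Double.NaN;
        double rank = q * total;
        int b = 0;
        while (b < n - 1 && cumulative[b] < rank) b++;   // first bucket whose cumulative count reaches rank
        if (b == n - 1) return bounds[n - 2];             // rank falls in +Inf: largest finite bound
        double lower = (b == 0) ? 0.0 : bounds[b - 1];    // lowest bucket assumed to start at 0
        long below = (b == 0) ? 0 : cumulative[b - 1];
        long inBucket = cumulative[b] - below;
        return lower + (bounds[b] - lower) * (rank - below) / inBucket;
    }

    public static void main(String[] args) {
        double[] le = {0.1, 0.25, 0.5, 1.0, Double.POSITIVE_INFINITY};
        long[] cum = {50, 80, 95, 99, 100};              // e.g. request latencies in seconds
        System.out.println("p50 ~ " + quantile(0.5, le, cum));
        System.out.println("p90 ~ " + quantile(0.9, le, cum));
        System.out.println("p99 ~ " + quantile(0.99, le, cum));
    }
}
```

The estimate is only as good as the bucket layout: any quantile inside the 0.5 to 1.0 bucket is a straight-line guess between those bounds, so buckets should be dense around the latencies an SLO cares about.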
/metrics endpoint in Prometheus exposition format
package com.example.metrics;

import com.sun.net.httpserver.*;
import java.io.*;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.*;

public class MetricsServer {
    // Registry (not shown) holds named Counters, Gauges, and Histograms; each snapshot entry
    // carries name, help, type, instrument, and, for histograms, the bucket upper bounds.
    private final Registry registry;

    public MetricsServer(Registry r) { this.registry = r; }

    public void start(int port) throws IOException {
        HttpServer s = HttpServer.create(new InetSocketAddress(port), 0);
        s.createContext("/metrics", this::handle);
        s.start();
    }

    // The text format spells special floats +Inf, -Inf, and NaN; Java would print "Infinity".
    private static String fmt(double v) {
        if (v == Double.POSITIVE_INFINITY) return "+Inf";
        if (v == Double.NEGATIVE_INFINITY) return "-Inf";
        if (Double.isNaN(v)) return "NaN";
        return Double.toString(v);
    }

    // HELP text must escape backslashes and line feeds.
    private static String escapeHelp(String s) {
        return s.replace("\\", "\\\\").replace("\n", "\\n");
    }

    private void handle(HttpExchange ex) throws IOException {
        StringBuilder out = new StringBuilder(8192);
        for (Registry.Entry e : registry.snapshot()) {
            out.append("# HELP ").append(e.name).append(' ').append(escapeHelp(e.help)).append('\n');
            out.append("# TYPE ").append(e.name).append(' ').append(e.type).append('\n');
            if (e.type.equals("histogram")) {
                Metrics.Histogram h = (Metrics.Histogram) e.instrument;
                // Bucket counts must be cumulative and the last bound must be +Inf (le="+Inf").
                for (int i = 0; i < e.buckets.length; i++) {
                    out.append(e.name).append("_bucket{le=\"").append(fmt(e.buckets[i])).append("\"} ")
                       .append(h.bucketCount(i)).append('\n');
                }
                out.append(e.name).append("_sum ").append(fmt(h.sum())).append('\n');
                out.append(e.name).append("_count ").append(h.count()).append('\n');
            } else if (e.type.equals("counter")) {
                out.append(e.name).append(' ').append(((Metrics.Counter) e.instrument).get()).append('\n');
            } else { // gauge
                out.append(e.name).append(' ').append(fmt(((Metrics.Gauge) e.instrument).get())).append('\n');
            }
        }
        byte[] body = out.toString().getBytes(StandardCharsets.UTF_8);
        ex.getResponseHeaders().set("Content-Type", "text/plain; version=0.0.4; charset=utf-8");
        ex.sendResponseHeaders(200, body.length);
        try (OutputStream os = ex.getResponseBody()) { os.write(body); }
    }
}
Scraper-side PullScheduler
package com.example.metrics.scrape;

import java.net.URI;
import java.net.http.*;
import java.time.Duration;
import java.util.*;
import java.util.concurrent.*;

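public class PullScheduler {
    // Sink, Target, and ExpositionParser are not shown here.
    // Scheduler threads only launch requests; HTTP I/O runs asynchronously on the client's executor.
    private final ScheduledExecutorService exec = Executors.newScheduledThreadPool(4);
    private final HttpClient http = HttpClient.newBuilder()
            .connectTimeout(Duration.ofSeconds(2)).build();
    private final Sink sink;                    // writes samples into the TSDB head block
    private final Duration interval;
    private final Duration timeout;             // capped at the interval so scrapes don't pile up

    public PullScheduler(Sink sink, Duration interval) {
        this.sink = sink;
        this.interval = interval;
        Duration maxTimeout = Duration.ofSeconds(10);
        this.timeout = interval.compareTo(maxTimeout) < 0 ? interval : maxTimeout;
    }

    /** Returns a handle that service discovery can cancel() when the target disappears. */
    public ScheduledFuture<?> register(Target t) {
        // Stagger scrapes across the interval so all 10k targets don't hit at t=0.
        long jitterMs = ThreadLocalRandom.current().nextLong(interval.toMillis());
        return exec.scheduleAtFixedRate(() -> scrape(t), jitterMs, interval.toMillis(), TimeUnit.MILLISECONDS);
    }

    private void scrape(Target t) {
        long ts = System.currentTimeMillis();   // all samples from one scrape share its start time
        HttpRequest req;
        try {
            req = HttpRequest.newBuilder(URI.create(t.url)).timeout(timeout).GET().build();
        } catch (Exception badUrl) {            // an escaping exception would cancel this target's schedule
            sink.writeUp(t, ts, 0);
            return;
        }
        // sendAsync, unlike send, does not pin a scheduler thread for up to the timeout.
        http.sendAsync(req, HttpResponse.BodyHandlers.ofString()).whenComplete((r, err) -> {
            if (err != null || r.statusCode() != 200) { sink.writeUp(t, ts, 0); return; }
            List<Runnable> writes = new ArrayList<>();
            try {
                // Parse the whole body before writing anything, so a malformed payload
                // yields up == 0 instead of up == 1 plus a partial set of samples.
                ExpositionParser.parse(r.body(), (name, labels, value) ->
                    writes.add(() -> sink.writeSample(t, name, labels, ts, value)));
            } catch (Exception e) {
                sink.writeUp(t, ts, 0);
                return;
            }
            sink.writeUp(t, ts, 1);             // up == 0 is itself a signal the alerting engine uses
            writes.forEach(Runnable::run);
        });
    }
}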
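The PullScheduler above hands each response body to an ExpositionParser that the page does not show. A minimal parser for well-formed text-format lines could look like the following sketch. MiniExpositionParser and SampleHandler are hypothetical names, and passing labels as a Map of strings is an assumption about the callback's shape. It skips comment lines, unescapes backslash, quote, and newline escapes in quoted label values, ignores the optional trailing timestamp, and throws on malformed input, which the scheduler turns into up == 0.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class MiniExpositionParser {
    /** Hypothetical callback shape mirroring the (name, labels, value) lambda in PullScheduler. */
    interface SampleHandler {
        void onSample(String name, Map<String, String> labels, double value);
    }

    /** Parses well-formed text-format lines; throws on malformed input. */
    static void parse(String body, SampleHandler handler) {
        for (String raw : body.split("\n")) {
            String line = raw.strip();
            if (line.isEmpty() || line.startsWith("#")) continue;      // HELP, TYPE, comments
            Map<String, String> labels = new LinkedHashMap<>();
            int brace = line.indexOf('{');
            int space = line.indexOf(' ');
            String name;
            int valueStart;
            if (brace >= 0 && (space < 0 || brace < space)) {
                name = line.substring(0, brace);
                int i = brace + 1;
                while (line.charAt(i) != '}') {
                    int eq = line.indexOf('=', i);
                    String key = line.substring(i, eq).strip();
                    int j = eq + 2;                                      // skip the =" that opens the value
                    StringBuilder val = new StringBuilder();
                    while (line.charAt(j) != '"') {
                        char c = line.charAt(j);
                        if (c == '\\') {                                 // backslash, quote, newline escapes
                            char next = line.charAt(++j);
                            val.append(next == 'n' ? '\n' : next);
                        } else {
                            val.append(c);
                        }
                        j++;
                    }
                    labels.put(key, val.toString());
                    i = j + 1;                                           // past the closing quote
                    if (line.charAt(i) == ',') i++;
                }
                valueStart = i + 1;                                      // past the closing brace
            } else {
                name = line.substring(0, space);
                valueStart = space + 1;
            }
            // The value is the first token; an optional trailing timestamp is ignored here.
            String token = line.substring(valueStart).strip().split("\\s+")[0];
            handler.onSample(name, labels, parseValue(token));
        }
    }

    static double parseValue(String s) {
        switch (s) {
            case "+Inf": return Double.POSITIVE_INFINITY;
            case "-Inf": return Double.NEGATIVE_INFINITY;
            case "NaN":  return Double.NaN;
            default:     return Double.parseDouble(s);
        }
    }
}
```

Buffering is left to the caller on purpose: the scheduler collects every parsed sample before writing, so a parse failure halfway through a body never leaves a half-ingested scrape behind.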
