How do you design Video Streaming (YouTube-style) for a system design interview?

Start from requirements, estimate scale, draw the high-level architecture, explain the main data flow, identify bottlenecks, and compare trade-offs. This walkthrough covers Video Streaming (YouTube-style) with components, operations, capacity estimates, and an interactive simulator.

What should I discuss in a Video Streaming (YouTube-style) system design answer?

Cover functional requirements, non-functional requirements, APIs, data model, key services, storage choices, caching, queues or streams where relevant, failure handling, observability, and scaling trade-offs.

Can I practice Video Streaming (YouTube-style) interactively?

Yes. The page includes an interactive browser-based simulator for Video Streaming (YouTube-style), including an architecture diagram, step-by-step request flow, component swap, and stress simulation.

Video Streaming (YouTube-style) System Design Interview Question

By Rahul Kumar · Senior Software Engineer · Updated May 2026 · 12 components · 5 operations · Source: Alex Xu, System Design Interview Vol 1, Chapter 14

Problem: Design a video upload, transcoding, and streaming platform like YouTube.

Overview

A YouTube-style video platform is two workloads glued together: a trickle of uploads (around 1% of traffic) that must survive multi-gigabyte files on flaky home Wi-Fi, and a firehose of playback (the remaining 99%) that must start in under a second worldwide. The interview answer is to decouple them aggressively. Uploads land in raw object storage through resumable, chunked PUTs close to the user, then a DAG of transcoding workers fans the source out into an adaptive-bitrate ladder (240p through 4K), HLS segments, and thumbnails. Playback never touches the application tier on the hot path; master playlists and six-second segments are served from a CDN that absorbs 95%+ of global QPS. The completion queue, metadata service, and recommendation engine stitch the two flows together, as the architecture walkthrough below explains.

Summary

A massively read-skewed system (~99% reads, 1% uploads) split into two flows: a video-uploading flow (original storage → transcoding servers → transcoded storage → CDN, with a completion queue and handler that update metadata once encoding finishes) and a streaming flow that serves adaptive-bitrate manifests and segments from the CDN, falling back to the transcoded origin on a miss. The dominant design choice is to push all playback traffic to the CDN edge, so the origin sees under 5% of global playback QPS, while transcoding runs asynchronously as a DAG of tasks (inspection → video encoding → audio encoding → thumbnails → watermark → assembler) so uploads never block on the 10+ minute encode. The main trade-offs are storage blow-up (each source becomes a ladder of 6–8 renditions, ~3–5x raw storage) and upload latency, which the book attacks with GOP-level chunk parallelism and upload points geographically near users.
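
The transcoding DAG named in the summary can be sketched as a small dependency graph that is ordered with Kahn's algorithm. This is a minimal illustration, not the book's implementation: `TranscodeDag` and its methods are hypothetical names, and the edges are an assumption (the encode-side tasks depend only on inspection, the watermark depends on the video encode, and the assembler waits for everything).

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: orders transcoding tasks so each runs after its dependencies. */
final class TranscodeDag {
    // task -> tasks that must finish first; insertion order keeps the output deterministic
    private final Map<String, List<String>> deps = new LinkedHashMap<>();

    TranscodeDag task(String name, String... dependsOn) {
        deps.put(name, List.of(dependsOn));
        return this;
    }

    /** Kahn's algorithm: repeatedly emit tasks whose dependencies have all completed. */
    List<String> executionOrder() {
        Map<String, Integer> pending = new LinkedHashMap<>();
        deps.forEach((t, d) -> pending.put(t, d.size()));
        Deque<String> ready = new ArrayDeque<>();
        pending.forEach((t, n) -> { if (n == 0) ready.add(t); });
        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String done = ready.poll();
            order.add(done);
            // Unblock every task that was waiting on the one that just finished.
            deps.forEach((t, d) -> {
                if (d.contains(done) && pending.merge(t, -1, Integer::sum) == 0) ready.add(t);
            });
        }
        if (order.size() != deps.size()) {
            throw new IllegalStateException("cycle or unknown dependency");
        }
        return order;
    }

    /** Assumed edges for the six tasks listed in the summary. */
    static TranscodeDag youtubeStyle() {
        return new TranscodeDag()
            .task("inspection")
            .task("video-encoding", "inspection")
            .task("audio-encoding", "inspection")
            .task("thumbnails", "inspection")
            .task("watermark", "video-encoding")
            .task("assembler", "watermark", "audio-encoding", "thumbnails");
    }
}
```

A real worker fleet would run the three tasks that depend only on inspection in parallel; the sequential order here only shows which tasks must wait for which.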

Requirements

Functional

Upload source videos up to multi-GB with resumable, chunked PUTs
Transcode each source into an ABR ladder (240p, 360p, 480p, 720p, 1080p, 4K) plus HLS/DASH segments and thumbnails
Serve adaptive-bitrate playback with sub-second startup and mid-stream ladder switching
Metadata lookup by video ID: title, owner, privacy, length, ladder manifest URL
Trending, recommended, and search surfaces powered by view logs
Content moderation, takedown, and geo-restriction per video
Live comment / like / watch-count counters with eventual consistency

Non-functional

99.95% playback availability; 99.9% upload availability
P99 segment fetch under 100 ms from CDN edge globally
Durability 11 nines for source and transcoded assets
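Scale to 2B DAU, ~115K playback starts/sec sustained, ~350K peak; segment QPS is a multiple of this, since each playing session fetches a segment roughly every six seconds
Transcoding pipeline elastic to thousands of concurrent encodes (500K uploads/day at ~10 min each keeps ~3,500 running on average)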
Cost efficiency: origin egress under 5% of total playback bandwidth

Capacity Assumptions

2B DAU, 5 videos watched per user per day → 10B views/day
500K uploads/day, average video 100 MB source
Transcoded to a 6-rendition adaptive-bitrate ladder (240p, 360p, 480p, 720p, 1080p, 4K) + HLS 6s segments
Video retained forever (no expiry); metadata updates on transcode completion via completion queue
CDN hit ratio target: 95%+, driven by the popular head of the catalog; origin absorbs <5% of playback QPS

Back-of-Envelope Estimates

Playback QPS: 10B / 86400 ≈ 115K playback starts/sec (peak ~350K); each view then fetches one segment per six seconds watched, so segment QPS is a multiple of this
Origin egress: 5% of peak ≈ 17.5K requests/sec; the CDN absorbs the rest
Upload bandwidth: 500K * 100MB / 86400 ≈ 580 MB/s average
Storage: 500K * 100MB * 4x (ladders + HLS overhead) / day ≈ 200 TB/day, 73 PB/year
Transcoding: 500K uploads * avg 10 min compute = 5M encode-minutes/day; / 1,440 min/day ≈ 3,500 concurrent encodes on average, more at peak
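
The arithmetic behind these estimates can be rechecked with a few lines of code. This sketch only plugs in the round numbers from the capacity assumptions; the class and method names are hypothetical. With these inputs the concurrent-encode figure works out to roughly 3,500, because 500K uploads at 10 minutes each is 5M encode-minutes spread over a 1,440-minute day.

```java
/** Back-of-envelope arithmetic; every input is a round number from the capacity assumptions. */
final class CapacityEstimates {
    static final long DAU = 2_000_000_000L;
    static final long VIEWS_PER_USER_PER_DAY = 5;
    static final long UPLOADS_PER_DAY = 500_000L;
    static final long SOURCE_MB = 100;           // average source size
    static final long STORAGE_MULTIPLIER = 4;    // ladder + HLS overhead
    static final long ENCODE_MINUTES = 10;       // average compute per upload
    static final long SECONDS_PER_DAY = 86_400L;

    /** Average playback starts per second (views, not segment requests). */
    static long viewsPerSecond() {
        return DAU * VIEWS_PER_USER_PER_DAY / SECONDS_PER_DAY;
    }

    /** Average ingest bandwidth in MB/s. */
    static double uploadMBPerSecond() {
        return (double) UPLOADS_PER_DAY * SOURCE_MB / SECONDS_PER_DAY;
    }

    /** New storage per day in TB (decimal units: 1 TB = 1,000,000 MB). */
    static long storageTBPerDay() {
        return UPLOADS_PER_DAY * SOURCE_MB * STORAGE_MULTIPLIER / 1_000_000L;
    }

    static long storagePBPerYear() {
        return storageTBPerDay() * 365 / 1_000;
    }

    /** Average number of encodes running at once: total encode-seconds per day / seconds per day. */
    static long avgConcurrentEncodes() {
        return UPLOADS_PER_DAY * ENCODE_MINUTES * 60 / SECONDS_PER_DAY;
    }

    public static void main(String[] args) {
        System.out.println("views/sec          = " + viewsPerSecond());
        System.out.println("upload MB/s        = " + uploadMBPerSecond());
        System.out.println("storage TB/day     = " + storageTBPerDay());
        System.out.println("storage PB/year    = " + storagePBPerYear());
        System.out.println("concurrent encodes = " + avgConcurrentEncodes());
    }
}
```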

High-level architecture

The upload path begins at a regional upload PoP so the first TCP hop is short. The client initiates a resumable upload, receives a signed URL, and PUTs 5 MB GOP-aligned chunks in parallel. Chunks stream into the original-source bucket; on the final chunk the upload service enqueues a TranscodingJob. Workers run the DAG (inspect, video encode, audio encode, thumbnails, watermark, assembler), emit ladder outputs to the transcoded bucket, and write an HLS master playlist. A completion queue notifies the metadata service and the CDN pre-warm job so the first viewer does not pay a cold-cache tax.

The playback path is CDN-first: the client resolves to an edge PoP, fetches master.m3u8, picks a rendition from the ladder based on bandwidth probes, then pulls six-second .ts or .m4s segments that are cached aggressively. Origin pull happens only on a cache miss. Metadata (title, ACL, manifest URL) is served by a sharded SQL tier behind a Redis cache with tight TTLs. A recommendation service and a view-count aggregator consume a Kafka stream of playback events.

Trade-offs dominate: storage blows up 3-5x from the ABR ladder, and transcoding latency is variable, which is why uploads return immediately and the UI shows a 'processing' state. The system assumes read skew and pushes nearly all bytes to the edge, keeping the application tier small, stateless, and cheap.
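
The client-side step in the playback path, choosing a rung of the ladder from a bandwidth measurement, can be sketched as follows. This is a simplified throughput-based rule, not how any particular player does it: `AbrSelector`, the safety factor, and the per-rendition bitrates are all illustrative assumptions (the source gives resolutions only).

```java
import java.util.List;

/** Hypothetical throughput-based rendition picker. */
final class AbrSelector {
    /** One rung of the ladder. */
    record Rendition(String name, long bitrateBps) {}

    // Illustrative bitrates in ascending order; not taken from the source.
    static final List<Rendition> LADDER = List.of(
        new Rendition("240p", 400_000),
        new Rendition("360p", 800_000),
        new Rendition("480p", 1_200_000),
        new Rendition("720p", 2_500_000),
        new Rendition("1080p", 5_000_000),
        new Rendition("2160p", 16_000_000));

    /**
     * Returns the highest rendition whose bitrate fits within safety * measured throughput.
     * Falls back to the lowest rung when nothing fits, so playback can still start.
     */
    static Rendition pick(List<Rendition> ascending, long measuredBps, double safety) {
        Rendition best = ascending.get(0);
        for (Rendition r : ascending) {
            if (r.bitrateBps() <= measuredBps * safety) {
                best = r;
            }
        }
        return best;
    }
}
```

For example, `AbrSelector.pick(AbrSelector.LADDER, 4_000_000, 0.8)` has a 3.2 Mbps budget and selects 720p. Production players usually combine throughput with buffer occupancy before switching mid-stream.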

Architecture Components (12)

Client (Web / Mobile / TV) (client) — HLS/DASH player that adapts bitrate based on network and buffer; also chunks uploads into GOP-aligned pieces for parallel transfer.
CDN Edge (cdn) — Geographically distributed edge caches serving HLS/DASH segments close to the viewer; falls back to transcoded-storage origin on miss.
Load Balancer (lb) — L7 LB for control-plane traffic (API calls, upload initiation), NOT for segment GETs.
Video API (api) — Metadata CRUD, upload session coordination, signed-URL issuance to original storage, and playback manifest URL lookup.
Metadata DB (sql) — Relational store for video metadata (title, owner, status, view count, ladder list, manifest URL).
Metadata Cache (cache) — Redis cache fronting the metadata DB for watch-page reads and manifest-URL lookups.
Original Storage (blob) — S3-style bucket holding raw source uploads; the book's first stop for uploaded bytes before transcoding.
Transcoding Servers (worker) — Fleet that pulls raw source from original storage and runs a DAG of transcoding tasks to produce every ladder, thumbnail, and captions set.
Transcoded Storage (blob) — Authoritative home of all transcoded HLS/DASH segments and manifests; acts as the CDN's origin.
Completion Queue (queue) — Durable queue of transcode-completion events produced by the transcoding DAG's assembler and drained by the completion handler.
Completion Handler (worker) — Consumer of the completion queue that finalizes the upload: flips metadata to READY, stores the manifest/ladder list, and warms the cache.
Recommendation Service (api) — Returns 'up next' list for watch pages.

Operations Walked Through (5)

play — Client pulls manifest + segments from CDN; origin never touched for popular content. This is the streaming flow's happy path.
play-cold — First viewer in the region (long-tail or newly-promoted video): the edge misses, shield pulls from transcoded storage, segment is written into edge cache for subsequent viewers.
watch-page — Client fetches video metadata + recommendations. Metadata is Redis-cached after the completion handler warms it on READY transition; rec fans out in parallel.
upload — Per the book: client POSTs metadata AND streams source chunks in parallel. API issues a signed URL, client PUTs chunks direct to original storage, origin-bucket event triggers transcoding pipeline asynchronously, completion handler eventually flips metadata to READY.
transcode — Transcoding workers pull source from original storage, run the DAG (inspection → per-ladder video/audio encode + thumbnails → assembler → HLS manifests), write outputs to transcoded storage, publish to CDN, emit a completion event. Completion handler flips metadata to READY and warms cache.
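
The difference between the play and play-cold operations above comes down to a read-through cache at the edge. The sketch below is a toy model under stated assumptions: `EdgeCache` is a hypothetical in-memory LRU, the origin is any function from segment path to bytes, and the shield tier is collapsed into that single origin call.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical edge node: serves hits locally and pulls from origin on a miss. */
final class EdgeCache {
    private final Map<String, byte[]> lru;
    private final Function<String, byte[]> origin;
    private int originFetches;

    EdgeCache(int capacity, Function<String, byte[]> origin) {
        this.origin = origin;
        // accessOrder=true turns LinkedHashMap into an LRU; the eldest entry is the least recently used.
        this.lru = new LinkedHashMap<>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, byte[]> eldest) {
                return size() > capacity;
            }
        };
    }

    /** play: cache hit. play-cold: miss, origin pull, then the segment is cached for later viewers. */
    synchronized byte[] get(String segmentPath) {
        byte[] hit = lru.get(segmentPath);
        if (hit != null) {
            return hit;
        }
        byte[] body = origin.apply(segmentPath);
        originFetches++;
        lru.put(segmentPath, body);
        return body;
    }

    synchronized int originFetches() {
        return originFetches;
    }
}
```

Two consecutive requests for the same segment cost one origin fetch; the second is served from the edge. A real CDN would also coalesce concurrent misses for the same key so a cold, suddenly popular video does not stampede the origin.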

Implementation

Upload endpoint returning a resumable upload URL

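import java.net.URI;
import java.time.Duration;
import org.springframework.http.ResponseEntity;
import org.springframework.security.core.annotation.AuthenticationPrincipal;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

// UploadSessionService, SignedUrlFactory, UserPrincipal, and the request/response DTOs
// are application classes that are not shown here.
@RestController
@RequestMapping("/v1/videos")
public class VideoUploadController {
  private static final long MAX_SOURCE_BYTES = 10L * 1024 * 1024 * 1024; // 10 GiB cap per source
  private static final int CHUNK_BYTES = 5 * 1024 * 1024;                // 5 MiB chunk size hint

  private final UploadSessionService sessions;
  private final SignedUrlFactory signer;

  public VideoUploadController(UploadSessionService s, SignedUrlFactory f) {
    this.sessions = s;
    this.signer = f;
  }

  @PostMapping("/uploads")
  public ResponseEntity<InitiateUploadResponse> initiate(
      @RequestBody InitiateUploadRequest req,
      @AuthenticationPrincipal UserPrincipal user) {
    // Reject empty or oversized sources before creating any session state.
    if (req.getSizeBytes() <= 0 || req.getSizeBytes() > MAX_SOURCE_BYTES) {
      return ResponseEntity.badRequest().build();
    }
    UploadSession session = sessions.create(user.getId(), req.getFilename(), req.getSizeBytes(), req.getContentType());
    // The signed URL lets the client write to original storage for up to six hours.
    URI resumableUrl = signer.signedPut(session.getObjectKey(), Duration.ofHours(6));
    InitiateUploadResponse body = new InitiateUploadResponse(
        session.getSessionId(),
        resumableUrl.toString(),
        CHUNK_BYTES,
        session.getExpiresAt());
    // 201 Created with a Location header pointing at the new upload session.
    return ResponseEntity
        .created(URI.create("/v1/videos/uploads/" + session.getSessionId()))
        .body(body);
  }
}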

TranscodingJob model

import java.time.Instant;
import java.util.Collections;
import java.util.EnumSet;
import java.util.Set;
import java.util.UUID;

public class TranscodingJob {
  public enum State { QUEUED, RUNNING, COMPLETED, FAILED }

  public enum Ladder {
    P240, P360, P480, P720, P1080, P2160;

    // Default target set used by the chunk upload handler: every rendition.
    public static Set<Ladder> defaults() {
      return EnumSet.allOf(Ladder.class);
    }
  }

  private final String jobId;
  private final String videoId;
  private final String sourceKey;
  private final Set<Ladder> targets;
  private final Instant enqueuedAt;
  private State state;
  private int attempt;
  private Instant startedAt;
  private Instant completedAt;
  private String failureReason;

  public TranscodingJob(String videoId, String sourceKey, Set<Ladder> targets) {
    // EnumSet.copyOf rejects an empty non-EnumSet collection, so fail early with a clear message.
    if (targets == null || targets.isEmpty()) {
      throw new IllegalArgumentException("at least one target rendition is required");
    }
    this.jobId = UUID.randomUUID().toString();
    this.videoId = videoId;
    this.sourceKey = sourceKey;
    this.targets = EnumSet.copyOf(targets);
    this.state = State.QUEUED;
    this.attempt = 0;
    this.enqueuedAt = Instant.now();
  }

  // Each run counts as a new attempt, so retries after FAILED are visible in the counter.
  public void markRunning() {
    this.state = State.RUNNING;
    this.startedAt = Instant.now();
    this.attempt++;
  }

  public void markCompleted() {
    this.state = State.COMPLETED;
    this.completedAt = Instant.now();
  }

  public void markFailed(String reason) {
    this.state = State.FAILED;
    this.failureReason = reason;
    this.completedAt = Instant.now();
  }

  public String getJobId() { return jobId; }
  public String getVideoId() { return videoId; }
  public State getState() { return state; }
  public Set<Ladder> getTargets() { return Collections.unmodifiableSet(targets); }
}

Chunked multi-part upload handler

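import java.io.IOException;
import java.io.InputStream;
import java.util.List;
import org.springframework.http.MediaType;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.PutMapping;
import org.springframework.web.bind.annotation.RequestHeader;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

// UploadSessionService, ObjectStoreClient, TranscodingQueue, ByteRange, PartRef, and PartResponse
// are application classes that are not shown here.
@RestController
@RequestMapping("/v1/videos/uploads/{sessionId}")
public class ChunkUploadController {
  private final UploadSessionService sessions;
  private final ObjectStoreClient store;
  private final TranscodingQueue queue;

  // Constructor injection; the final fields must be assigned here for the class to compile.
  public ChunkUploadController(UploadSessionService sessions, ObjectStoreClient store, TranscodingQueue queue) {
    this.sessions = sessions;
    this.store = store;
    this.queue = queue;
  }

  @PutMapping(value = "/parts/{partNumber}", consumes = MediaType.APPLICATION_OCTET_STREAM_VALUE)
  public ResponseEntity<PartResponse> uploadPart(
      @PathVariable String sessionId,
      @PathVariable int partNumber,
      @RequestHeader("Content-Range") String contentRange,
      @RequestHeader("X-Chunk-Sha256") String chunkHash,
      InputStream body) throws IOException {
    UploadSession session = sessions.require(sessionId);
    ByteRange range = ByteRange.parse(contentRange);
    // The client-supplied SHA-256 is passed along so the store can check the chunk's integrity.
    String etag = store.putPart(session.getUploadId(), partNumber, body, range.length(), chunkHash);
    sessions.recordPart(sessionId, partNumber, etag, range);
    // Parts arrive in parallel, so isComplete must report true to exactly one request;
    // otherwise the multipart upload is completed and the job enqueued more than once.
    if (sessions.isComplete(sessionId)) {
      List<PartRef> parts = sessions.listParts(sessionId);
      store.completeMultipart(session.getUploadId(), parts);
      sessions.markUploaded(sessionId);
      queue.enqueue(new TranscodingJob(session.getVideoId(), session.getObjectKey(), TranscodingJob.Ladder.defaults()));
      return ResponseEntity.ok(PartResponse.finalPart(etag));
    }
    return ResponseEntity.ok(PartResponse.intermediate(etag));
  }
}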
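
The handler above leans on the session service to know when every part has arrived. One way that bookkeeping could look is sketched below, assuming a single process and in-memory state; `UploadPartTracker` is a hypothetical helper, not a class from the source. It records parts idempotently, reports completion to exactly one caller, and lists the parts a resuming client still has to send.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

/** Hypothetical in-memory part bookkeeping for one resumable upload. */
final class UploadPartTracker {
    private final int totalParts;
    private final BitSet received;
    private boolean completionClaimed;

    UploadPartTracker(long sizeBytes, long chunkBytes) {
        if (sizeBytes <= 0 || chunkBytes <= 0) {
            throw new IllegalArgumentException("sizes must be positive");
        }
        // Ceiling division: the last part may be shorter than chunkBytes.
        this.totalParts = (int) ((sizeBytes + chunkBytes - 1) / chunkBytes);
        this.received = new BitSet(totalParts);
    }

    /**
     * Records a 1-based part number. Re-sending a part is harmless.
     * Returns true for exactly one call: the one that completes the upload.
     */
    synchronized boolean recordPart(int partNumber) {
        if (partNumber < 1 || partNumber > totalParts) {
            throw new IllegalArgumentException("part out of range: " + partNumber);
        }
        received.set(partNumber - 1);
        if (received.cardinality() == totalParts && !completionClaimed) {
            completionClaimed = true;
            return true;
        }
        return false;
    }

    /** Part numbers a resuming client still needs to upload, in ascending order. */
    synchronized List<Integer> missingParts() {
        List<Integer> missing = new ArrayList<>();
        for (int i = received.nextClearBit(0); i < totalParts; i = received.nextClearBit(i + 1)) {
            missing.add(i + 1);
        }
        return missing;
    }
}
```

A 12 MB source with 5 MB chunks yields three parts; after parts 1 and 3 arrive, `missingParts()` returns `[2]`. With several API instances, the same "claim completion once" step would need a conditional update in the shared session store instead of a `synchronized` method.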

HLS master manifest generation

import java.util.List;
import java.util.Locale;

// RenditionOutput and HlsSegment are application classes that are not shown here.
public final class HlsManifestBuilder {
  // Master playlist: one EXT-X-STREAM-INF entry per rendition, each followed by its media playlist URI.
  public String buildMaster(List<RenditionOutput> renditions) {
    StringBuilder sb = new StringBuilder();
    sb.append("#EXTM3U\n");
    sb.append("#EXT-X-VERSION:7\n");
    sb.append("#EXT-X-INDEPENDENT-SEGMENTS\n");
    for (RenditionOutput r : renditions) {
      sb.append("#EXT-X-STREAM-INF:BANDWIDTH=").append(r.getBandwidthBps())
        .append(",AVERAGE-BANDWIDTH=").append(r.getAvgBandwidthBps())
        .append(",RESOLUTION=").append(r.getWidth()).append('x').append(r.getHeight())
        .append(",CODECS=\"").append(r.getCodecs()).append("\"")
        .append(",FRAME-RATE=").append(r.getFps())
        .append('\n');
      sb.append(r.getPlaylistPath()).append('\n');
    }
    return sb.toString();
  }

  // Media playlist for one rendition of a finished (VOD) video.
  // targetDurationSec must be at least the longest segment duration, rounded to the nearest second.
  public String buildMedia(List<HlsSegment> segments, int targetDurationSec) {
    StringBuilder sb = new StringBuilder();
    sb.append("#EXTM3U\n")
      .append("#EXT-X-VERSION:7\n")
      .append("#EXT-X-TARGETDURATION:").append(targetDurationSec).append('\n')
      .append("#EXT-X-MEDIA-SEQUENCE:0\n")
      .append("#EXT-X-PLAYLIST-TYPE:VOD\n");
    for (HlsSegment s : segments) {
      // Locale.ROOT keeps the decimal point a '.', even on servers whose default locale uses ','.
      sb.append("#EXTINF:").append(String.format(Locale.ROOT, "%.3f", s.getDurationSec())).append(",\n");
      sb.append(s.getUri()).append('\n');
    }
    sb.append("#EXT-X-ENDLIST\n");
    return sb.toString();
  }
}

Key design decisions & trade-offs

Where to do playback delivery — Chosen: CDN-first with origin pull fallback. Pushing 95%+ of segment bytes to edge PoPs is the only way to serve ~115K QPS globally under 100 ms P99 without a multi-Tbps origin. Origin cost and failure blast radius both shrink dramatically.
Transcoding timing — Chosen: Async DAG of workers with a completion queue. A 10-minute encode cannot block the upload response. Async keeps the 'processing' UX simple and lets the encoder fleet scale independently; cost is a delay before the video is watchable.
Storage layout for transcoded outputs — Chosen: Full ABR ladder (6-8 renditions) per source. Trades ~4x raw storage for adaptive playback across 2G phones to 1 Gbps fiber. Cheaper than re-encoding on demand and keeps the edge cache-friendly.
Upload protocol — Chosen: Resumable multipart with 5 MB GOP-aligned chunks. Avoids restarting multi-GB uploads on a single flaky connection and allows parallel PUTs to saturate the user's uplink. Adds server bookkeeping for part manifests.
Metadata store — Chosen: Sharded SQL behind Redis. Video metadata is relational (owner, ACL, playlists) and read-dominated. SQL gives transactional updates on privacy changes; Redis absorbs the 10-100x read amplification.
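
The "SQL behind Redis" decision above is the cache-aside pattern with a short TTL. The sketch below shows the read and invalidation paths under stated assumptions: `MetadataCache` is a hypothetical class, a `ConcurrentHashMap` stands in for Redis, the `db` function stands in for the sharded SQL lookup, and the clock is injected so expiry can be exercised without sleeping.

```java
import java.time.Duration;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.LongSupplier;

/** Hypothetical cache-aside reader; an in-memory map stands in for Redis. */
final class MetadataCache {
    private record Entry(String value, long expiresAtMillis) {}

    private final Map<String, Entry> cache = new ConcurrentHashMap<>();
    private final Function<String, Optional<String>> db;
    private final long ttlMillis;
    private final LongSupplier clock;

    MetadataCache(Function<String, Optional<String>> db, Duration ttl, LongSupplier clock) {
        this.db = db;
        this.ttlMillis = ttl.toMillis();
        this.clock = clock;
    }

    /** Serves from cache while the entry is fresh; otherwise reads the database and repopulates. */
    Optional<String> get(String videoId) {
        long now = clock.getAsLong();
        Entry cached = cache.get(videoId);
        if (cached != null && cached.expiresAtMillis() > now) {
            return Optional.of(cached.value());
        }
        Optional<String> row = db.apply(videoId);
        if (row.isPresent()) {
            cache.put(videoId, new Entry(row.get(), now + ttlMillis));
        } else {
            cache.remove(videoId); // drop any stale entry for a video that no longer exists
        }
        return row;
    }

    /** Called after a write such as a privacy change, so readers do not wait for the TTL. */
    void invalidate(String videoId) {
        cache.remove(videoId);
    }
}
```

Repeated reads within the TTL cost one database lookup; a privacy change would call `invalidate` right after the SQL transaction commits. The tight TTL bounds staleness if an invalidation is ever lost.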

Interview follow-ups

How would you support live streaming with LL-HLS and sub-3-second glass-to-glass latency?
How would you A/B test a new codec (AV1) without doubling storage for all videos?
How do you protect premium content with DRM (Widevine/FairPlay) across the ladder?
How would you build a recommendation service on top of view-event logs?
How do you handle copyright (Content ID) matching at upload time?