
Video Streaming (YouTube-style) System Design Interview Question

By Rahul Kumar · Senior Software Engineer · Updated · 12 components · 5 operations · Source: Alex Xu, System Design Interview Vol 1, Chapter 14

Problem: Design a video upload, transcoding, and streaming platform like YouTube.

Overview

A YouTube-style video platform is two workloads glued together: a trickle of uploads (around 1% of traffic) that must survive multi-gigabyte files on flaky home Wi-Fi, and a firehose of playback (the remaining 99%) that must start within a few seconds at most, worldwide. The interview answer is to decouple them aggressively. Uploads land in raw object storage through resumable, chunked PUTs close to the user, then a DAG of transcoding workers fans the source out into an adaptive-bitrate ladder (240p through 4K), HLS segments, and thumbnails. Playback never touches the application tier on the hot path; master playlists and six-second segments are served from a CDN that absorbs 95%+ of global QPS. The architecture walkthrough below shows how the completion queue, metadata service, and recommendation engine stitch the two flows together.

Summary

A massively read-skewed system (~99% reads, 1% uploads) split into two flows: a video-uploading flow (original storage → transcoding servers → transcoded storage → CDN, with a completion queue and handler that update metadata once encoding finishes) and a streaming flow that serves adaptive-bitrate manifests and segments from the CDN, falling back to the transcoded origin. The dominant design choice is to push all playback traffic to the CDN edge, so that the origin should see <1% of global playback QPS, while transcoding runs asynchronously as a DAG of tasks (inspection → video encoding → audio encoding → thumbnails → watermark → assembler) so uploads never block on the 10+ minute encode. The main tradeoffs are storage blow-up (each source becomes one adaptive-bitrate ladder of 6–8 renditions, ~3–5x raw storage) and upload latency, which the book attacks with GOP-level chunk parallelism and upload points geographically near users.
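
The task DAG named in the summary can be sketched as a dependency map plus a topological sort. The task names come from the summary; the dependency edges, the class name `TranscodeDag`, and its methods are assumptions made for illustration, since the source lists the stages only in sequence.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: orders transcoding tasks so each runs after its prerequisites. */
final class TranscodeDag {
  /** Maps each task to the tasks it depends on. The edges are assumed, not from the source. */
  static Map<String, List<String>> defaultDag() {
    Map<String, List<String>> deps = new LinkedHashMap<>();
    deps.put("inspection", List.of());
    deps.put("video-encoding", List.of("inspection"));
    deps.put("audio-encoding", List.of("inspection"));
    deps.put("thumbnails", List.of("inspection"));
    deps.put("watermark", List.of("video-encoding"));
    deps.put("assembler", List.of("watermark", "audio-encoding", "thumbnails"));
    return deps;
  }

  /** Kahn's algorithm: repeatedly emit tasks whose prerequisites have all finished. */
  static List<String> executionOrder(Map<String, List<String>> deps) {
    Map<String, Integer> remaining = new LinkedHashMap<>();
    Map<String, List<String>> dependents = new LinkedHashMap<>();
    for (String task : deps.keySet()) {
      remaining.put(task, deps.get(task).size());
      dependents.put(task, new ArrayList<>());
    }
    for (Map.Entry<String, List<String>> e : deps.entrySet()) {
      for (String prerequisite : e.getValue()) {
        List<String> list = dependents.get(prerequisite);
        if (list == null) {
          throw new IllegalArgumentException("unknown prerequisite: " + prerequisite);
        }
        list.add(e.getKey());
      }
    }
    Deque<String> ready = new ArrayDeque<>();
    remaining.forEach((task, count) -> { if (count == 0) ready.add(task); });
    List<String> order = new ArrayList<>();
    while (!ready.isEmpty()) {
      String task = ready.remove();
      order.add(task);
      for (String next : dependents.get(task)) {
        // A dependent becomes runnable once its last prerequisite is done.
        if (remaining.merge(next, -1, Integer::sum) == 0) ready.add(next);
      }
    }
    if (order.size() != deps.size()) {
      throw new IllegalStateException("cycle detected in transcoding DAG");
    }
    return order;
  }
}
```

Tasks that become ready at the same time (here video encoding, audio encoding, and thumbnails) have no ordering constraint between them, so a real scheduler could hand them to separate workers in parallel.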

Requirements

Functional

Non-functional

Capacity Assumptions

Back-of-Envelope Estimates

High-level architecture

The upload path begins at a regional upload PoP so the first TCP hop is short. The client initiates a resumable upload, receives a signed URL, and PUTs 5 MB GOP-aligned chunks in parallel. Chunks stream into the original-source bucket; once the final chunk lands, the upload service enqueues a TranscodingJob. Workers pull the DAG (inspect, video encode, audio encode, thumbnails, watermark, assembler), emit the ladder outputs to the transcoded bucket, and write an HLS master playlist. A completion queue notifies the metadata service and the CDN pre-warm job so the first viewer does not pay a cold-cache tax.

The playback path is CDN-first: the client resolves to an edge PoP, fetches master.m3u8, picks a rendition from the ladder based on bandwidth probes, then pulls six-second .ts or .m4s segments that are cached aggressively. Origin pull happens only on a cache miss.

Metadata (title, ACL, manifest URL) is served by a sharded SQL tier behind a Redis cache with tight TTLs. A recommendation service and a view-count aggregator consume a Kafka stream of playback events.

Two tradeoffs dominate: storage blows up 3-5x from the ABR ladder, and transcoding latency is variable, which is why uploads return immediately and the UI shows a 'processing' state. The system assumes read skew and pushes nearly all bytes to the edge, keeping the application tier small, stateless, and cheap.
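
The client-side step of choosing a rendition from bandwidth probes could look roughly like the sketch below. It assumes a simple throughput-based rule: take the highest rendition whose declared bandwidth fits within a safety fraction of the measured throughput, and fall back to the lowest one otherwise. `RenditionPicker`, `Rendition`, and the safety factor are hypothetical names and parameters, and real players use more elaborate heuristics (buffer level, throughput history).

```java
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch of throughput-based rendition selection (requires Java 16+ for records). */
final class RenditionPicker {
  /** One rung of the ABR ladder, as advertised by BANDWIDTH in the master playlist. */
  record Rendition(String name, long bandwidthBps) {}

  static Rendition pick(List<Rendition> ladder, long measuredBps, double safetyFactor) {
    if (ladder.isEmpty()) {
      throw new IllegalArgumentException("ladder must not be empty");
    }
    // Leave headroom so a small throughput dip does not immediately stall playback.
    double budget = measuredBps * safetyFactor;
    Rendition lowest = ladder.stream()
        .min(Comparator.comparingLong(Rendition::bandwidthBps))
        .get();
    return ladder.stream()
        .filter(r -> r.bandwidthBps() <= budget)
        .max(Comparator.comparingLong(Rendition::bandwidthBps))
        .orElse(lowest); // nothing fits: play the cheapest rendition rather than nothing
  }
}
```

With a measured throughput of 4 Mbps and a safety factor of 0.8, the budget is 3.2 Mbps, so a 2.5 Mbps rendition would be chosen over a 5 Mbps one.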

Architecture Components (12)

Operations Walked Through (5)

Implementation

Upload endpoint returning a resumable upload URL
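import java.net.URI;
import java.time.Duration;
import org.springframework.http.HttpStatus;
import org.springframework.http.ResponseEntity;
import org.springframework.security.core.annotation.AuthenticationPrincipal;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
@RequestMapping("/v1/videos")
public class VideoUploadController {
  private static final long MAX_UPLOAD_BYTES = 10L * 1024 * 1024 * 1024; // 10 GiB cap per upload
  private static final int CHUNK_BYTES = 5 * 1024 * 1024;                // 5 MiB chunk size hint

  private final UploadSessionService sessions;
  private final SignedUrlFactory signer;

  public VideoUploadController(UploadSessionService s, SignedUrlFactory f) {
    this.sessions = s;
    this.signer = f;
  }

  @PostMapping("/uploads")
  public ResponseEntity<InitiateUploadResponse> initiate(
      @RequestBody InitiateUploadRequest req,
      @AuthenticationPrincipal UserPrincipal user) {
    // Reject empty or oversized uploads before any storage is allocated.
    if (req.getSizeBytes() <= 0 || req.getSizeBytes() > MAX_UPLOAD_BYTES) {
      return ResponseEntity.badRequest().build();
    }
    UploadSession session = sessions.create(user.getId(), req.getFilename(), req.getSizeBytes(), req.getContentType());
    // The signed URL is valid for 6 hours so slow connections can resume later.
    URI resumableUrl = signer.signedPut(session.getObjectKey(), Duration.ofHours(6));
    InitiateUploadResponse body = new InitiateUploadResponse(
        session.getSessionId(),
        resumableUrl.toString(),
        CHUNK_BYTES,
        session.getExpiresAt());
    // 201 Created, with Location pointing at the new upload session resource.
    return ResponseEntity
        .status(HttpStatus.CREATED)
        .header("Location", "/v1/videos/uploads/" + session.getSessionId())
        .body(body);
  }
}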
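
On the client side, the chunk size returned by this endpoint has to be turned into a list of byte ranges to PUT. The sketch below shows one way to do that, assuming fixed-size parts numbered from 1 and an HTTP `Content-Range` header per part. `ChunkPlanner` and `Part` are hypothetical names, and the fixed-size split is a simplification that ignores the GOP alignment the architecture section describes.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: splits an upload into fixed-size parts (requires Java 16+ for records). */
final class ChunkPlanner {
  /** Inclusive byte range of one part within the source file. */
  record Part(int partNumber, long firstByte, long lastByte) {
    long length() {
      return lastByte - firstByte + 1;
    }

    /** Formats the range as an HTTP Content-Range value, e.g. "bytes 0-4/10". */
    String contentRange(long totalBytes) {
      return "bytes " + firstByte + "-" + lastByte + "/" + totalBytes;
    }
  }

  static List<Part> plan(long totalBytes, long chunkBytes) {
    if (totalBytes <= 0 || chunkBytes <= 0) {
      throw new IllegalArgumentException("sizes must be positive");
    }
    List<Part> parts = new ArrayList<>();
    int partNumber = 1; // assumption: parts are numbered from 1
    for (long start = 0; start < totalBytes; start += chunkBytes) {
      // The last part is shorter when the file is not a multiple of the chunk size.
      long end = Math.min(start + chunkBytes, totalBytes) - 1;
      parts.add(new Part(partNumber++, start, end));
    }
    return parts;
  }
}
```

For a 12 MiB file and 5 MiB chunks this yields three parts, the last one 2 MiB long. Because each part carries its own range, a client that loses its connection can re-send only the parts that were not acknowledged.
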
TranscodingJob model
import java.time.Instant;
import java.util.Collections;
import java.util.EnumSet;
import java.util.Set;
import java.util.UUID;

public class TranscodingJob {
  public enum State { QUEUED, RUNNING, COMPLETED, FAILED }
  // Each constant is one rung (rendition) of the adaptive-bitrate ladder.
  public enum Ladder { P240, P360, P480, P720, P1080, P2160 }

  private final String jobId;
  private final String videoId;
  private final String sourceKey;
  private final Set<Ladder> targets;
  private final Instant enqueuedAt;
  private State state;
  private int attempt;
  private Instant startedAt;
  private Instant completedAt;
  private String failureReason;

  public TranscodingJob(String videoId, String sourceKey, Set<Ladder> targets) {
    // EnumSet.copyOf rejects an empty non-EnumSet collection, so fail with a clearer message.
    if (targets.isEmpty()) {
      throw new IllegalArgumentException("at least one target rendition is required");
    }
    this.jobId = UUID.randomUUID().toString();
    this.videoId = videoId;
    this.sourceKey = sourceKey;
    this.targets = EnumSet.copyOf(targets);
    this.state = State.QUEUED;
    this.attempt = 0;
    this.enqueuedAt = Instant.now();
  }

  // Called when a worker picks the job up; each pickup counts as one attempt.
  public void markRunning() {
    this.state = State.RUNNING;
    this.startedAt = Instant.now();
    this.attempt++;
  }

  public void markCompleted() {
    this.state = State.COMPLETED;
    this.completedAt = Instant.now();
  }

  // completedAt records when the job stopped, whether it succeeded or failed.
  public void markFailed(String reason) {
    this.state = State.FAILED;
    this.failureReason = reason;
    this.completedAt = Instant.now();
  }

  public String getJobId() { return jobId; }
  public String getVideoId() { return videoId; }
  public State getState() { return state; }
  public Set<Ladder> getTargets() { return Collections.unmodifiableSet(targets); }
}
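
The `mark*` methods above change state unconditionally. One way to guard them is an explicit table of legal transitions, sketched below. `JobStateMachine` is a hypothetical helper that mirrors the `State` enum, and the `FAILED` to `QUEUED` retry edge is an assumption, since the source does not describe retry behavior.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch: a table of legal transcoding-job state transitions. */
final class JobStateMachine {
  enum State { QUEUED, RUNNING, COMPLETED, FAILED }

  private static final Map<State, Set<State>> ALLOWED = new EnumMap<>(State.class);

  static {
    ALLOWED.put(State.QUEUED, EnumSet.of(State.RUNNING));
    ALLOWED.put(State.RUNNING, EnumSet.of(State.COMPLETED, State.FAILED));
    // Assumption: a failed job may be put back on the queue for another attempt.
    ALLOWED.put(State.FAILED, EnumSet.of(State.QUEUED));
    // COMPLETED is terminal.
    ALLOWED.put(State.COMPLETED, EnumSet.noneOf(State.class));
  }

  static boolean canTransition(State from, State to) {
    return ALLOWED.get(from).contains(to);
  }

  /** Returns the new state, or throws if the move is not in the table. */
  static State transition(State from, State to) {
    if (!canTransition(from, to)) {
      throw new IllegalStateException("illegal transition " + from + " -> " + to);
    }
    return to;
  }
}
```

A check like this would make duplicate queue deliveries harmless: a second worker that tries to mark an already completed job as running gets an exception instead of silently resetting it.
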
Chunked multi-part upload handler
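import java.io.IOException;
import java.io.InputStream;
import java.util.EnumSet;
import java.util.List;
import org.springframework.http.MediaType;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.*;

@RestController
@RequestMapping("/v1/videos/uploads/{sessionId}")
public class ChunkUploadController {
  private final UploadSessionService sessions;
  private final ObjectStoreClient store;
  private final TranscodingQueue queue;

  public ChunkUploadController(UploadSessionService sessions, ObjectStoreClient store, TranscodingQueue queue) {
    this.sessions = sessions;
    this.store = store;
    this.queue = queue;
  }

  @PutMapping(value = "/parts/{partNumber}", consumes = MediaType.APPLICATION_OCTET_STREAM_VALUE)
  public ResponseEntity<PartResponse> uploadPart(
      @PathVariable String sessionId,
      @PathVariable int partNumber,
      @RequestHeader("Content-Range") String contentRange,
      @RequestHeader("X-Chunk-Sha256") String chunkHash,
      InputStream body) throws IOException {
    UploadSession session = sessions.require(sessionId);
    ByteRange range = ByteRange.parse(contentRange);
    // Stream the part straight to object storage; the hash lets the store verify integrity.
    String etag = store.putPart(session.getUploadId(), partNumber, body, range.length(), chunkHash);
    sessions.recordPart(sessionId, partNumber, etag, range);
    // Parts arrive in parallel, so the session service must make this check-and-mark atomic,
    // otherwise two requests could both see "complete" and enqueue the job twice.
    if (sessions.isComplete(sessionId)) {
      List<PartRef> parts = sessions.listParts(sessionId);
      store.completeMultipart(session.getUploadId(), parts);
      sessions.markUploaded(sessionId);
      // Assumption: transcode to every rung of the ladder by default.
      queue.enqueue(new TranscodingJob(session.getVideoId(), session.getObjectKey(),
          EnumSet.allOf(TranscodingJob.Ladder.class)));
      return ResponseEntity.ok(PartResponse.finalPart(etag));
    }
    return ResponseEntity.ok(PartResponse.intermediate(etag));
  }
}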
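
The handler relies on a `ByteRange.parse` helper that the listing does not show. A minimal sketch of what it might look like follows, assuming only the fully specified `bytes first-last/total` form of the HTTP `Content-Range` header is accepted; the field names and validation rules are assumptions.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of the ByteRange helper (requires Java 16+ for records). */
record ByteRange(long first, long last, long total) {
  // Only "bytes <first>-<last>/<total>" is handled; "bytes */N" and "/*" forms are rejected.
  private static final Pattern CONTENT_RANGE = Pattern.compile("bytes (\\d+)-(\\d+)/(\\d+)");

  static ByteRange parse(String header) {
    Matcher m = CONTENT_RANGE.matcher(header.trim());
    if (!m.matches()) {
      throw new IllegalArgumentException("unsupported Content-Range: " + header);
    }
    long first = Long.parseLong(m.group(1));
    long last = Long.parseLong(m.group(2));
    long total = Long.parseLong(m.group(3));
    // Positions are zero-based and inclusive, so the last byte must be below the total.
    if (first > last || last >= total) {
      throw new IllegalArgumentException("inconsistent Content-Range: " + header);
    }
    return new ByteRange(first, last, total);
  }

  /** Number of bytes covered by the inclusive range. */
  long length() {
    return last - first + 1;
  }
}
```

Rejecting inconsistent ranges early keeps a malformed or malicious header from reaching the object store with a wrong part length.
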
HLS master manifest generation
import java.util.List;
import java.util.Locale;

public final class HlsManifestBuilder {
  // Master playlist: one EXT-X-STREAM-INF entry per rendition, each followed by its media playlist URI.
  public String buildMaster(List<RenditionOutput> renditions) {
    StringBuilder sb = new StringBuilder();
    sb.append("#EXTM3U\n");
    sb.append("#EXT-X-VERSION:7\n");
    sb.append("#EXT-X-INDEPENDENT-SEGMENTS\n");
    for (RenditionOutput r : renditions) {
      sb.append("#EXT-X-STREAM-INF:BANDWIDTH=").append(r.getBandwidthBps())
        .append(",AVERAGE-BANDWIDTH=").append(r.getAvgBandwidthBps())
        .append(",RESOLUTION=").append(r.getWidth()).append('x').append(r.getHeight())
        .append(",CODECS=\"").append(r.getCodecs()).append("\"")
        .append(",FRAME-RATE=").append(r.getFps())
        .append('\n');
      sb.append(r.getPlaylistPath()).append('\n');
    }
    return sb.toString();
  }

  // Media playlist for one rendition: a complete VOD segment list terminated by EXT-X-ENDLIST.
  public String buildMedia(List<HlsSegment> segments, int targetDurationSec) {
    StringBuilder sb = new StringBuilder();
    sb.append("#EXTM3U\n")
      .append("#EXT-X-VERSION:7\n")
      .append("#EXT-X-TARGETDURATION:").append(targetDurationSec).append('\n')
      .append("#EXT-X-MEDIA-SEQUENCE:0\n")
      .append("#EXT-X-PLAYLIST-TYPE:VOD\n");
    for (HlsSegment s : segments) {
      // Locale.ROOT forces a '.' decimal separator; the default locale may emit ',' and corrupt the playlist.
      sb.append("#EXTINF:").append(String.format(Locale.ROOT, "%.3f", s.getDurationSec())).append(",\n");
      sb.append(s.getUri()).append('\n');
    }
    sb.append("#EXT-X-ENDLIST\n");
    return sb.toString();
  }
}
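
`buildMedia` takes the target duration as a parameter, so the caller has to derive it from the segments. RFC 8216 requires each segment's `EXTINF` duration, rounded to the nearest integer, to be at most the `EXT-X-TARGETDURATION` value. The sketch below takes the ceiling of the longest segment, which is a conservative choice that always satisfies that rule; `HlsTargetDuration` is a hypothetical helper name.

```java
import java.util.List;

/** Hypothetical sketch: derives EXT-X-TARGETDURATION from the segment durations. */
final class HlsTargetDuration {
  static int fromSegments(List<Double> durationsSec) {
    if (durationsSec.isEmpty()) {
      throw new IllegalArgumentException("at least one segment is required");
    }
    double longest = 0.0;
    for (double d : durationsSec) {
      longest = Math.max(longest, d);
    }
    // Rounding up guarantees no segment exceeds the advertised target duration.
    return (int) Math.ceil(longest);
  }
}
```

Encoders rarely cut segments at exactly six seconds, so a nominal six-second ladder with one 6.006-second segment would advertise a target duration of 7 under this rule.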

Key design decisions & trade-offs

Interview follow-ups
