Google Drive (Cloud File Storage) System Design Interview Question

By Rahul Kumar · Senior Software Engineer · 12 components · 5 operations · Source: Alex Xu, System Design Interview Vol 1, Chapter 15

Problem: Design a cloud file storage and sync service like Google Drive, Dropbox, or OneDrive.

Overview

Google Drive, Dropbox, and OneDrive look like file storage but are really sync engines on top of content-addressed blobs. The hard part is not the upload itself. It is the delta computation that avoids shipping an entire 1 GB file when a user tweaks one paragraph, the multi-device fan-out that makes an edit appear on a phone within seconds, and the version history that must stay cheap even when a user rewrites the same doc a hundred times. The interview-grade answer splits files into fixed 4 MB blocks, hashes each block with SHA-256, deduplicates across users, and uploads only the blocks whose hashes the server has not already seen. A notification service holds a long-lived connection to every signed-in device and broadcasts a 'refresh this file' ping as soon as the metadata commits. Below we walk through the upload pipeline, the metadata model, and the sync-delta contract that ties everything together.
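
To make the delta idea concrete, here is a minimal sketch, assuming the fixed-size blocks and SHA-256 hashing described above. The class `BlockDelta` and its method names are hypothetical, and the block size is a parameter so the example can run on tiny inputs instead of 4 MB blocks.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper: hashes fixed-size blocks and finds the ones a previous revision lacks. */
final class BlockDelta {
  private BlockDelta() {}

  /** Returns the SHA-256 hex digest of each fixed-size block, in file order. */
  static List<String> blockHashes(byte[] content, int blockSize) {
    if (blockSize <= 0) throw new IllegalArgumentException("blockSize must be positive");
    MessageDigest digest;
    try {
      digest = MessageDigest.getInstance("SHA-256");
    } catch (NoSuchAlgorithmException e) {
      throw new IllegalStateException(e); // SHA-256 is mandatory on every Java platform
    }
    List<String> hashes = new ArrayList<>();
    for (int off = 0; off < content.length; off += blockSize) {
      int len = Math.min(blockSize, content.length - off);
      digest.update(content, off, len);
      hashes.add(toHex(digest.digest())); // digest() also resets the digest for the next block
    }
    return hashes;
  }

  /** Returns hashes that appear in the new revision but not the previous one, in first-seen order. */
  static Set<String> blocksToUpload(List<String> previous, List<String> current) {
    Set<String> known = new HashSet<>(previous);
    Set<String> missing = new LinkedHashSet<>();
    for (String hash : current) {
      if (!known.contains(hash)) missing.add(hash);
    }
    return missing;
  }

  private static String toHex(byte[] bytes) {
    StringBuilder sb = new StringBuilder(bytes.length * 2);
    for (byte b : bytes) {
      sb.append(Character.forDigit((b >> 4) & 0xF, 16));
      sb.append(Character.forDigit(b & 0xF, 16));
    }
    return sb.toString();
  }
}
```

Under these assumptions, overwriting bytes inside one block of a three-block file yields a single hash in `blocksToUpload`, so only that block needs to travel over the network.
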

Summary

A sync-centric file-storage service built around content-addressed block storage. Per the book's final design, the upload path fans out into three asynchronous flows: (1) the block server splits the file into 4 MB blocks, compresses and encrypts them, and uploads only the blocks whose SHA-256 hash isn't already present (delta sync plus dedup); (2) the metadata service writes the new file row and chunk manifest to the metadata DB (with a Redis metadata cache in front of it); (3) the notification service pushes the change to every other device the user is signed in to, so sync happens within seconds. Storage is optimized by (a) block-level deduplication across users, (b) a cold-storage / offline-backup tier for old versions and rarely accessed blocks, and (c) version-history limits (keep the N most recent versions, or apply time-based retention) so that one user's 100 rewrites of the same doc don't consume storage forever.
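
The two version-history limits mentioned above can be sketched as follows. This is an illustration only: `VersionRetention`, `Revision`, and the rule that the newest revision is never expired by age are assumptions, not details taken from the book.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical retention helper: decides which revisions of one file may be expired. */
final class VersionRetention {
  private VersionRetention() {}

  /** Minimal revision descriptor; field names are illustrative. */
  static final class Revision {
    final long number;
    final Instant committedAt;

    Revision(long number, Instant committedAt) {
      this.number = number;
      this.committedAt = committedAt;
    }
  }

  /** Count-based policy: everything outside the newest {@code keep} revisions expires. */
  static List<Revision> expiredByCount(List<Revision> revisions, int keep) {
    if (keep < 1) throw new IllegalArgumentException("keep must be at least 1");
    List<Revision> sorted = new ArrayList<>(revisions);
    sorted.sort(Comparator.comparingLong((Revision r) -> r.number).reversed());
    if (sorted.size() <= keep) return new ArrayList<>();
    return new ArrayList<>(sorted.subList(keep, sorted.size()));
  }

  /** Time-based policy: revisions older than maxAge expire, except the newest one (assumed rule). */
  static List<Revision> expiredByAge(List<Revision> revisions, Instant now, Duration maxAge) {
    Instant cutoff = now.minus(maxAge);
    long newest = revisions.stream().mapToLong(r -> r.number).max().orElse(-1L);
    List<Revision> expired = new ArrayList<>();
    for (Revision r : revisions) {
      if (r.number != newest && r.committedAt.isBefore(cutoff)) expired.add(r);
    }
    return expired;
  }
}
```

Because blocks are deduplicated, expiring a revision should presumably release a block only when no other revision or user still references it; that bookkeeping is left out of the sketch.
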

Requirements

Functional

Non-functional

Capacity Assumptions

Back-of-Envelope Estimates

High-level architecture

The client is the engine of the design. When a file changes it re-chunks the local bytes into 4 MB blocks, hashes each block, and diffs the new block list against the previous revision stored locally. The resulting set of 'new' hashes goes to the block server, which asks the dedup index whether it already has each hash; only the truly novel blocks are PUT to cloud storage, compressed and encrypted at rest.

Once all blocks land, the client calls the metadata service to commit a new revision: a row that pins file_id, revision_number, ordered block-hash list, size, mtime, and the committing device. The metadata DB is sharded by user_id so a single user's edits are serially ordered; a Redis metadata cache fronts the hot working set.

A commit triggers two fan-outs. First, the notification service pushes a lightweight ping to every other device the user owns over a long-poll or WebSocket channel; each device then calls the changes API with its last-seen revision and pulls only the deltas. Second, the search indexer asynchronously picks up the new revision to refresh the filename and OCR index.

Cold storage quietly migrates rarely-touched old-revision blocks to cheaper tiers, guarded by the version-history retention policy. ACLs, share links, and collaborator membership live in the metadata DB alongside files, so a single transactional commit updates both the revision and who can see it.
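
The commit-then-notify step can be sketched in memory as follows. Everything here is a stand-in: `RevisionFanout` and `DeviceChannel` are hypothetical names, the per-user counter is an assumed reading of "a single user's edits are serially ordered", and a real service would deliver pings asynchronously over its long-poll or WebSocket connections rather than inside the commit call.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical in-memory model of "commit a revision, then ping the user's other devices". */
final class RevisionFanout {
  /** Stand-in for a long-lived connection to one device. */
  interface DeviceChannel {
    void ping(String fileId, long revision);
  }

  private final Map<String, Long> latestRevision = new HashMap<>();
  private final Map<String, Map<String, DeviceChannel>> channels = new HashMap<>();

  synchronized void connect(String userId, String deviceId, DeviceChannel channel) {
    channels.computeIfAbsent(userId, u -> new LinkedHashMap<>()).put(deviceId, channel);
  }

  synchronized void disconnect(String userId, String deviceId) {
    Map<String, DeviceChannel> devices = channels.get(userId);
    if (devices != null) devices.remove(deviceId);
  }

  /** Assigns the next per-user revision and pings every device except the committer. */
  synchronized long commit(String userId, String committingDeviceId, String fileId) {
    long revision = latestRevision.merge(userId, 1L, Long::sum);
    Map<String, DeviceChannel> devices = channels.getOrDefault(userId, Map.of());
    for (Map.Entry<String, DeviceChannel> e : devices.entrySet()) {
      if (!e.getKey().equals(committingDeviceId)) {
        e.getValue().ping(fileId, revision); // the ping carries no file data, only "go fetch"
      }
    }
    return revision;
  }
}
```

The ping is deliberately small: a device that receives it still has to call the changes API with its last-seen revision to learn what actually changed.
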

Architecture Components (12)

Operations Walked Through (5)

Implementation

FileVersionController
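import org.springframework.http.HttpStatus;
import org.springframework.security.core.annotation.AuthenticationPrincipal;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;
import org.springframework.web.server.ResponseStatusException;

// FileVersionService, AclEnforcer, UserPrincipal, FileVersion, and VersionListResponse
// are application types assumed to be defined elsewhere in the project.
@RestController
@RequestMapping("/v1/files/{fileId}/versions")
public class FileVersionController {
  // Upper bound on page size (an assumed value) so one request cannot pull an unbounded history.
  private static final int MAX_LIMIT = 1000;

  private final FileVersionService versions;
  private final AclEnforcer acl;

  public FileVersionController(FileVersionService v, AclEnforcer a) {
    this.versions = v;
    this.acl = a;
  }

  // Lists the most recent revisions of a file; requires read access.
  @GetMapping
  public VersionListResponse list(
      @PathVariable String fileId,
      @AuthenticationPrincipal UserPrincipal user,
      @RequestParam(defaultValue = "50") int limit) {
    if (limit <= 0 || limit > MAX_LIMIT) {
      throw new ResponseStatusException(HttpStatus.BAD_REQUEST, "limit out of range");
    }
    acl.requireRead(user.getId(), fileId);
    return new VersionListResponse(versions.history(fileId, limit));
  }

  // Fetches one revision's metadata, or 404 if that revision does not exist.
  @GetMapping("/{revision}")
  public FileVersion get(
      @PathVariable String fileId,
      @PathVariable long revision,
      @AuthenticationPrincipal UserPrincipal user) {
    acl.requireRead(user.getId(), fileId);
    return versions.get(fileId, revision)
        .orElseThrow(() -> new ResponseStatusException(HttpStatus.NOT_FOUND));
  }

  // Restores an older revision; requires write access because it changes the file's current state.
  @PostMapping("/{revision}/restore")
  public FileVersion restore(
      @PathVariable String fileId,
      @PathVariable long revision,
      @AuthenticationPrincipal UserPrincipal user) {
    acl.requireWrite(user.getId(), fileId);
    return versions.restore(fileId, revision, user.getId());
  }
}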
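
The controller above delegates `restore` to a service whose behavior is not shown. One plausible reading, sketched here purely as an assumption, is that a restore appends a new revision that reuses the old revision's block list, so history is never rewritten and no blocks are re-uploaded. `InMemoryVersionHistory` and `Version` are hypothetical names, and the sketch tracks a single file.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.Optional;

/** Hypothetical single-file version history where restore is modeled as a new commit. */
final class InMemoryVersionHistory {
  static final class Version {
    final long revision;
    final List<String> blockHashes;
    final String committedBy;

    Version(long revision, List<String> blockHashes, String committedBy) {
      this.revision = revision;
      this.blockHashes = blockHashes;
      this.committedBy = committedBy;
    }
  }

  private final List<Version> versions = new ArrayList<>();

  /** Appends a revision; revision numbers start at 1 and grow by one per commit. */
  synchronized Version commit(List<String> blockHashes, String userId) {
    Version v = new Version(versions.size() + 1, List.copyOf(blockHashes), userId);
    versions.add(v);
    return v;
  }

  synchronized Optional<Version> get(long revision) {
    if (revision < 1 || revision > versions.size()) return Optional.empty();
    return Optional.of(versions.get((int) (revision - 1)));
  }

  /** Returns up to {@code limit} revisions, newest first. */
  synchronized List<Version> history(int limit) {
    List<Version> out = new ArrayList<>();
    for (int i = versions.size() - 1; i >= 0 && out.size() < limit; i--) {
      out.add(versions.get(i));
    }
    return out;
  }

  /** Assumed semantics: restoring revision N commits a new revision with N's block list. */
  synchronized Version restore(long revision, String userId) {
    Version old = get(revision)
        .orElseThrow(() -> new NoSuchElementException("no revision " + revision));
    return commit(old.blockHashes, userId);
  }
}
```

With this design the restored content becomes the newest revision while every intermediate revision stays retrievable, which keeps restore itself undoable.
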
Chunked upload with content-hash dedup
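import java.io.IOException;
import java.io.InputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.HexFormat; // Java 17+
import java.util.List;

// BlockStore, DedupIndex, MetadataService, and CommitResult are application types
// assumed to be defined elsewhere. Compression and encryption are omitted here.
public class BlockUploadService {
  private static final int BLOCK_SIZE = 4 * 1024 * 1024;

  private final BlockStore store;
  private final DedupIndex dedup;
  private final MetadataService metadata;

  public BlockUploadService(BlockStore store, DedupIndex dedup, MetadataService metadata) {
    this.store = store;
    this.dedup = dedup;
    this.metadata = metadata;
  }

  public CommitResult upload(String userId, String fileId, InputStream in, long size) throws IOException {
    List<String> blockHashes = new ArrayList<>();
    byte[] buf = new byte[BLOCK_SIZE];
    MessageDigest digest;
    try { digest = MessageDigest.getInstance("SHA-256"); }
    catch (NoSuchAlgorithmException e) { throw new IllegalStateException(e); }

    long totalRead = 0;
    int read;
    while ((read = readFully(in, buf)) > 0) {
      totalRead += read;
      digest.reset();
      digest.update(buf, 0, read);
      String hash = HexFormat.of().formatHex(digest.digest());
      // Store the block only if this content hash has never been seen, by any user.
      if (!dedup.exists(hash)) {
        store.put(hash, buf, read); // buf is reused, so the store must copy or fully consume it
        dedup.record(hash, read);
      }
      blockHashes.add(hash);
    }
    // Refuse to commit a truncated or oversized upload. Blocks already stored are
    // harmless: they are content-addressed and can be reused by a retry.
    if (totalRead != size) {
      throw new IOException("declared size " + size + " but read " + totalRead + " bytes");
    }
    long revision = metadata.commitRevision(userId, fileId, blockHashes, size);
    return new CommitResult(fileId, revision, blockHashes.size());
  }

  // Fills buf completely unless the stream ends first; returns the number of bytes read.
  private int readFully(InputStream in, byte[] buf) throws IOException {
    int total = 0;
    while (total < buf.length) {
      int r = in.read(buf, total, buf.length - total);
      if (r < 0) break;
      total += r;
    }
    return total;
  }
}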
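
The `exists` / `put` / `record` sequence above is a check-then-act, so two concurrent uploads of the same block could both write it. That is harmless for content-addressed data but wasteful. A minimal sketch of an atomic alternative follows, assuming an in-memory store; `InMemoryDedupStore` is a hypothetical class that folds the roles of `BlockStore` and `DedupIndex` into one map.

```java
import java.util.Arrays;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical in-memory block store that deduplicates by content hash atomically. */
final class InMemoryDedupStore {
  private final ConcurrentHashMap<String, byte[]> blocks = new ConcurrentHashMap<>();
  private final AtomicLong storedBytes = new AtomicLong();

  /**
   * Stores the first {@code length} bytes of {@code buf} under {@code hash} if the hash is new.
   * Returns true only when this call actually wrote the block.
   */
  boolean putIfAbsent(String hash, byte[] buf, int length) {
    if (blocks.containsKey(hash)) return false; // fast path: skip the copy for known blocks
    byte[] copy = Arrays.copyOf(buf, length);   // the caller reuses buf, so keep a private copy
    boolean stored = blocks.putIfAbsent(hash, copy) == null; // atomic: only one writer wins
    if (stored) storedBytes.addAndGet(length);
    return stored;
  }

  /** Returns a copy of the block, or null if the hash is unknown. */
  byte[] get(String hash) {
    byte[] block = blocks.get(hash);
    return block == null ? null : block.clone();
  }

  long storedBytes() {
    return storedBytes.get();
  }

  int blockCount() {
    return blocks.size();
  }
}
```

A second upload of identical content, by the same user or another one, returns false from `putIfAbsent` and adds nothing to `storedBytes`, which is the cross-user dedup saving in miniature.
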
Sync delta API (changes since revision N)
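import java.time.Instant;
import java.util.List;
import org.springframework.http.HttpStatus;
import org.springframework.security.core.annotation.AuthenticationPrincipal;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;
import org.springframework.web.server.ResponseStatusException;

// ChangeLog, ChangesResponse, ChangeType, and UserPrincipal are application types
// assumed to be defined elsewhere in the project.
@RestController
@RequestMapping("/v1/sync")
public class SyncDeltaController {
  private final ChangeLog changeLog;

  public SyncDeltaController(ChangeLog changeLog) {
    this.changeLog = changeLog;
  }

  @GetMapping("/changes")
  public ChangesResponse changes(
      @AuthenticationPrincipal UserPrincipal user,
      @RequestParam("since") long sinceRevision,
      @RequestParam(defaultValue = "500") int limit) {
    if (sinceRevision < 0) {
      throw new ResponseStatusException(HttpStatus.BAD_REQUEST, "since must be non-negative");
    }
    if (limit <= 0 || limit > 1000) {
      throw new ResponseStatusException(HttpStatus.BAD_REQUEST, "limit out of range");
    }
    List<ChangeEntry> entries = changeLog.tail(user.getId(), sinceRevision, limit);
    // The cursor is the last revision returned; the client sends it back as "since".
    long nextCursor = entries.isEmpty()
        ? sinceRevision
        : entries.get(entries.size() - 1).getRevision();
    // Heuristic: a full page may be the last one, costing the client one extra empty request.
    boolean hasMore = entries.size() == limit;
    return new ChangesResponse(entries, nextCursor, hasMore);
  }

  public static final class ChangeEntry {
    private final long revision;
    private final String fileId;
    private final ChangeType type; // CREATED, UPDATED, DELETED, RENAMED, ACL_CHANGED
    private final List<String> addedBlocks;
    private final List<String> removedBlocks;
    private final Instant timestamp;

    public ChangeEntry(long r, String f, ChangeType t, List<String> a, List<String> rm, Instant ts) {
      this.revision = r; this.fileId = f; this.type = t;
      this.addedBlocks = a; this.removedBlocks = rm; this.timestamp = ts;
    }
    public long getRevision() { return revision; }
    public String getFileId() { return fileId; }
    public ChangeType getType() { return type; }
    public List<String> getAddedBlocks() { return addedBlocks; }
    public List<String> getRemovedBlocks() { return removedBlocks; }
    public Instant getTimestamp() { return timestamp; }
  }
}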
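
On the client side, the cursor contract above implies a simple drain loop: keep calling the changes endpoint with the last cursor until `hasMore` is false. The sketch below assumes that `tail` returns entries with revisions strictly greater than `since`. `SyncClient`, `Page`, `ChangesApi`, and `fakeServer` are hypothetical stand-ins for the HTTP call, and only revision numbers are carried to keep the example short.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongConsumer;

/** Hypothetical client loop that pages through the sync-delta API until caught up. */
final class SyncClient {
  private SyncClient() {}

  /** Simplified response: revision numbers only, plus the cursor fields. */
  static final class Page {
    final List<Long> revisions;
    final long nextCursor;
    final boolean hasMore;

    Page(List<Long> revisions, long nextCursor, boolean hasMore) {
      this.revisions = revisions;
      this.nextCursor = nextCursor;
      this.hasMore = hasMore;
    }
  }

  /** Stand-in for GET /v1/sync/changes?since=...&limit=... */
  interface ChangesApi {
    Page changes(long since, int limit);
  }

  /** Applies every pending change in order and returns the cursor the device should persist. */
  static long drain(ChangesApi api, long since, int limit, LongConsumer apply) {
    long cursor = since;
    while (true) {
      Page page = api.changes(cursor, limit);
      for (long revision : page.revisions) apply.accept(revision);
      cursor = page.nextCursor;
      // Stop on an empty page as well, so a misbehaving server cannot cause an endless loop.
      if (!page.hasMore || page.revisions.isEmpty()) return cursor;
    }
  }

  /** In-memory server over revisions 1..latest that mirrors the controller's cursor logic. */
  static ChangesApi fakeServer(long latest) {
    return (since, limit) -> {
      List<Long> out = new ArrayList<>();
      for (long r = since + 1; r <= latest && out.size() < limit; r++) out.add(r);
      long next = out.isEmpty() ? since : out.get(out.size() - 1);
      return new Page(out, next, out.size() == limit);
    };
  }
}
```

When the number of pending changes is an exact multiple of `limit`, the last full page still reports `hasMore`, so the client makes one extra request that comes back empty. That is the cost of deriving `hasMore` from page size instead of looking ahead.
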

Key design decisions & trade-offs

Interview follow-ups

Related