Google Drive (Cloud File Storage) System Design Interview Question
Problem: Design a cloud file storage and sync service like Google Drive, Dropbox, or OneDrive.
Overview
Google Drive, Dropbox, and OneDrive look like file storage but are really sync engines on top of content-addressed blobs. The hard parts are not the upload itself; they are the delta computation that avoids shipping an entire 1 GB file when a user tweaks one paragraph, the multi-device fan-out that makes an edit appear on a phone within a second, and the version history that must be cheap even when a user rewrites the same doc a hundred times. The interview-grade answer splits files into fixed 4 MB blocks, hashes each block with SHA-256, deduplicates across users, and pushes only the block hashes the server has not already seen. A notification service keeps every signed-in device in a long-lived connection and broadcasts a 'refresh this file' ping the instant metadata commits. Below we walk the upload pipeline, the metadata model, and the sync-delta contract that ties everything together.
Summary
A sync-centric file-storage service built around content-addressed block storage. Per the book's final design, the upload path fans into three asynchronous flows: (1) the block server splits the file into 4 MB blocks, compresses + encrypts them, and uploads only the blocks whose SHA-256 hash isn't already present (delta sync + dedup); (2) the metadata service writes the new file row / chunk manifest to the metadata DB (with a Redis metadata cache in front of it); (3) the notification service pushes the change to every other device the user is signed into so sync happens in seconds. Storage is optimized by (a) block-level deduplication across users, (b) a cold-storage / offline-backup tier for old versions and rarely-accessed blocks, and (c) version-history limits (keep N recent versions, or time-based retention) so a single user's 100 rewrites of the same doc don't blow up storage forever.
Requirements
Functional
- Upload, download, rename, move, and delete files and folders
- Version history: fetch and restore any previous revision
- Multi-device sync within seconds of a commit
- Delta / changes-since-revision API for sync clients
- Sharing with per-user or per-link ACLs and collaborative edits
- Full-text and filename search across the user's corpus
- Offline edits that reconcile on reconnect
Non-functional
- 99.99% availability for metadata reads and writes
- Durability 11 nines for file blocks
- P99 sync notification delivery under 1 second
- Block-level dedup yielding at least 30% storage savings
- Scale to 100M DAU, 3 devices per user, 1 PB/day ingress
- Strong read-after-write consistency inside a single user's namespace
Capacity Assumptions
- 100M DAU, 10 file operations/user/day → 1B ops/day, ~70% reads / 30% writes
- Average file 1 MB, P99 file 1 GB; block size = 4 MB (fixed)
- 50B files lifetime at a 1 MB average → ~50 PB raw; a ~30% dedup saving → ~35 PB physical after dedup
- Version history retention: a version stays hot only while it is among the last 10 versions and less than 30 days old, whichever limit is hit first; older versions drop to cold storage
- Cross-device sync: avg 3 devices per active user — notification service fans out on every commit
- Collaborative edits: ~1% of files shared, avg 3 collaborators
- Search index over filename + OCR'd content for ~5B files
Back-of-Envelope Estimates
- Write QPS: 300M writes/day / 86400 ≈ 3.5K QPS (peak 10K)
- Read QPS: 700M reads/day / 86400 ≈ 8K QPS (peak 25K)
- Upload bandwidth: 1 PB/day / 86400 ≈ 12 GB/s aggregate across regions
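- Metadata DB size: 50B files * ~500 B row ≈ 25 TB of file rows. Block manifests: 50B * 250 blocks * 80 B chunk ref ≈ 1 PB is an upper bound that assumes every file is at the 1 GB P99 size (250 blocks of 4 MB); at the 1 MB average each file is a single block, so manifests are closer to 50B * 80 B ≈ 4 TB → shard across ~100 SQL nodes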
- Redis cache: hottest 10M files' metadata ≈ 50 GB — fits a small Redis cluster
- Notification connections: 100M DAU * 3 devices * 30% online = 90M active long-poll / WS connections
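The estimates above are plain division and multiplication, so a small helper can reproduce them and make it easy to vary the inputs. This is only a sketch: the class and method names (`CapacityEstimate`, `qps`, `bytesPerSecond`, `manifestBytes`, `activeConnections`) are illustrative, 1 PB is taken as 10^15 bytes, and the manifest figure uses the 250-blocks-per-file case from the list.

```java
// Back-of-envelope helper; every input is an assumption stated in this section.
final class CapacityEstimate {
    static final long SECONDS_PER_DAY = 86_400L;

    /** Average operations per second for a daily operation count. */
    static double qps(long opsPerDay) {
        return (double) opsPerDay / SECONDS_PER_DAY;
    }

    /** Average ingress in bytes per second for a daily byte volume. */
    static double bytesPerSecond(long bytesPerDay) {
        return (double) bytesPerDay / SECONDS_PER_DAY;
    }

    /** Total manifest bytes for a file count, blocks per file, and bytes per block reference. */
    static long manifestBytes(long files, long blocksPerFile, long refBytes) {
        return files * blocksPerFile * refBytes;
    }

    /** Concurrently open notification connections. */
    static long activeConnections(long dau, int devicesPerUser, double onlineFraction) {
        return Math.round(dau * devicesPerUser * onlineFraction);
    }

    public static void main(String[] args) {
        long opsPerDay = 100_000_000L * 10;           // 100M DAU * 10 ops/user/day
        long writesPerDay = opsPerDay * 30 / 100;     // 30% writes
        long readsPerDay = opsPerDay - writesPerDay;  // 70% reads
        System.out.printf("write QPS   ~ %.0f%n", qps(writesPerDay));
        System.out.printf("read QPS    ~ %.0f%n", qps(readsPerDay));
        System.out.printf("ingress     ~ %.1f GB/s%n",
                bytesPerSecond(1_000_000_000_000_000L) / 1e9);
        System.out.printf("manifests   ~ %d bytes%n",
                manifestBytes(50_000_000_000L, 250, 80));
        System.out.printf("connections ~ %d%n",
                activeConnections(100_000_000L, 3, 0.30));
    }
}
```

These are daily averages; the peak figures quoted in the list are separate assumptions that this sketch does not derive.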
High-level architecture
The client is the engine of the design. When a file changes it re-chunks the local bytes into 4 MB blocks, hashes each block, and diffs the new block list against the previous revision stored locally. The resulting set of 'new' hashes goes to the block server, which asks the dedup index whether it already has each hash; only the truly novel blocks are PUT to cloud storage, compressed and encrypted at rest.
Once all blocks land, the client calls the metadata service to commit a new revision: a row that pins file_id, revision_number, ordered block-hash list, size, mtime, and the committing device. The metadata DB is sharded by user_id so a single user's edits are serially ordered; a Redis metadata cache fronts the hot working set.
A commit triggers two fan-outs. First, the notification service pushes a lightweight ping to every other device the user owns over a long-poll or WebSocket channel; each device then calls the changes API with its last-seen revision and pulls only the deltas. Second, the search indexer asynchronously picks up the new revision to refresh the filename and OCR index.
Cold storage quietly migrates rarely-touched old-revision blocks to cheaper tiers, guarded by the version-history retention policy. ACLs, share links, and collaborator membership live in the metadata DB alongside files, so a single transactional commit updates both the revision and who can see it.
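The client-side diff described above, comparing the new block-hash list with the previous revision's list, could look roughly like the following. This is a minimal sketch: `BlockDelta` and `newBlocks` are hypothetical names, and hashes are plain strings standing in for hex-encoded SHA-256 digests.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Computes which block hashes of a new revision the client has not uploaded before.
final class BlockDelta {
    /**
     * Returns the hashes that appear in {@code current} but not in {@code previous},
     * each listed once, in the order they first appear in {@code current}.
     */
    static List<String> newBlocks(List<String> previous, List<String> current) {
        Set<String> seen = new HashSet<>(previous);
        List<String> candidates = new ArrayList<>();
        for (String hash : current) {
            // add() returns false when the hash was in the previous revision
            // or has already been emitted for this revision.
            if (seen.add(hash)) {
                candidates.add(hash);
            }
        }
        return candidates;
    }
}
```

The result is only a candidate list: the block server still checks each hash against the global dedup index, so a block that is new to this file but already stored for another user is not uploaded again.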
Architecture Components (12)
- Client (Desktop Sync / Web / Mobile) (client) — Desktop sync agent, web app, or mobile app that chunks files locally, runs delta sync, and holds a long-lived connection to the notification service for multi-device updates.
- Load Balancer (lb) — L7 HTTPS load balancer fronting the API tier (metadata service + signed-URL issuance).
- Drive API (Metadata Service) (api) — Stateless service that owns file / folder metadata, ACLs, revisions, and coordinates between the block servers, notification service, and search indexer. This is the book's metadata service.
- Metadata DB (sql) — Sharded relational store (MySQL / Spanner) for files, folders, permissions, and block manifests.
- Metadata Cache (cache) — Hot-metadata cache: file rows, permission checks, and the block-hash existence bitmap.
- Block Servers (worker) — The book's block-server tier: splits files into blocks, compresses, encrypts, and uploads blocks to cloud storage. Also answers the dedup-existence query and issues per-block signed URLs.
- Cloud Storage (Hot Blocks) (blob) — Content-addressed object store keyed by block SHA-256; the authoritative home of every active 4 MB block.
- Cold Storage / Offline Backup (blob) — Low-cost tier (S3 Glacier / GCS Archive) holding old versions, infrequently-accessed blocks, and the offline backup copy the book calls out for disaster recovery.
- Upload Queue (queue) — Kafka topic of commit events that drives async post-processing (indexing, thumbnails, malware scan, cold-tier migration decisions).
- Upload Worker (worker) — Consumer fleet that runs post-upload processing: thumbnail, malware scan, text extraction for search, and cold-tier migration.
- Notification Service (api) — Long-poll / WebSocket fan-out that keeps every signed-in device of a user (and every collaborator) in sync in near-real-time. This is the book's notification service — the thing that makes Drive feel instant across devices.
- Search Indexer (search) — Elasticsearch cluster indexing filenames, full-text (OCR'd), and metadata for in-Drive search.
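To make the notification service's role concrete, here is a minimal in-memory sketch of the fan-out: on a commit it pings every connected device of the user except the one that made the change. All names here (`NotificationHub`, `connect`, `onCommit`) are hypothetical, and the `Consumer<String>` stands in for whatever long-poll or WebSocket session a real deployment would hold.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

// In-memory registry of open device connections, keyed by user and device.
final class NotificationHub {
    // userId -> (deviceId -> callback that writes a ping to that device's connection)
    private final Map<String, Map<String, Consumer<String>>> sessions = new ConcurrentHashMap<>();

    /** Registers (or replaces) the open connection for one device. */
    void connect(String userId, String deviceId, Consumer<String> channel) {
        sessions.computeIfAbsent(userId, u -> new ConcurrentHashMap<>()).put(deviceId, channel);
    }

    /** Drops a device's connection, for example when the socket closes. */
    void disconnect(String userId, String deviceId) {
        Map<String, Consumer<String>> devices = sessions.get(userId);
        if (devices != null) {
            devices.remove(deviceId);
        }
    }

    /**
     * Sends a "refresh this file" ping carrying only the file id to every
     * connected device of the user except the committing one.
     * Returns the number of devices pinged.
     */
    int onCommit(String userId, String committingDeviceId, String fileId) {
        Map<String, Consumer<String>> devices = sessions.getOrDefault(userId, Map.of());
        int pinged = 0;
        for (Map.Entry<String, Consumer<String>> entry : devices.entrySet()) {
            if (entry.getKey().equals(committingDeviceId)) {
                continue; // the committer already has the new revision
            }
            entry.getValue().accept(fileId);
            pinged++;
        }
        return pinged;
    }
}
```

The ping carries no file content; each device is expected to follow up with a call to the changes API using its last-seen revision. Fan-out to collaborators on shared files would need a second lookup that this sketch leaves out.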
Operations Walked Through (5)
- upload — Client splits a 12 MB file into three 4 MB blocks. Two blocks already exist, either globally (dedup hit) or in a prior revision (delta-sync hit); one is new. Only the new block is uploaded via the block server. Other devices are notified within ~1s.
- download — User opens a 12 MB PDF. API returns the manifest, client pulls 3 blocks in parallel directly from cloud storage (via the block server's signed URLs) and reassembles.
- multi-device-sync — User edits report.pdf on their laptop. Their phone + tablet (same account) are signed in with long-poll connections held open by the notification service. Within ~1s of commit, both pick up the new rev and fetch just the changed block.
- share — User shares report.pdf with alice@. API updates the ACL, notifies Alice via the notification service (WebSocket + mobile push + email), and re-indexes the file for her search scope.
- cold-tier — Background sweep: once a file accumulates more than N versions OR a version ages out of the active retention window, the blocks referenced only by those older versions move from hot cloud storage to cold storage, reclaiming hot-tier space without losing the ability to view old versions.
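The cold-tier sweep needs a rule for deciding which versions leave the hot tier. One way to express the "last N versions or retention window" policy is sketched below. The names (`RetentionSweep`, `Version`, `coldCandidates`) are hypothetical, and keeping the newest revision hot regardless of age is an assumption added here, since it is the live file.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

// Selects the versions of one file that should move to cold storage.
final class RetentionSweep {
    record Version(long revision, Instant committedAt) {}

    /**
     * A version stays hot only if it is among the newest {@code keepLatest}
     * revisions AND younger than {@code maxAge}. Everything else is returned,
     * newest first. The newest revision is never returned.
     */
    static List<Version> coldCandidates(List<Version> versions, int keepLatest,
                                        Duration maxAge, Instant now) {
        List<Version> sorted = new ArrayList<>(versions);
        sorted.sort((a, b) -> Long.compare(b.revision(), a.revision())); // newest first
        Instant cutoff = now.minus(maxAge);
        List<Version> cold = new ArrayList<>();
        for (int i = 1; i < sorted.size(); i++) { // index 0 is the live revision
            Version v = sorted.get(i);
            boolean beyondCount = i >= keepLatest;
            boolean tooOld = v.committedAt().isBefore(cutoff);
            if (beyondCount || tooOld) {
                cold.add(v);
            }
        }
        return cold;
    }
}
```

Selecting versions is only the first half of the sweep: a block should move tiers only when no hot version of any file still references it, which is a lookup against block reference counts that this sketch does not cover.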
Implementation
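import org.springframework.http.HttpStatus;
import org.springframework.security.core.annotation.AuthenticationPrincipal;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;
import org.springframework.web.server.ResponseStatusException;

// Version-history endpoints: list, fetch, and restore revisions of a single file.
@RestController
@RequestMapping("/v1/files/{fileId}/versions")
public class FileVersionController {
    private final FileVersionService versions;
    private final AclEnforcer acl;

    public FileVersionController(FileVersionService v, AclEnforcer a) {
        this.versions = v;
        this.acl = a;
    }

    // Lists up to `limit` revisions; the caller needs read access to the file.
    @GetMapping
    public VersionListResponse list(
            @PathVariable String fileId,
            @AuthenticationPrincipal UserPrincipal user,
            @RequestParam(defaultValue = "50") int limit) {
        acl.requireRead(user.getId(), fileId);
        return new VersionListResponse(versions.history(fileId, limit));
    }

    // Fetches one revision, or responds 404 if it does not exist.
    @GetMapping("/{revision}")
    public FileVersion get(
            @PathVariable String fileId,
            @PathVariable long revision,
            @AuthenticationPrincipal UserPrincipal user) {
        acl.requireRead(user.getId(), fileId);
        return versions.get(fileId, revision)
                .orElseThrow(() -> new ResponseStatusException(HttpStatus.NOT_FOUND));
    }

    // Restores an old revision; this is a write, so it needs write access.
    @PostMapping("/{revision}/restore")
    public FileVersion restore(
            @PathVariable String fileId,
            @PathVariable long revision,
            @AuthenticationPrincipal UserPrincipal user) {
        acl.requireWrite(user.getId(), fileId);
        return versions.restore(fileId, revision, user.getId());
    }
}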
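import java.io.IOException;
import java.io.InputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.HexFormat;
import java.util.List;

// Upload path: split the stream into fixed-size blocks, hash each one,
// store only blocks the dedup index has not seen, then commit the manifest.
public class BlockUploadService {
    private static final int BLOCK_SIZE = 4 * 1024 * 1024; // 4 MB

    private final BlockStore store;
    private final DedupIndex dedup;
    private final MetadataService metadata;

    // The final fields must be assigned, so the collaborators are injected here.
    public BlockUploadService(BlockStore store, DedupIndex dedup, MetadataService metadata) {
        this.store = store;
        this.dedup = dedup;
        this.metadata = metadata;
    }

    public CommitResult upload(String userId, String fileId, InputStream in, long size) throws IOException {
        List<String> blockHashes = new ArrayList<>();
        byte[] buf = new byte[BLOCK_SIZE];
        MessageDigest digest;
        try {
            digest = MessageDigest.getInstance("SHA-256");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
        int read;
        while ((read = readFully(in, buf)) > 0) {
            digest.reset();
            digest.update(buf, 0, read);
            String hash = HexFormat.of().formatHex(digest.digest());
            // Content-addressed write: skip blocks that are already stored.
            // exists/put/record is not atomic here; a production version would need
            // put and record to be idempotent under concurrent uploads of the same block.
            if (!dedup.exists(hash)) {
                store.put(hash, buf, read);
                dedup.record(hash, read);
            }
            blockHashes.add(hash);
        }
        // Commit the manifest only after every block has been stored.
        long revision = metadata.commitRevision(userId, fileId, blockHashes, size);
        return new CommitResult(fileId, revision, blockHashes.size());
    }

    // Fills buf completely unless the stream ends first; returns the bytes read (0 at end of stream).
    private int readFully(InputStream in, byte[] buf) throws IOException {
        int total = 0;
        while (total < buf.length) {
            int r = in.read(buf, total, buf.length - total);
            if (r < 0) break;
            total += r;
        }
        return total;
    }
}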
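import java.time.Instant;
import java.util.List;
import org.springframework.http.HttpStatus;
import org.springframework.security.core.annotation.AuthenticationPrincipal;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;
import org.springframework.web.server.ResponseStatusException;

// Sync-delta endpoint: returns the changes a device has not seen yet.
@RestController
@RequestMapping("/v1/sync")
public class SyncDeltaController {
    private final ChangeLog changeLog;

    // The final field must be assigned, so the change log is injected here.
    public SyncDeltaController(ChangeLog changeLog) {
        this.changeLog = changeLog;
    }

    @GetMapping("/changes")
    public ChangesResponse changes(
            @AuthenticationPrincipal UserPrincipal user,
            @RequestParam("since") long sinceRevision,
            @RequestParam(defaultValue = "500") int limit) {
        if (limit <= 0 || limit > 1000) {
            throw new ResponseStatusException(HttpStatus.BAD_REQUEST, "limit out of range");
        }
        List<ChangeEntry> entries = changeLog.tail(user.getId(), sinceRevision, limit);
        // The cursor advances to the last revision returned, or stays put on an empty page.
        long nextCursor = entries.isEmpty()
                ? sinceRevision
                : entries.get(entries.size() - 1).getRevision();
        // A full page may mean more entries exist; the client should fetch again.
        boolean hasMore = entries.size() == limit;
        return new ChangesResponse(entries, nextCursor, hasMore);
    }

    // One immutable entry in the per-user change log.
    public static final class ChangeEntry {
        private final long revision;
        private final String fileId;
        private final ChangeType type; // CREATED, UPDATED, DELETED, RENAMED, ACL_CHANGED
        private final List<String> addedBlocks;
        private final List<String> removedBlocks;
        private final Instant timestamp;

        public ChangeEntry(long r, String f, ChangeType t, List<String> a, List<String> rm, Instant ts) {
            this.revision = r; this.fileId = f; this.type = t;
            this.addedBlocks = a; this.removedBlocks = rm; this.timestamp = ts;
        }

        public long getRevision() { return revision; }
        public String getFileId() { return fileId; }
        public ChangeType getType() { return type; }
        public List<String> getAddedBlocks() { return addedBlocks; }
        public List<String> getRemovedBlocks() { return removedBlocks; }
        public Instant getTimestamp() { return timestamp; }
    }
}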
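On the client side, the changes endpoint above is meant to be called in a loop: pull a page, apply it, move the cursor, and repeat while the server reports more. The sketch below shows that loop against a hypothetical `ChangeFeed` interface standing in for the HTTP call. `SyncClientLoop`, `Page`, and `drain` are illustrative names, and entries are reduced to bare revision numbers to keep the example short.

```java
import java.util.List;

// Client-side pull loop for a cursor-based changes API.
final class SyncClientLoop {
    /** One page of results: the revisions returned, the new cursor, and a has-more flag. */
    record Page(List<Long> revisions, long nextCursor, boolean hasMore) {}

    /** Stand-in for the remote call, e.g. GET /v1/sync/changes?since=...&limit=... */
    interface ChangeFeed {
        Page fetch(long since, int limit);
    }

    /**
     * Pulls pages starting after {@code cursor} until the feed reports no more
     * data, appending every revision to {@code applied}. Returns the final
     * cursor, which the device should persist as its last-seen revision.
     */
    static long drain(ChangeFeed feed, long cursor, int limit, List<Long> applied) {
        while (true) {
            Page page = feed.fetch(cursor, limit);
            applied.addAll(page.revisions());
            cursor = page.nextCursor();
            // Stop on the last page, and also on an empty page so a feed that
            // wrongly reports hasMore cannot spin this loop forever.
            if (!page.hasMore() || page.revisions().isEmpty()) {
                return cursor;
            }
        }
    }
}
```

Because the server sets `hasMore` whenever a page is exactly full, the final round trip can return an empty page; the loop treats that as the end of the feed.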
Key design decisions & trade-offs
- Block size for chunking — Chosen: Fixed 4 MB blocks. Simple to reason about, cheap to index, and good enough dedup for typical office files. Content-defined chunking would dedup better for appended logs but complicates the client.
- Metadata store — Chosen: SQL sharded by user_id, with a Redis cache in front. A user's edits need serializable ordering for revisions; SQL gives that trivially. Sharding by user_id keeps cross-shard transactions out of the hot path. Redis absorbs hot-file reads.
- Sync transport — Chosen: Long-poll or WebSocket notification service with pull-based delta fetch. Push only a tiny 'something changed' signal; let the client pull with its last-known revision. The push channel then carries no file state, and a device that reconnects after missing pings simply pulls everything since its last-known revision.
- Version retention — Chosen: Keep the last N revisions or a time window hot; tier older revisions to cold storage. A user rewriting the same doc 100 times would otherwise 100x the storage for that file. Trades slower restores of older versions for large cost savings.
- Dedup scope — Chosen: Global block dedup across users. The same viral PDF shared by a million users is stored once. Requires per-block refcounting and careful GC; gains ~30% storage overall.
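The per-block refcounting that global dedup depends on can be illustrated with a small in-memory counter. This is a sketch under simplifying assumptions: `BlockRefCounter` and its methods are hypothetical names, the counts live in a single process, and a real system would update them transactionally alongside the manifest commit.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Tracks how many manifest entries reference each block hash.
final class BlockRefCounter {
    private final Map<String, Integer> refs = new HashMap<>();

    /** Call when a revision whose manifest lists these blocks is committed. */
    void addRevision(List<String> blockHashes) {
        for (String hash : blockHashes) {
            refs.merge(hash, 1, Integer::sum);
        }
    }

    /**
     * Call when a revision is purged. Returns the blocks whose count dropped
     * to zero, meaning no remaining revision of any user references them.
     */
    List<String> removeRevision(List<String> blockHashes) {
        List<String> unreferenced = new ArrayList<>();
        for (String hash : blockHashes) {
            Integer count = refs.get(hash);
            if (count == null) {
                continue; // unknown block; nothing to release
            }
            if (count == 1) {
                refs.remove(hash);
                unreferenced.add(hash);
            } else {
                refs.put(hash, count - 1);
            }
        }
        return unreferenced;
    }

    /** Current reference count for a block, or 0 if it is not tracked. */
    int count(String hash) {
        return refs.getOrDefault(hash, 0);
    }
}
```

The "careful GC" part is what happens to the returned list: deleting those blocks immediately can race with an upload that has just seen the block as existing, so deletion is usually deferred behind a grace period or a re-check.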
Interview follow-ups
- How would you add real-time collaborative editing (Google Docs-style operational transformation, or a CRDT) on top of this?
- How do you build Drive-style selective sync that only materializes files on demand?
- How would you enforce per-folder quotas and per-user storage caps transactionally?
- How do you handle end-to-end encryption while preserving dedup?
- How would you ship conflict resolution for simultaneous offline edits?