Digital Wallet System Design Interview Question
Problem: Design a digital wallet (PayPal/Venmo-style) that supports top-up from bank, ACID peer-to-peer transfers, and fast balance reads.
Overview
A digital wallet looks deceptively like a bank account: a user has a balance, they top up from a bank, they send money to someone else, they withdraw. The deception is that every one of those operations touches at least two balances atomically, and the entire product collapses the moment balances drift even by a cent. Unlike a payment system that brokers between a customer and an external PSP, a wallet owns the money on both sides of most transactions, which means the wallet service is the source of truth for its own ledger: there is no external settlement report to reconcile against for intra-wallet transfers. That concentrates every correctness concern into one place. How do you guarantee that a transfer from Alice to Bob either fully succeeds or fully fails, even when the process crashes between the debit and the credit? And how do you serve hundreds of millions of balance reads per day without asking the database for a SUM every time? The design below answers both with a double-entry ledger plus a materialized balance row protected by optimistic locking.
Summary
A strongly consistent wallet built on a MySQL double-entry ledger, fronted by a Redis balance cache and a 2-phase-commit transfer coordinator that guarantees A→B money movements are atomic across two account shards. The dominant design choice is putting the canonical balance in ACID SQL (not in Redis) and using 2PC when sender and receiver live on different shards — accepting the ~30ms latency cost in exchange for zero reconciliation debt. Kafka is the audit spine: every state transition produces an immutable event that downstream fraud, analytics, and compliance systems consume independently.
Requirements
Functional
- Top up a wallet from a linked bank account or card via an external rail (ACH, Stripe, UPI)
- Transfer funds between two wallets atomically — either both sides move or neither does
- Withdraw funds to a linked bank account with a hold period before the external rail settles
- Serve a low-latency balance read (p99 < 20ms) for the app home screen
- Maintain an immutable double-entry ledger of every money movement for audit
- Enforce per-user KYC and per-transaction velocity limits before authorizing a transfer
Non-functional
- ACID guarantees on transfers — no lost updates, no phantom balances, no partial writes
- Strong consistency on the write path; read-your-writes on the balance read path
- p99 write latency under 200ms, p99 read latency under 20ms
- Durability: zero ledger row loss; 7-year retention for regulatory audit
- Availability 99.95% writes, 99.99% reads (balance read is served from replicas)
- Horizontally scalable to 100M users without re-architecting the ledger
Capacity Assumptions
- 500M accounts, 50M DAU
- 10M P2P transfers/day, peak 3x (~350 TPS peak)
- 200M balance reads/day, peak 10x (~23K read QPS)
- Cache hit rate target: ≥ 99% on balance reads
- SOX + PCI + regional banking regs → 7-year audit retention on every event
Back-of-Envelope Estimates
- Ledger writes: 2 entries/transfer (20M rows/day) + 2 per top-up (the remaining ~10M rows/day, i.e. ~5M top-ups/day) ≈ 30M rows/day; ~350 rows/s on average, ~1K rows/s at a 3x peak
- Ledger storage: 30M * 365 * 7 * 250B ≈ 19 TB over 7 years — sharded MySQL by account_id
- Redis balance cache: 500M accounts * ~80B ≈ 40 GB; a 3-shard cluster with 3 replicas each
- Kafka audit log: 30M events/day * ~500B ≈ 15 GB/day, 38 TB over 7 years (cold tier after 30d)
- Bank connector throughput bounded by partner bank (typical 50–200 TPS per partner, retries are expensive)
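The figures above are plain arithmetic, so they are easy to re-derive whenever an assumption changes. A minimal Java sketch (CapacityMath and its methods are illustrative names, not part of the design) that reproduces the peak rates and 7-year retention sizes from the inputs listed above:

```java
/** Re-derives the capacity estimates; every input is an assumption from the lists above. */
final class CapacityMath {
    static final double SECONDS_PER_DAY = 86_400;

    /** Average per-second rate for a daily volume, scaled by a peak multiplier. */
    static double perSecond(double perDay, double peakFactor) {
        return perDay / SECONDS_PER_DAY * peakFactor;
    }

    /** Bytes needed to keep rowsPerDay rows of bytesPerRow each for the given number of years. */
    static double retainedBytes(double rowsPerDay, int years, double bytesPerRow) {
        return rowsPerDay * 365 * years * bytesPerRow;
    }

    public static void main(String[] args) {
        System.out.printf("P2P transfers, peak:   %.0f TPS%n", perSecond(10e6, 3));
        System.out.printf("Balance reads, peak:   %.0f QPS%n", perSecond(200e6, 10));
        System.out.printf("Ledger rows, average:  %.0f rows/s%n", perSecond(30e6, 1));
        System.out.printf("Ledger rows, 3x peak:  %.0f rows/s%n", perSecond(30e6, 3));
        System.out.printf("Ledger, 7 years:       %.1f TB%n", retainedBytes(30e6, 7, 250) / 1e12);
        System.out.printf("Kafka audit, 7 years:  %.1f TB%n", retainedBytes(30e6, 7, 500) / 1e12);
        System.out.printf("Redis balance cache:   %.0f GB%n", 500e6 * 80 / 1e9);
    }
}
```

Note that the Kafka figure is a single copy; topic replication multiplies it.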
High-level architecture
The wallet service sits behind an API gateway that authenticates users and rate-limits per user ID. Every write (top-up, transfer, withdrawal) flows through a WalletService that opens a single SERIALIZABLE transaction against the MySQL ledger. Inside the transaction we (a) lock the involved wallet_balances rows with SELECT ... FOR UPDATE in a stable order (lowest account_id first, to prevent deadlocks), (b) insert balanced double-entry rows into the ledger_entries table, and (c) apply the delta to the materialized wallet_balances.balance_available_minor column using optimistic concurrency (a version column plus a WHERE version = ? compare-and-set). If two transfers race on the same wallet, one commits and the other retries at the service layer, bounded to three retries before returning 409.
Top-ups are two-phase: we record a pending ledger entry immediately and bump the balance_pending_minor column, then the bank rail webhook (or a settlement poll) flips the entry to posted and moves the amount from balance_pending_minor to balance_available_minor. This gives users instant visual feedback while respecting rail finality.
Balance reads go through the Balance Service, which serves from the Redis cache and falls back to a read replica of wallet_balances on a miss. We never SUM the ledger on the read path, because that would cost O(N) in ledger rows on every read. A separate audit job runs nightly, recomputes SUM(ledger_entries) per wallet, and compares it against the materialized balance; any drift pages on-call.
Sharding is by account_id hash, so a transfer between two users on different shards routes through the transfer coordinator, which commits both sides with 2PC. We keep popular users (merchants) unsharded on a hot tier to avoid cross-shard writes for most transfers.
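The bounded retry described above has to live outside the database transaction: once the database rolls a transaction back it cannot be resumed, so each attempt must re-lock the rows and re-check the balance from scratch. A minimal sketch, assuming hypothetical ConflictException and RetriesExhaustedException types stand in for the real errors. In the Spring service from the Implementation section you would catch Spring's ConcurrencyFailureException (the parent of its optimistic and pessimistic locking failures) instead, and call the @Transactional method through the Spring proxy, since self-invocation bypasses the transaction advice.

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.function.Supplier;

/** Hypothetical signal that a version CAS or lock conflict lost a race. */
class ConflictException extends RuntimeException {
    ConflictException(String message) { super(message); }
}

/** Hypothetical signal that the API layer would map to HTTP 409 Conflict. */
class RetriesExhaustedException extends RuntimeException {
    RetriesExhaustedException(int attempts, Throwable lastCause) {
        super("transfer conflicted " + attempts + " times", lastCause);
    }
}

final class BoundedRetry {
    static final int MAX_RETRIES = 3; // one first attempt plus up to three retries

    /** Runs an action that opens and commits its own transaction, re-running it on conflict. */
    static <T> T run(Supplier<T> transactionalAction) {
        ConflictException last = null;
        int attempts = 0;
        while (attempts <= MAX_RETRIES) {
            attempts++;
            try {
                return transactionalAction.get();
            } catch (ConflictException e) {
                last = e;
                if (attempts > MAX_RETRIES) break; // no sleep after the final attempt
                // Jittered backoff so two racing transfers do not retry in lockstep.
                try {
                    Thread.sleep(ThreadLocalRandom.current().nextLong(1, 5L * attempts + 1));
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    break;
                }
            }
        }
        throw new RetriesExhaustedException(attempts, last);
    }
}
```

Only conflicts are retried; any other exception (insufficient funds, validation) propagates on the first attempt, which is what you want for errors a retry cannot fix.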
Architecture Components (10)
- Client (Mobile / Web) (client) — PayPal/Venmo-style mobile app or web client.
- Load Balancer (lb) — L7 HTTPS load balancer across stateless wallet API replicas.
- Wallet API (api) — Stateless REST/gRPC API for transfers, top-ups, and balance reads.
- Auth Service (auth) — Validates OAuth2 tokens + device binding + step-up challenges for high-value transfers.
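- Balance Service (api) — Read path for account balances. Reads the Redis cache first and, on a miss, falls through to the materialized wallet_balances row in the ledger database (never a SUM over ledger entries).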
- Redis Balance Cache (cache) — Sharded Redis cluster holding hot account balances.
- 2PC Transfer Coordinator (coordinator) — Coordinates the A→B transfer across (possibly) two MySQL shards using two-phase commit + a durable coordinator log.
- Ledger (MySQL, Double-Entry, Sharded) (sql) — Authoritative double-entry ledger. MySQL InnoDB sharded by account_id. XA transactions used by the coordinator.
- Bank Connector (api) — Outbound gateway to partner banks (ACH, SEPA, FedNow). Rate-limited per partner and retried carefully.
- Kafka Event Log (stream) — Append-only audit spine. Every state transition publishes an immutable event.
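The 2PC Transfer Coordinator's protocol can be illustrated with a toy, single-threaded in-memory model. This is a sketch under stated assumptions, not MySQL XA: Shard and TransferCoordinator are hypothetical classes, the "durable log" is an in-memory list, and in production each step would map to XA PREPARE, XA COMMIT, or XA ROLLBACK on the real shards, with the log fsync'd before phase 2. What it does show is the decision point: once COMMIT is in the coordinator log, recovery must finish the transfer on every participant.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy participant standing in for one ledger shard (hypothetical, not MySQL XA). */
final class Shard {
    final Map<String, Long> balances = new HashMap<>();
    // txId -> (account -> delta) staged in phase 1 but not yet applied
    private final Map<String, Map<String, Long>> prepared = new HashMap<>();

    /** Phase 1: validate, stage the delta, and vote YES (true) or NO (false). */
    boolean prepare(String txId, String account, long delta) {
        // Debits already promised to other prepared transactions count against the balance.
        long reservedDebits = prepared.values().stream()
                .mapToLong(staged -> Math.min(0L, staged.getOrDefault(account, 0L)))
                .sum();
        long available = balances.getOrDefault(account, 0L) + reservedDebits;
        if (available + delta < 0) return false; // would overdraw: vote NO
        prepared.computeIfAbsent(txId, k -> new HashMap<>()).merge(account, delta, Long::sum);
        return true;
    }

    /** Phase 2: apply staged deltas. Idempotent, so recovery may resend it. */
    void commit(String txId) {
        Map<String, Long> staged = prepared.remove(txId);
        if (staged != null) staged.forEach((acct, d) -> balances.merge(acct, d, Long::sum));
    }

    /** Phase 2 (abort): discard staged deltas. Also idempotent. */
    void abort(String txId) {
        prepared.remove(txId);
    }
}

/** Toy coordinator; the list stands in for a durable, fsync'd decision log. */
final class TransferCoordinator {
    final List<String> log = new ArrayList<>();

    boolean transfer(String txId, Shard src, String from, Shard dst, String to, long amountMinor) {
        if (amountMinor <= 0) throw new IllegalArgumentException("amount must be positive");
        log.add(txId + " BEGIN");
        boolean srcYes = src.prepare(txId, from, -amountMinor);
        boolean dstYes = srcYes && dst.prepare(txId, to, amountMinor);
        if (dstYes) {
            log.add(txId + " COMMIT"); // decision point: from here on the transfer must complete
            src.commit(txId);
            dst.commit(txId);
            return true;
        }
        log.add(txId + " ABORT");
        src.abort(txId);
        dst.abort(txId);
        return false;
    }

    /** After a coordinator restart: finish a logged COMMIT, otherwise roll back (presumed abort). */
    void recover(String txId, Shard... participants) {
        boolean committed = log.contains(txId + " COMMIT");
        for (Shard s : participants) {
            if (committed) s.commit(txId);
            else s.abort(txId);
        }
    }
}
```

The blocking problem of 2PC is visible here too: a participant that voted YES holds its staged debit until the coordinator (or its recovery) tells it the outcome.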
Operations Walked Through (3)
- p2p-transfer — Alice sends $100 to Bob. Accounts live on different MySQL shards, triggering the 2PC path. Either both balance updates commit or neither does.
- top-up — User pulls $500 from their linked bank into the wallet. Wallet calls the Bank Connector, records pending, then settles on bank ack.
- balance-read — Ultra-hot path — every screen refresh hits this. 99% served from Redis.
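The balance-read path combines cache-aside with the version-based read-your-writes check used by BalanceReadService.readWithVersion. A minimal sketch (Java 16+ for the record), assuming an in-memory map stands in for Redis and the replica and primary lookups are passed in as functions; Balance and CachedBalanceReader are hypothetical names:

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical balance snapshot; version increments on every committed write. */
record Balance(long availableMinor, long pendingMinor, long version) {}

final class CachedBalanceReader {
    private final Map<UUID, Balance> cache = new ConcurrentHashMap<>(); // stand-in for Redis
    private final Function<UUID, Balance> replica; // hypothetical replica lookup
    private final Function<UUID, Balance> primary; // hypothetical primary lookup

    CachedBalanceReader(Function<UUID, Balance> replica, Function<UUID, Balance> primary) {
        this.replica = replica;
        this.primary = primary;
    }

    /** minVersion is the version returned by the client's last write (0 if it has none). */
    Balance read(UUID walletId, long minVersion) {
        Balance cached = cache.get(walletId);
        if (cached != null && cached.version() >= minVersion) return cached; // cache hit
        Balance fromReplica = replica.apply(walletId);
        Balance result = (fromReplica != null && fromReplica.version() >= minVersion)
                ? fromReplica
                : primary.apply(walletId); // replica is missing the row or lags the client
        if (result == null) throw new IllegalArgumentException("unknown wallet " + walletId);
        // Refill the cache, but never replace a newer version with an older one.
        cache.merge(walletId, result, (old, fresh) -> fresh.version() >= old.version() ? fresh : old);
        return result;
    }

    /** Called by the write path after commit so the next read refills from the database. */
    void invalidate(UUID walletId) {
        cache.remove(walletId);
    }
}
```

Keeping the higher version on refill means a lagging replica can never overwrite a fresher cached row; a production cache would also set a TTL on each key.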
Implementation
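```java
import java.util.UUID;

import org.springframework.dao.OptimisticLockingFailureException;
import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Isolation;
import org.springframework.transaction.annotation.Transactional;

// WalletRepo, LedgerRepo, Wallet, TransferResult and InsufficientFundsException are this design's own types.
@Service
public class WalletService {
    private final WalletRepo wallets;
    private final LedgerRepo ledger;

    public WalletService(WalletRepo wallets, LedgerRepo ledger) {
        this.wallets = wallets;
        this.ledger = ledger;
    }

    @Transactional(isolation = Isolation.SERIALIZABLE)
    public TransferResult transfer(UUID from, UUID to, long amountMinor, String currency, String idemKey) {
        if (amountMinor <= 0) throw new IllegalArgumentException("amount must be positive");
        // A self-transfer would lock one row twice and then fail the version CAS on the second update.
        if (from.equals(to)) throw new IllegalArgumentException("source and destination must differ");
        // Lock both rows (loadForUpdate = SELECT ... FOR UPDATE) in canonical order so that
        // concurrent A->B and B->A transfers cannot deadlock.
        UUID first = from.compareTo(to) < 0 ? from : to;
        UUID second = first.equals(from) ? to : from;
        Wallet w1 = wallets.loadForUpdate(first);
        Wallet w2 = wallets.loadForUpdate(second);
        Wallet src = first.equals(from) ? w1 : w2;
        Wallet dst = first.equals(to) ? w1 : w2;
        if (src.available() < amountMinor) throw new InsufficientFundsException(from);
        // Double-entry: a debit and a credit sharing txnId and summing to zero. Assumes a unique
        // index on (idem_key, wallet_id) so a replayed request fails here instead of posting twice.
        UUID txnId = UUID.randomUUID();
        ledger.insert(txnId, from, -amountMinor, currency, idemKey);
        ledger.insert(txnId, to, amountMinor, currency, idemKey);
        // Version CAS on the materialized balances (UPDATE ... WHERE version = ?). The runtime
        // exception rolls the whole transaction back; the caller decides whether to retry.
        int u1 = wallets.applyDelta(from, -amountMinor, src.version());
        int u2 = wallets.applyDelta(to, amountMinor, dst.version());
        if (u1 != 1 || u2 != 1) throw new OptimisticLockingFailureException("wallet version changed");
        return new TransferResult(txnId, src.version() + 1, dst.version() + 1);
    }
}
```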
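```java
import java.util.UUID;

import org.springframework.beans.factory.annotation.Qualifier;
import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.jdbc.core.RowMapper;
import org.springframework.stereotype.Service;

@Service
public class BalanceReadService {
    private static final String SQL =
        "SELECT wallet_id, currency, balance_available_minor, balance_pending_minor, version, updated_at " +
        "FROM wallet_balances WHERE wallet_id = ?";
    private static final RowMapper<BalanceView> MAPPER = (rs, i) -> new BalanceView(
        UUID.fromString(rs.getString("wallet_id")),
        rs.getString("currency"),
        rs.getLong("balance_available_minor"),
        rs.getLong("balance_pending_minor"),
        rs.getLong("version"),
        rs.getTimestamp("updated_at").toInstant());

    private final JdbcTemplate replicaJdbc; // points at a read replica
    private final JdbcTemplate primaryJdbc; // points at the primary; used only when the replica lags

    // The bean names "replicaJdbc" and "primaryJdbc" are assumptions about the DataSource wiring.
    public BalanceReadService(@Qualifier("replicaJdbc") JdbcTemplate replicaJdbc,
                              @Qualifier("primaryJdbc") JdbcTemplate primaryJdbc) {
        this.replicaJdbc = replicaJdbc;
        this.primaryJdbc = primaryJdbc;
    }

    public BalanceView read(UUID walletId) {
        // wallet_id is read back as a string, so bind it as one too.
        return replicaJdbc.queryForObject(SQL, MAPPER, walletId.toString());
    }

    /** Read-your-writes: after a write, the client passes the version it saw; if the replica is behind, read the primary. */
    public BalanceView readWithVersion(UUID walletId, long minVersion) {
        BalanceView v = read(walletId);
        return v.version() >= minVersion ? v : primaryJdbc.queryForObject(SQL, MAPPER, walletId.toString());
    }
}
```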
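```java
import java.util.UUID;

import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;

@Service
public class TopUpService {
    // Hypothetical internal account holding money in flight from bank rails (the contra leg).
    static final UUID BANK_CLEARING = UUID.fromString("00000000-0000-0000-0000-000000000001");

    private final WalletRepo wallets;
    private final LedgerRepo ledger;
    private final BankRailClient rail;

    public TopUpService(WalletRepo wallets, LedgerRepo ledger, BankRailClient rail) {
        this.wallets = wallets;
        this.ledger = ledger;
        this.rail = rail;
    }

    @Transactional
    public TopUpReceipt initiate(UUID walletId, long amountMinor, String currency, String idemKey) {
        if (amountMinor <= 0) throw new IllegalArgumentException("amount must be positive");
        // Assumes the rail deduplicates on idemKey: if this transaction rolls back, a client
        // retry re-attaches to the same bank transfer instead of pulling the money twice.
        BankTransfer pending = rail.initiate(walletId, amountMinor, currency, idemKey);
        UUID txnId = UUID.randomUUID();
        // Double-entry even while pending: credit the wallet, debit the clearing account.
        ledger.insertPending(txnId, walletId, amountMinor, currency, pending.railRef(), idemKey);
        ledger.insertPending(txnId, BANK_CLEARING, -amountMinor, currency, pending.railRef(), idemKey);
        wallets.bumpPending(walletId, amountMinor);
        return new TopUpReceipt(txnId, pending.railRef(), TopUpStatus.PENDING);
    }

    @Transactional
    public void onRailSettled(String railRef, boolean success) {
        // Hypothetical locking lookup of the wallet-side leg, so two concurrent
        // webhooks for the same railRef cannot both observe PENDING.
        LedgerRow row = ledger.findWalletLegByRailRefForUpdate(railRef);
        // The webhook can arrive before initiate() commits; failing lets the rail redeliver it
        // (assuming the rail retries unacknowledged webhooks).
        if (row == null) throw new IllegalStateException("unknown railRef " + railRef);
        if (row.status() != LedgerStatus.PENDING) return; // already settled: idempotent no-op
        if (success) {
            ledger.markPosted(row.txnId()); // posts every leg of the txn
            wallets.movePendingToAvailable(row.walletId(), row.amountMinor());
        } else {
            ledger.markFailed(row.txnId());
            wallets.clearPending(row.walletId(), row.amountMinor());
        }
    }
}
```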
Key design decisions & trade-offs
- Double-entry ledger vs balance-only accounting — Chosen: Double-entry: every transfer is two rows (debit + credit) in ledger_entries. Balance-only makes bug-induced drift invisible — if the credit side silently drops, nobody notices until someone sums the system. Double-entry makes the invariant SUM(ledger_entries) = 0 across the system, which the nightly audit job checks. Cost is 2x the rows, but ledgers are cheap compared to the cost of losing trust.
- Materialized balance vs SUM-on-read — Chosen: Materialized wallet_balances row updated inside the same transaction as the ledger. SUM-on-read is O(ledger_rows) per balance check — fine at 100 users, catastrophic at 100M. Materialized balance is O(1) but can drift if a bug writes the ledger without updating the balance. We prevent drift by keeping both writes in one transaction and auditing nightly.
- SERIALIZABLE isolation vs optimistic CAS — Chosen: SERIALIZABLE for the critical transfer path, with optimistic retry at the service layer on conflict. SERIALIZABLE prevents anomalies that READ COMMITTED allows with plain reads, such as lost updates (two concurrent transfers from the same wallet both passing the balance check before either writes) and write skew. The cost is retries under contention: InnoDB surfaces these conflicts as deadlock or lock-wait-timeout errors, and the service layer catches them and retries up to 3 times before returning 409, which caps tail latency.
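- Strong consistency vs eventual consistency across shards — Chosen: Strong consistency within a shard, 2PC across shards. Transfers whose two accounts hash to the same shard use a single local SERIALIZABLE transaction; note that with uniform hashing over N shards only about 1/N of random account pairs are co-located, so the cross-shard path is the common case. Cross-shard transfers go through the transfer coordinator: it prepares the debit on the source shard and the credit on the destination shard as XA transactions, durably logs the commit decision, then commits both. The alternative, a saga (reserve funds on the source, credit the destination, compensate on failure), avoids 2PC's blocking when the coordinator fails mid-protocol, at the cost of a visible pending state for a few seconds and compensation logic for every step. We accept the ~30ms of extra latency and rely on the durable coordinator log to resolve in-doubt transactions after a coordinator restart.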
- Instant top-up UX vs rail finality — Chosen: Two-phase: expose a pending balance immediately, flip to available after rail settlement. ACH can take 1-3 days to settle; users won't wait. Two-phase balance (available + pending) lets us show the money while gating actual spend until settlement. The risk is a failed settlement requiring a reversal — we mitigate with per-user top-up velocity limits and a fraud model that blocks high-risk new accounts from spending pending funds.
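The nightly audit referenced in the first two decisions checks two invariants: every transaction's legs sum to zero, and every materialized balance equals the sum of that wallet's ledger entries. A minimal in-memory sketch, assuming the job has already loaded the entries and balances; a real job would push the sums into SQL with GROUP BY and walk shards in batches. LedgerEntry and LedgerAudit are hypothetical names, and the pending/posted distinction is ignored here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical flattened ledger row: one leg of a double-entry transaction. */
record LedgerEntry(String txnId, String walletId, long amountMinor) {}

final class LedgerAudit {
    /** Returns findings in a stable order; an empty list means the ledger reconciles. */
    static List<String> audit(List<LedgerEntry> entries, Map<String, Long> materialized) {
        Map<String, Long> perTxn = new TreeMap<>();
        Map<String, Long> perWallet = new TreeMap<>();
        for (LedgerEntry e : entries) {
            perTxn.merge(e.txnId(), e.amountMinor(), Long::sum);
            perWallet.merge(e.walletId(), e.amountMinor(), Long::sum);
        }
        List<String> findings = new ArrayList<>();
        // Invariant 1 (double-entry): the legs of every transaction sum to zero.
        perTxn.forEach((txn, sum) -> {
            if (sum != 0) findings.add("unbalanced txn " + txn + ": " + sum);
        });
        // Invariant 2: each materialized balance equals SUM(entries) for that wallet.
        TreeSet<String> wallets = new TreeSet<>(perWallet.keySet());
        wallets.addAll(materialized.keySet());
        for (String w : wallets) {
            long ledgerSum = perWallet.getOrDefault(w, 0L);
            long balance = materialized.getOrDefault(w, 0L);
            if (ledgerSum != balance) {
                findings.add("drift on " + w + ": ledger=" + ledgerSum + " balance=" + balance);
            }
        }
        return findings;
    }
}
```

Internal contra accounts (for example a bank clearing account for top-ups) appear as ordinary wallets here, which is what keeps the system-wide sum at zero.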
Interview follow-ups
- Cross-border wallets with FX-rate lock and multi-currency balance per user
- Card-issuing: virtual and physical debit cards that draw from the wallet via a card network (Visa, Marqeta)
- Peer-to-peer payment requests with a pending-request state and expiration
- Regulatory holds for AML/sanctions screening before a transfer clears
- Sharded ledger with cross-shard saga coordinator and automatic rebalancing on hot users
- Offline-first mobile client with conflict-resolved pending transfers