Hotel Reservation System System Design Interview Question
Problem: Design a hotel reservation system (Booking.com / Expedia / direct hotel chain) that takes rooms from search, through hold, payment, and confirmed booking, without overselling.
Overview
A hotel reservation system looks small on paper and is unforgiving in practice. The traffic numbers are tiny by modern internet standards, roughly 50 booking attempts per second at peak, but the correctness bar is absolute: you may never sell the same room twice. Two users clicking Book on the last room at the same moment must result in one confirmation and one polite rejection, not two charges and an angry guest at reception. The design leans on three classical moves: a short-lived Redis hold with a TTL so the user has time to type a card number, a strongly consistent SQL inventory table that is the source of truth, and request-level idempotency keys so retries on flaky mobile links do not double-book. The interesting debates are whether to use SERIALIZABLE isolation or optimistic locking with a version column, whether holds belong in Redis or in the database, and how aggressively to cache the search path without serving stale availability.
Summary
A consistency-critical booking pipeline: search is read-heavy and can be served from a denormalized availability cache; booking is write-critical and must never oversell. The design uses Redis for short-lived room holds (SET with NX and a TTL) plus idempotency keys, a strongly consistent SQL database (MySQL / Postgres / Aurora) for the authoritative reservation and inventory tables, and an external payment provider (Stripe-style) that honors idempotency keys on charge requests. The dominant design choice is 'hold then confirm' over optimistic 'book directly': holds keep inventory reserved for 10 minutes during checkout so the user can enter payment details without the room disappearing or being sold twice. The main tradeoff is inventory that is locked but unpaid during the hold window; bookable-room utilization is slightly reduced in exchange for a clean UX and a hard no-oversell guarantee.
Requirements
Functional
- Search hotels and room types by destination, dates, and guest count
- Place a short-lived hold on a specific room while the user enters payment
- Confirm a booking atomically with a successful payment authorization
- Support idempotent POST /bookings via client-supplied Idempotency-Key
- Cancel and refund bookings subject to the hotel cancellation policy
- Email and push confirmation to the guest and notification to the hotel
Non-functional
- No oversell under any concurrency pattern; correctness is a hard constraint
- Search p95 under 500 ms, hold p95 under 200 ms, booking p95 under 2 s (payment-bound)
- Availability 99.95% for search, 99.99% for the booking pipeline
- Hold TTL 10 minutes with a one-click extend option
- Reservation DB survives a full AZ failure without losing committed bookings
Capacity Assumptions
- 1M hotels globally, avg 200 rooms each → 200M rooms, i.e. 200M room-nights of inventory per calendar night
- 30M searches/day, 1M booking attempts/day, 900K successful bookings/day
- Search:Book ratio ≈ 30:1 (read-heavy)
- Avg checkout session 4 min; hold TTL = 10 min (buffer for payment)
- No-oversell is a hard constraint — correctness > availability
- Payment SLA: external provider p95 ≈ 600ms, p99 ≈ 1.5s
Back-of-Envelope Estimates
- Search QPS: 30M / 86400 ≈ 350 sustained, ~1.2K peak
- Booking QPS: 1M / 86400 ≈ 12 sustained, ~50 peak — tiny absolute numbers, the challenge is correctness not scale
- Active holds at any moment: 50 holds/s * 600 s (10 min TTL) = 30K concurrent holds max
- Reservation DB row count: 900K/day * 365 * 5yr ≈ 1.6B rows (partitioned by check_in month)
- Redis footprint: 30K holds * ~1 KB ≈ 30 MB — trivial
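The estimates above are simple enough to check mechanically. The sketch below recomputes them from the stated assumptions; the class and method names are illustrative only, and 1 KB is taken as 1,000 bytes.

```java
// Back-of-envelope capacity math using the assumptions listed above.
// CapacityEstimates and its method names are hypothetical helpers.
final class CapacityEstimates {
    static final long SECONDS_PER_DAY = 86_400;

    // Average requests per second for a given daily volume.
    static double sustainedQps(long perDay) {
        return (double) perDay / SECONDS_PER_DAY;
    }

    // Upper bound on concurrent holds: every peak-rate attempt
    // keeps its hold for the full TTL.
    static long maxConcurrentHolds(long peakQps, long ttlSeconds) {
        return peakQps * ttlSeconds;
    }

    // Rows accumulated over a retention window of whole years.
    static long reservationRows(long bookingsPerDay, int years) {
        return bookingsPerDay * 365L * years;
    }

    // Redis memory in decimal megabytes for a given number of holds.
    static double redisMegabytes(long holds, long bytesPerHold) {
        return holds * bytesPerHold / 1_000_000.0;
    }

    public static void main(String[] args) {
        System.out.printf("search QPS (sustained):  %.0f%n", sustainedQps(30_000_000));
        System.out.printf("booking QPS (sustained): %.0f%n", sustainedQps(1_000_000));
        System.out.printf("max concurrent holds:    %d%n", maxConcurrentHolds(50, 600));
        System.out.printf("reservation rows (5 yr): %d%n", reservationRows(900_000, 5));
        System.out.printf("Redis footprint (MB):    %.1f%n", redisMegabytes(30_000, 1_000));
    }
}
```

The hold count is a worst case: with a 4-minute average checkout, most holds are released well before the 10-minute TTL, so the steady-state number is lower.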
High-level architecture
The flow is search, then hold, then confirm. Search hits a denormalized availability cache derived from the inventory table by a background projection job; it is eventually consistent and that is fine, because the authoritative availability check happens at hold time. When the user clicks a room, the API issues a Redis SET with NX and a 10-minute TTL (the atomic form of SETNX plus expiry) on a key shaped like hold:{hotel_id}:{room_id}:{date_range}; if it succeeds, the user holds that room. The hold token is returned to the client and displayed as a countdown.
At confirm time the Reservation API runs a transaction against the inventory table under SERIALIZABLE isolation, decrements the room count, and writes a row to the reservations table; if the transaction commits, the API calls the external payment provider with the client-supplied Idempotency-Key. The payment call is outside the SQL transaction because payment provider p99 is 1.5 seconds and holding a SERIALIZABLE transaction open that long would destroy throughput. If the payment fails after the SQL commit, a compensating transaction releases the room and the Redis hold. The trick that makes this safe is that the SQL row itself doubles as a semantic lock: a UNIQUE constraint on (hotel_id, room_id, date) rejects a second reservation for the same room-night, and SERIALIZABLE isolation prevents two concurrent transactions from both reading available > 0 and both committing a decrement. For high-contention room types we also use optimistic locking with a version column as a second line of defense: a bounded retry loop absorbs transient version conflicts and, once exhausted, returns a user-visible 409 instead of leaving a transaction stuck.
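One detail of the hold key is worth illustrating. A key that encodes the whole date range as a single string only collides with an identical range, so two stays that overlap without matching exactly (1 to 3 June and 2 to 4 June) would both acquire a hold. The SQL transaction still prevents the oversell, but the second user finds out late. One possible refinement, which is an assumption here and not part of the design as described, is to take one key per occupied night. The sketch below only derives the keys; `HoldKeys` and `perNight` are hypothetical names.

```java
import java.time.LocalDate;
import java.util.List;

// Hypothetical helper: expands a stay into one hold key per occupied night.
final class HoldKeys {
    // The check-out day is exclusive: a stay from 1 June to 3 June
    // occupies the nights of 1 June and 2 June.
    static List<String> perNight(long hotelId, long roomId,
                                 LocalDate checkIn, LocalDate checkOut) {
        if (!checkOut.isAfter(checkIn)) {
            throw new IllegalArgumentException("checkOut must be after checkIn");
        }
        return checkIn.datesUntil(checkOut)
                .map(night -> "hold:" + hotelId + ":" + roomId + ":" + night)
                .toList();
    }

    public static void main(String[] args) {
        // Two overlapping stays share the key for the night of 2 June.
        System.out.println(perNight(7, 101, LocalDate.of(2024, 6, 1), LocalDate.of(2024, 6, 3)));
        System.out.println(perNight(7, 101, LocalDate.of(2024, 6, 2), LocalDate.of(2024, 6, 4)));
    }
}
```

Acquiring several keys must be all-or-nothing, which in Redis would typically mean a Lua script or a transaction; that part is deliberately left out of the sketch.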
Architecture Components (9)
- Client (Web / Mobile) (client) — Browser or mobile app that drives search → room detail → checkout → confirmation.
- Load Balancer (lb) — L7 HTTPS load balancer in front of the reservation API tier.
- Reservation API (api) — Stateless orchestrator that composes search, inventory, payment, and notifications into the booking workflow.
- Search Service (search) — Elasticsearch-backed availability search indexed nightly from the authoritative inventory.
- Inventory Service (api) — Authoritative service managing room-type inventory per hotel per date. Owns the decrement/release logic.
- Reservation DB (SQL) (sql) — Strongly-consistent SQL database holding bookings, inventory, and audit trail.
- Redis (Holds + Idempotency) (cache) — In-memory store for short-lived room holds and idempotency-key result cache.
- Payment Provider (external) (api) — Third-party payment processor (Stripe / Adyen / Braintree) charging the guest's card.
- Notification Service (worker) — Async worker that reads the outbox table and sends email / SMS / push confirmations.
Operations Walked Through (4)
- search: User searches for hotels; the API queries the search index and returns price + availability cards.
- book: Happy-path booking: client creates a hold, then POST /bookings with an Idempotency-Key; the API reserves inventory in a short DB transaction, charges payment, releases the hold, and enqueues the notification.
- cancel: User cancels before check-in; API refunds via payment provider, releases the inventory rows, and notifies the hotel.
- book-payment-declined: Alternate booking flow where the payment provider declines the card. The API undoes the inventory reservation, releases the hold, and caches the failed result under the idempotency key, so a retry with the same key gets the same decline back instead of triggering a second charge attempt.
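The idempotency-key result cache used by the book and book-payment-declined flows can be sketched in a few lines. This is a minimal in-memory stand-in, assuming a single process; the design above keeps these results in Redis, and `IdempotentExecutor` is a hypothetical name.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Supplier;

// In-memory stand-in for the idempotency-key result cache.
// The first call for a key runs the action and stores its result;
// later calls with the same key replay the stored result, including declines.
final class IdempotentExecutor<R> {
    private final ConcurrentMap<String, R> results = new ConcurrentHashMap<>();

    R execute(String idempotencyKey, Supplier<R> action) {
        // computeIfAbsent runs the action at most once per key, even when
        // concurrent retries arrive with the same key.
        return results.computeIfAbsent(idempotencyKey, key -> action.get());
    }
}
```

A retry that reuses the key of a declined attempt gets the original decline back without touching the payment provider again. A production version would also expire entries and would avoid running a slow payment call inside the map's compute function.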
Implementation
// ReservationService.java
// BookingRequest, BookingResult, PaymentClient and PaymentResult are assumed to be defined elsewhere.
import java.sql.Connection;
import java.sql.Date;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.UUID;
import javax.sql.DataSource;

public final class ReservationService {
    private final DataSource ds;
    private final RedisHoldStore holds;
    private final PaymentClient payment;

    public ReservationService(DataSource ds, RedisHoldStore holds, PaymentClient payment) {
        this.ds = ds; this.holds = holds; this.payment = payment;
    }

    public BookingResult book(BookingRequest req) throws SQLException {
        // The Redis hold is advisory; the SQL transaction below is what prevents oversell.
        if (!holds.ownsHold(req.holdToken(), req.userId())) {
            return BookingResult.holdExpired();
        }
        // 1. Reserve inventory under SERIALIZABLE isolation.
        String reservationId;
        try (Connection c = ds.getConnection()) {
            c.setTransactionIsolation(Connection.TRANSACTION_SERIALIZABLE);
            c.setAutoCommit(false);
            try {
                reservationId = reserveRoomTx(c, req);
                c.commit();
            } catch (SQLException e) {
                c.rollback();
                if (isSerializationFailure(e)) return BookingResult.conflictRetry();
                throw e;
            }
        }
        // 2. Charge payment OUTSIDE the SQL transaction (payment p99 ~1.5s).
        PaymentResult pr = payment.charge(req.idempotencyKey(), req.amount(), req.card());
        if (!pr.success()) {
            // Undo step 1 and free the hold so the room becomes bookable again.
            compensate(reservationId);
            holds.release(req.holdToken());
            return BookingResult.paymentDeclined(pr.reason());
        }
        // 3. Payment succeeded: flip PENDING to CONFIRMED, then drop the hold.
        markConfirmed(reservationId);
        holds.release(req.holdToken());
        return BookingResult.confirmed(reservationId);
    }

    private String reserveRoomTx(Connection c, BookingRequest req) throws SQLException {
        // This INSERT does not check for overlapping stays by itself. It relies on the
        // schema rejecting a second booking of the same room-night, for example through
        // per-night rows carrying UNIQUE(hotel_id, room_id, date).
        try (PreparedStatement ps = c.prepareStatement(
                "INSERT INTO reservations(id, hotel_id, room_id, user_id, check_in, check_out, status) " +
                "VALUES (?,?,?,?,?,?, 'PENDING')")) {
            String id = UUID.randomUUID().toString();
            ps.setString(1, id); ps.setLong(2, req.hotelId()); ps.setLong(3, req.roomId());
            ps.setString(4, req.userId()); ps.setDate(5, Date.valueOf(req.checkIn()));
            ps.setDate(6, Date.valueOf(req.checkOut()));
            ps.executeUpdate();
            return id;
        }
    }

    private boolean isSerializationFailure(SQLException e) {
        return "40001".equals(e.getSQLState()); // Postgres serialization_failure
    }

    // Stubs: the SQL for these two state changes is omitted.
    private void compensate(String reservationId) { /* mark cancelled, restore inventory */ }
    private void markConfirmed(String reservationId) { /* set status = 'CONFIRMED' */ }
}
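// RedisHoldStore.java
import java.time.Duration;
import java.time.LocalDate;
import java.util.Optional;
import java.util.UUID;
import redis.clients.jedis.Jedis;
import redis.clients.jedis.params.SetParams;

public final class RedisHoldStore {
    private final Jedis jedis;
    private static final Duration HOLD_TTL = Duration.ofMinutes(10);
    private static final String TOKEN_PREFIX = "holdtoken:";

    public RedisHoldStore(Jedis jedis) { this.jedis = jedis; }

    // Returns a hold token if the lock was acquired, empty if someone else holds it.
    public Optional<String> tryHold(long hotelId, long roomId, LocalDate checkIn,
                                    LocalDate checkOut, String userId) {
        String key = holdKey(hotelId, roomId, checkIn, checkOut);
        String token = UUID.randomUUID().toString();
        // SET key value NX PX ttl: atomic "create if absent, with expiry".
        String res = jedis.set(key, userId + ":" + token,
                SetParams.setParams().nx().px(HOLD_TTL.toMillis()));
        if (!"OK".equals(res)) return Optional.empty();
        // Reverse index token -> hold key, so later calls can start from the token alone.
        // This is a second command and therefore not atomic with the hold itself.
        jedis.set(TOKEN_PREFIX + token, key, SetParams.setParams().px(HOLD_TTL.toMillis()));
        return Optional.of(token);
    }

    // True only if the hold still exists and carries this user and token.
    public boolean ownsHold(String token, String userId) {
        String key = jedis.get(TOKEN_PREFIX + token);
        return key != null && (userId + ":" + token).equals(jedis.get(key));
    }

    // Pushes both entries out by a full TTL. Not atomic; a production version would use a Lua script.
    public boolean extend(String token) {
        String key = jedis.get(TOKEN_PREFIX + token);
        if (key == null || !carriesToken(key, token)) return false;
        long hold = jedis.pexpire(key, HOLD_TTL.toMillis());
        long index = jedis.pexpire(TOKEN_PREFIX + token, HOLD_TTL.toMillis());
        return hold == 1 && index == 1;
    }

    public void release(String token) {
        String key = jedis.get(TOKEN_PREFIX + token);
        // Delete the hold only if it still carries this token; it may have expired
        // and been re-acquired by another user in the meantime.
        if (key != null && carriesToken(key, token)) jedis.del(key);
        jedis.del(TOKEN_PREFIX + token);
    }

    private boolean carriesToken(String key, String token) {
        String value = jedis.get(key);
        return value != null && value.endsWith(":" + token);
    }

    private static String holdKey(long hid, long rid, LocalDate in, LocalDate out) {
        return "hold:" + hid + ":" + rid + ":" + in + ":" + out;
    }
}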
// InventoryRepo.java
import java.sql.Connection;
import java.sql.Date;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.time.LocalDate;
import javax.sql.DataSource;

public final class InventoryRepo {
    private final DataSource ds;

    public InventoryRepo(DataSource ds) { this.ds = ds; }

    // Returns true if the decrement succeeded, false if the version moved under us
    // or the last room was taken between the read and this update.
    public boolean decrementAvailable(long hotelId, long roomTypeId, LocalDate date,
                                      int expectedVersion) throws SQLException {
        String sql = "UPDATE inventory SET available = available - 1, version = version + 1 " +
                     "WHERE hotel_id = ? AND room_type_id = ? AND date = ? " +
                     "AND version = ? AND available > 0";
        try (Connection c = ds.getConnection();
             PreparedStatement ps = c.prepareStatement(sql)) {
            ps.setLong(1, hotelId); ps.setLong(2, roomTypeId);
            ps.setDate(3, Date.valueOf(date)); ps.setInt(4, expectedVersion);
            return ps.executeUpdate() == 1;
        }
    }

    // Retry wrapper: up to N attempts, re-reading version each loop.
    public void reserveWithRetry(long hotelId, long roomTypeId, LocalDate date, int maxAttempts)
            throws SQLException, OutOfInventoryException {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            InventoryRow row = load(hotelId, roomTypeId, date);
            if (row.available() <= 0) throw new OutOfInventoryException();
            if (decrementAvailable(hotelId, roomTypeId, date, row.version())) return;
        }
        // The caller is expected to map this to a 409 Conflict.
        throw new SQLException("optimistic retry exhausted");
    }

    private InventoryRow load(long hotelId, long roomTypeId, LocalDate date) throws SQLException {
        String sql = "SELECT available, version FROM inventory " +
                     "WHERE hotel_id = ? AND room_type_id = ? AND date = ?";
        try (Connection c = ds.getConnection();
             PreparedStatement ps = c.prepareStatement(sql)) {
            ps.setLong(1, hotelId); ps.setLong(2, roomTypeId);
            ps.setDate(3, Date.valueOf(date));
            try (ResultSet rs = ps.executeQuery()) {
                // A missing row is treated as "nothing to sell" for that night.
                if (!rs.next()) return new InventoryRow(0, 0);
                return new InventoryRow(rs.getInt("available"), rs.getInt("version"));
            }
        }
    }

    private record InventoryRow(int available, int version) {}

    public static final class OutOfInventoryException extends Exception {}
}
Key design decisions & trade-offs
- Holds in Redis with TTL vs holds as rows in the inventory DB. Chosen: Redis SET NX with a 10 min TTL. Hold traffic is dominated by expirations and short-lived keys. Setting a TTL in Redis is O(1) and expiry is handled by Redis itself; implementing TTL in the inventory DB requires a sweeper, and its writes amplify under peak booking load. Correctness still lives in the SQL transaction, so the Redis hold is only an advisory lock.
- Isolation level for the inventory transaction. Chosen: SERIALIZABLE with short transactions. Booking volume is low (50 peak QPS), so the throughput cost of SERIALIZABLE is negligible; the correctness benefit is that overlapping transactions cannot both see available > 0 and both commit. Payment is kept outside the transaction to avoid holding locks across a 1.5 s external call.
- Second line of defense for oversell: row-lock vs optimistic version. Chosen: optimistic locking via a version column with a retry loop. For high-demand room types where many requests race on the same row, SELECT ... FOR UPDATE makes them queue on the row lock and tanks throughput. An optimistic version column with a bounded retry loop converts contention into a user-visible 409 Conflict, and the only row lock taken is the brief one held by each single-statement UPDATE.
- Idempotency key scope. Chosen: client-generated UUID scoped to a single hold token. Tying the key to hold_token rather than request-id means a retry of the same booking attempt returns the same result, but editing a card after a decline rotates the key. This prevents both double-charges and stuck 402 replays from a 24 h cache.
- Search staleness. Chosen: eventually consistent search from a denormalized cache, authoritative check at hold. Search carries about 30x the traffic of booking. Serving it from the source of truth would waste DB capacity on an operation that the user is not committing to yet. The authoritative check moves to hold time, where it belongs.
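The optimistic-version choice can be demonstrated without a database. The sketch below is an in-memory analogue, assuming a single inventory row: an `AtomicReference` compare-and-set plays the role of `UPDATE ... WHERE version = ? AND available > 0`, and the `VersionedInventory` and `Outcome` names are illustrative.

```java
import java.util.concurrent.atomic.AtomicReference;

// In-memory analogue of one inventory row guarded by a version column.
final class VersionedInventory {
    record State(int available, int version) {}
    enum Outcome { RESERVED, SOLD_OUT, CONFLICT }

    private final AtomicReference<State> row;

    VersionedInventory(int available) {
        this.row = new AtomicReference<>(new State(available, 0));
    }

    // Read the row, then write only if nobody else wrote in between.
    // A failed compare-and-set corresponds to an UPDATE that matched 0 rows.
    Outcome reserve(int maxAttempts) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            State seen = row.get();
            if (seen.available() <= 0) return Outcome.SOLD_OUT;
            State next = new State(seen.available() - 1, seen.version() + 1);
            if (row.compareAndSet(seen, next)) return Outcome.RESERVED;
        }
        // Retries exhausted under contention: the API would answer 409 Conflict.
        return Outcome.CONFLICT;
    }

    State snapshot() { return row.get(); }
}
```

Every failed attempt implies that some other caller succeeded, so the number of reservations can never exceed the starting count regardless of how many callers race.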
Interview follow-ups
- How do you prevent a malicious client from holding every room in a hotel to block a competitor?
- How do you support group bookings that need multiple rooms as an atomic unit?
- How do you roll out a schema change to the inventory table without halting bookings?
- How would you implement overbooking on purpose (airline-style) while bounding the risk of walk-outs?
- How do you reconcile with the hotel's on-site PMS (property management system) after a network partition heals?