DNS Resolution
Recursive walk: client → resolver → root → TLD → authoritative. TTL caching.
Built for system design interview prep, this explanation walks through DNS resolution and connects the concept to real distributed-system trade-offs.
Overview
DNS is the phone book of the internet and often one of the slowest parts of your first request. Nearly every HTTP call to a hostname starts with a DNS lookup that turns a name like api.example.com into an IP address. If that lookup is a cache miss, it can take 50-200 ms of round-trips across three or four servers before your TCP SYN even goes out. Understanding DNS, specifically recursive resolution, TTL caching, and the record types that back modern CDNs, can be the difference between a p50 of 40 ms and a p50 of 300 ms.

DNS is hierarchical: the root servers delegate to TLD servers (.com, .org), which delegate to authoritative servers for each zone (example.com). A recursive resolver walks this chain on your behalf and caches the result. Stale caches cause outages that look like network failures, and short TTLs put more load on authoritative servers, so getting the TTL right is a real trade-off your ops team argues about.
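A minimal sketch of the hierarchy, assuming a zone boundary could sit at every label: it lists the zones a resolver conceptually descends through for a name, root first. The class name ZoneChain is hypothetical, and real zone cuts are discovered from NS referrals, so the output shows candidate boundaries rather than actual delegations (api.example.com is most likely just a name inside the example.com zone).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical helper: candidate zone boundaries for a name, root first.
// String-based only; real delegations are learned from NS referrals, so a name
// may cross fewer zone cuts than it has labels.
class ZoneChain {
    static List<String> chain(String host) {
        String name = host.toLowerCase(Locale.ROOT);          // DNS names are case-insensitive
        if (name.endsWith(".")) name = name.substring(0, name.length() - 1);
        String[] labels = name.split("\\.");
        List<String> zones = new ArrayList<>();
        zones.add(".");                                        // the root zone
        // Grow the suffix one label at a time, right to left: com., example.com., ...
        for (int i = labels.length - 1; i >= 0; i--) {
            zones.add(String.join(".", Arrays.copyOfRange(labels, i, labels.length)) + ".");
        }
        return zones;
    }

    public static void main(String[] args) {
        System.out.println(chain("api.example.com"));
        // prints [., com., example.com., api.example.com.]
    }
}
```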
How it works
A typical resolution starts with the client OS checking its local DNS cache. On a miss, it asks the configured recursive resolver (often 8.8.8.8, 1.1.1.1, or the ISP's resolver), which checks its own cache. On another miss, the resolver asks a root server: who handles .com? The root returns a referral to the .com nameservers. The resolver asks one of them: who handles example.com? It returns the authoritative nameservers for example.com. The resolver asks one of those: what is the A record for api.example.com? The authoritative server returns the IP with a TTL, and the resolver caches the record for TTL seconds and returns it to the client. That is four round-trips (client to resolver, plus three resolver queries), any of which a cache hit at some layer can save; busy resolvers almost always have the root and .com referrals cached already.

Record types matter. A (IPv4) and AAAA (IPv6) records resolve to addresses. CNAME aliases one name to another and is used heavily by CDNs (static.example.com CNAME d1234.cloudfront.net). NS records delegate zones. MX points to mail servers. TXT holds arbitrary text (SPF, DKIM, domain verification). SRV gives a service's host and port and is used for service discovery.

TTL is the key operational knob. A high TTL (24 h) is cheap but makes failover slow; a low TTL (60 s) enables fast failover but multiplies authoritative query volume, because every active resolver re-fetches the record roughly once per TTL. Modern CDNs use very low TTLs on GeoDNS answers to steer traffic per region.

DNS-over-HTTPS (DoH) and DNS-over-TLS (DoT) encrypt queries between the client and its recursive resolver, so on-path observers such as the ISP cannot snoop on or tamper with them (the resolver operator still sees every query). EDNS Client Subnet (ECS) forwards a truncated prefix of the client's IP address to the authoritative server so GeoDNS can pick the closest edge.
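The walk above can be modeled in memory to show where the round-trips go. This is a toy, not a resolver: the server names (root, tld-com, ns1.example.com), the zone data, and the 192.0.2.10 address are hypothetical, and each askReferral/askAuthoritative call stands in for one network query. A cold lookup costs three upstream queries (four counting the client hop), a lookup within the TTL costs none, and an expired entry triggers the full walk again.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of iterative resolution behind a resolver-side TTL cache.
// All server names, zone data, and addresses are hypothetical.
class ToyResolver {
    record Answer(String ip, long ttlSeconds) {}
    private record CacheEntry(String ip, long expiresAt) {}

    // Referral data: server -> (zone it delegates -> next nameserver).
    private static final Map<String, Map<String, String>> REFERRALS = Map.of(
            "root", Map.of("com.", "tld-com"),
            "tld-com", Map.of("example.com.", "ns1.example.com"));
    // Authoritative data: server -> (name -> answer with TTL).
    private static final Map<String, Map<String, Answer>> AUTHORITATIVE = Map.of(
            "ns1.example.com", Map.of("api.example.com.", new Answer("192.0.2.10", 300)));

    private final Map<String, CacheEntry> cache = new HashMap<>();
    int upstreamQueries = 0;  // resolver -> server round-trips; client -> resolver not counted

    String resolve(String fqdn, long nowSeconds) {
        CacheEntry hit = cache.get(fqdn);
        if (hit != null && hit.expiresAt() > nowSeconds) return hit.ip();  // fresh: no upstream traffic
        String tld = fqdn.substring(fqdn.lastIndexOf('.', fqdn.length() - 2) + 1); // "com."
        String zone = fqdn.substring(fqdn.indexOf('.') + 1);                     // "example.com." (naive)
        String tldServer = askReferral("root", tld);        // 1. root -> TLD nameservers
        String authServer = askReferral(tldServer, zone);   // 2. TLD -> authoritative nameservers
        Answer a = askAuthoritative(authServer, fqdn);      // 3. authoritative -> A record
        cache.put(fqdn, new CacheEntry(a.ip(), nowSeconds + a.ttlSeconds()));
        return a.ip();
    }

    private String askReferral(String server, String zone) {
        upstreamQueries++;
        return REFERRALS.get(server).get(zone);
    }

    private Answer askAuthoritative(String server, String name) {
        upstreamQueries++;
        return AUTHORITATIVE.get(server).get(name);
    }

    public static void main(String[] args) {
        ToyResolver r = new ToyResolver();
        System.out.println(r.resolve("api.example.com.", 0) + " after " + r.upstreamQueries + " queries");
        r.resolve("api.example.com.", 120);  // within TTL: served from cache
        r.resolve("api.example.com.", 300);  // TTL expired: full walk again
        System.out.println("total upstream queries: " + r.upstreamQueries);  // 6
    }
}
```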
Implementation
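import java.io.IOException;
import java.net.InetAddress;
import java.util.Locale;
import java.util.concurrent.ConcurrentHashMap;

public class DnsResolver {
    /** Cached answer plus its absolute expiry on the System.nanoTime() clock. */
    private static final class Entry {
        final InetAddress addr;
        final long expiresAtNanos;
        Entry(InetAddress a, long t) { this.addr = a; this.expiresAtNanos = t; }
    }
    private final ConcurrentHashMap<String, Entry> cache = new ConcurrentHashMap<>();
    public InetAddress resolve(String host) throws IOException {
        String key = host.toLowerCase(Locale.ROOT);  // DNS names are case-insensitive
        long now = System.nanoTime();                // monotonic: immune to wall-clock jumps
        Entry e = cache.get(key);
        if (e != null && e.expiresAtNanos - now > 0) return e.addr;  // fresh cache hit
        // Miss or expired. Concurrent misses may each walk the tree; acceptable for a sketch.
        ResolvedRecord r = recursiveLookup(key);
        cache.put(key, new Entry(r.address(), now + r.ttlSeconds() * 1_000_000_000L));
        return r.address();
    }
    /** Stub for a full recursive walk: root -> TLD -> authoritative. */
    private ResolvedRecord recursiveLookup(String host) throws IOException {
        // 1. Ask a root server for the TLD's nameservers (a referral).
        InetAddress tldNs = queryNs(rootServer(), tldOf(host));
        // 2. Ask the TLD server for the zone's authoritative nameservers.
        InetAddress authNs = queryNs(tldNs, zoneOf(host));
        // 3. Ask an authoritative server for the A record.
        return queryA(authNs, host);
    }
    // Network stubs: a real version sends DNS messages over UDP port 53,
    // retrying over TCP when the response comes back truncated.
    private InetAddress queryNs(InetAddress ns, String zone) throws IOException {
        throw new UnsupportedOperationException("NS query not implemented");
    }
    private ResolvedRecord queryA(InetAddress ns, String host) throws IOException {
        throw new UnsupportedOperationException("A query not implemented");
    }
    private InetAddress rootServer() {
        throw new UnsupportedOperationException("root hints not loaded");
    }
    private String tldOf(String host) { return host.substring(host.lastIndexOf('.') + 1); }
    // Naive: assumes the zone is the name minus its first label. Real resolvers
    // learn zone cuts from NS referrals instead.
    private String zoneOf(String host) { return host.substring(host.indexOf('.') + 1); }
    record ResolvedRecord(InetAddress address, int ttlSeconds) {}
}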
import java.net.InetAddress;

public class DnsDemo {
    public static void main(String[] args) throws Exception {
        // InetAddress delegates to the OS resolver and also keeps its own
        // in-JVM cache of successful lookups (see networkaddress.cache.ttl).
        // The first call may still be answered by the OS or resolver cache.
        long t0 = System.nanoTime();
        InetAddress addr = InetAddress.getByName("api.example.com");  // illustrative hostname
        long ms = (System.nanoTime() - t0) / 1_000_000;
        System.out.println(addr.getHostAddress() + " in " + ms + " ms");
        // Warm lookup: served from the JVM cache, so no network round-trip.
        t0 = System.nanoTime();
        InetAddress.getByName("api.example.com");
        ms = (System.nanoTime() - t0) / 1_000_000;
        System.out.println("warm: " + ms + " ms");
    }
}
Complexity
- Cold lookup RTT: 50-200 ms (3-4 hops)
- Warm lookup RTT: < 1 ms (local cache)
- Typical web TTL: 300-86400 s
- GeoDNS TTL: 20-60 s
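These numbers combine into a rough budget. This sketch assumes a simple model: mean latency is a hit-ratio-weighted average of warm and cold cost, each active resolver re-fetches a popular name about once per TTL, and worst-case failover is detection time plus one full TTL. The 95% hit ratio, 100,000 resolvers, and 30 s detection time in main() are made-up inputs, not measurements.

```java
// Back-of-envelope DNS budget under the simple model described above.
class DnsBudget {
    // Mean lookup latency when a fraction hitRatio of lookups is served from cache.
    static double expectedLatencyMs(double hitRatio, double warmMs, double coldMs) {
        return hitRatio * warmMs + (1 - hitRatio) * coldMs;
    }

    // Rough model: each active resolver re-fetches a popular name about once per TTL.
    static double authoritativeQps(long activeResolvers, long ttlSeconds) {
        return (double) activeResolvers / ttlSeconds;
    }

    // Worst case for a client that cached the old record just before the change,
    // assuming every cache honors the TTL (clamping caches may not).
    static long worstCaseFailoverSeconds(long detectSeconds, long ttlSeconds) {
        return detectSeconds + ttlSeconds;
    }

    public static void main(String[] args) {
        System.out.printf("mean latency at 95%% hits: %.2f ms%n", expectedLatencyMs(0.95, 1, 150));
        System.out.printf("auth QPS, TTL 300 s: %.0f%n", authoritativeQps(100_000, 300));
        System.out.printf("auth QPS, TTL 60 s:  %.0f%n", authoritativeQps(100_000, 60));
        System.out.println("failover, TTL 300 s: " + worstCaseFailoverSeconds(30, 300) + " s");
        System.out.println("failover, TTL 60 s:  " + worstCaseFailoverSeconds(30, 60) + " s");
    }
}
```

Cutting the TTL from 300 s to 60 s in this model shortens worst-case failover from 330 s to 90 s while raising authoritative load fivefold.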
Key design decisions & trade-offs
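- TTL length (Chosen: short for failover-sensitive services, long for stable ones). A short TTL trades authoritative QPS for fast recovery; a long TTL trades failover time for lower load.
- CNAME vs ALIAS/ANAME (Chosen: ALIAS at the apex, CNAME elsewhere). A CNAME cannot coexist with other records at the same name (RFC 1034), and the apex must carry SOA and NS records, so a CNAME is not allowed there. ALIAS is a provider-specific record that the authoritative server resolves itself, answering with plain A/AAAA records.
- DoH/DoT (Chosen: encrypt resolver queries when privacy matters). Prevents on-path snooping, tampering, and DNS-based censorship (e.g., by the ISP) at the cost of TLS setup and slightly higher latency.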
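What ALIAS does server-side can be sketched as CNAME chasing inside the authoritative server. Assumptions: zone data lives in in-memory maps, the hop limit of 8 is arbitrary, and the CDN names are hypothetical. Real ALIAS/ANAME implementations are provider-specific and resolve the target over the network, caching it per the target's TTL.

```java
import java.util.List;
import java.util.Map;

// Sketch of ALIAS/ANAME flattening at the apex: follow the target's CNAME
// chain and return its A records, so the apex answer contains no CNAME.
class AliasFlattener {
    static final int MAX_HOPS = 8;  // loop guard; the exact limit is an arbitrary choice here

    private final Map<String, String> cnames;          // alias -> canonical name
    private final Map<String, List<String>> aRecords;  // name -> IPv4 addresses

    AliasFlattener(Map<String, String> cnames, Map<String, List<String>> aRecords) {
        this.cnames = cnames;
        this.aRecords = aRecords;
    }

    List<String> flatten(String target) {
        String name = target;
        for (int hop = 0; hop <= MAX_HOPS; hop++) {
            List<String> ips = aRecords.get(name);
            if (ips != null) return ips;          // reached address records
            String next = cnames.get(name);
            if (next == null) return List.of();   // dangling alias: nothing to return
            name = next;                          // follow one CNAME hop
        }
        throw new IllegalStateException("CNAME chain too long or looping from " + target);
    }

    public static void main(String[] args) {
        // example.com ALIAS edge.cdn.example.net (hypothetical CDN names)
        AliasFlattener f = new AliasFlattener(
                Map.of("edge.cdn.example.net", "pop7.cdn.example.net"),
                Map.of("pop7.cdn.example.net", List.of("198.51.100.7")));
        System.out.println("example.com A " + f.flatten("edge.cdn.example.net"));
    }
}
```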
Common pitfalls
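- The JVM's default positive DNS cache is 30 seconds (infinite when a SecurityManager is installed); override the networkaddress.cache.ttl security property (not a system property) for cloud failover
- TTL of 0 does not mean no cache; intermediate resolvers often clamp to a minimum
- DNS amplification attacks abuse open resolvers: never run one, and restrict recursion to your own clients
- Glue records are required when authoritative nameservers live inside the zone they serve (e.g., ns1.example.com for example.com); without glue, finding the nameserver's address would require the very zone it serves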
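For the JVM pitfall, a minimal sketch of overriding the cache policy through java.security.Security. networkaddress.cache.ttl and networkaddress.cache.negative.ttl are the JDK's security properties for positive and negative caching; the 10 s and 5 s values are illustrative. Set them at startup before any lookup (or in the java.security file), since the JDK may read its caching policy once and ignore later changes.

```java
import java.net.InetAddress;
import java.security.Security;

// Shortens the JVM's in-process DNS cache so DNS-based failover is noticed sooner.
// Values are illustrative; call configure() before the first lookup.
class JvmDnsTtl {
    static void configure() {
        Security.setProperty("networkaddress.cache.ttl", "10");          // cache successes for 10 s
        Security.setProperty("networkaddress.cache.negative.ttl", "5");  // cache failures for 5 s
    }

    public static void main(String[] args) throws Exception {
        configure();
        InetAddress addr = InetAddress.getByName("example.com");  // needs network access
        System.out.println(addr.getHostAddress());
    }
}
```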
Interview follow-ups
- Enable DNSSEC for authenticity of DNS answers
- Use GeoDNS + low TTL for multi-region traffic steering
- Monitor resolver latency via synthetic probes
- Switch to DoH/DoT at the client for privacy
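GeoDNS steering with ECS can be sketched as a prefix lookup plus a short TTL. Assumptions: a /24 prefix (a common ECS truncation for IPv4), hypothetical prefix-to-region and region-to-edge tables, and a fixed 30 s TTL. Production GeoDNS would use a geo-IP database and edge health checks, and would handle IPv6.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Map;

// Toy GeoDNS: truncate the client address to a /24, map the prefix to a
// region, and answer with that region's edge IP and a short TTL.
// All prefixes, regions, and IPs below are hypothetical.
class GeoDnsSketch {
    record Answer(String ip, int ttlSeconds) {}

    static final int GEO_TTL_SECONDS = 30;  // inside the 20-60 s GeoDNS range
    private static final Map<String, String> REGION_BY_PREFIX = Map.of(
            "203.0.113.0/24", "eu-west",
            "198.51.100.0/24", "us-east");
    private static final Map<String, String> EDGE_BY_REGION = Map.of(
            "eu-west", "192.0.2.10",
            "us-east", "192.0.2.20");
    private static final String DEFAULT_REGION = "us-east";

    // IPv4 only. For a literal address, getByName only validates the format (no lookup).
    static String prefix24(String clientIp) {
        try {
            byte[] b = InetAddress.getByName(clientIp).getAddress();
            if (b.length != 4) throw new IllegalArgumentException("IPv4 only: " + clientIp);
            return (b[0] & 0xff) + "." + (b[1] & 0xff) + "." + (b[2] & 0xff) + ".0/24";
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException("not an IP literal: " + clientIp, e);
        }
    }

    static Answer answer(String clientIp) {
        String region = REGION_BY_PREFIX.getOrDefault(prefix24(clientIp), DEFAULT_REGION);
        return new Answer(EDGE_BY_REGION.get(region), GEO_TTL_SECONDS);
    }

    public static void main(String[] args) {
        System.out.println(answer("203.0.113.77"));  // eu-west edge
        System.out.println(answer("10.1.2.3"));      // unknown prefix: default region
    }
}
```

The short TTL is what makes steering responsive: when an edge is drained, resolvers pick up the new answer within about 30 s.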