Encoding & Evolution

By Rahul Kumar · Senior Software Engineer · Category: Kleppmann · Designing Data-Intensive Applications

Compares JSON, Protobuf, and Avro on encoded size and schema compatibility (Chapter 4).

This interactive explanation is built for system design interview prep: step through Encoding & Evolution, watch the internal state change, and connect the concept to real distributed-system trade-offs.

Overview

Every message leaving your process and every row going to disk is encoded: turned from an in-memory object graph into a sequence of bytes, then decoded on the other side. JSON, Protobuf, and Avro represent three encoding families that are common in backend systems, and each makes a different bet about schemas. JSON puts field names inside every payload, so the reader needs no prior agreement but pays for that metadata in every message. Protobuf assigns a compact numeric tag per field and requires the reader to know the schema, trading self-description for density. Avro goes further: the payload contains no tags or names at all, so the bytes are very compact, but they can only be decoded with the exact schema the writer used, which has to travel alongside the data (in a file header, or as an ID looked up in a schema registry). Kleppmann's key point is that encoding choice is really a schema-evolution choice: how you add, remove, and rename fields without breaking old clients.
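To make the cost of self-description concrete, here is a minimal sketch that encodes the same two-field record twice: once as hand-built JSON text, and once as a positional binary layout with no names or tags. The class `EncodingSizeSketch` and both method names are hypothetical, and the binary layout (fixed 8-byte long, 2-byte length prefix) is an assumption chosen for simplicity; it is not the actual Avro or Protobuf wire format.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Illustrative only: compares a self-describing text encoding with a positional binary one. */
final class EncodingSizeSketch {

    /** JSON-style text: the field names "id" and "email" travel inside every payload. */
    static byte[] asJson(long id, String email) {
        // Hand-built for the sketch; assumes email contains nothing that needs escaping.
        String json = "{\"id\":" + id + ",\"email\":\"" + email + "\"}";
        return json.getBytes(StandardCharsets.UTF_8);
    }

    /** Positional binary: no names, no tags. The reader must already know the field order. */
    static byte[] asPositionalBinary(long id, String email) {
        byte[] utf8 = email.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(Long.BYTES + Short.BYTES + utf8.length)
            .putLong(id)                    // fixed 8 bytes
            .putShort((short) utf8.length)  // 2-byte length prefix (assumes a short string)
            .put(utf8)                      // raw UTF-8 bytes
            .array();
    }

    public static void main(String[] args) {
        System.out.println("JSON bytes:       " + asJson(1337L, "a@example.com").length);
        System.out.println("Positional bytes: " + asPositionalBinary(1337L, "a@example.com").length);
    }
}
```

For this record the JSON form is 35 bytes and the positional form is 23. The gap is entirely field names and punctuation, which is the overhead that the tagged and schema-only formats avoid.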

How it works

At write time, JSON walks the object tree and emits key-value pairs as UTF-8 text. The result is human-readable, but it is often several times larger than a binary equivalent and slower to parse. In Kleppmann's example record, the same data takes 81 bytes as JSON, 33 as Protobuf, and 32 as Avro.

Protobuf numbers each field in a .proto file; the encoder writes a tag (a varint that packs the field number and wire type, one byte for field numbers 1 through 15), then the value, and skips any field that is unset. Fields the reader does not recognize are preserved as raw bytes, so an old reader can round-trip a new message without losing data. This is the core trick that makes forward compatibility work.

Avro is stricter: the writer's schema must accompany the data, either embedded in a file header or looked up by ID from a registry. The reader's schema may differ, and Avro resolves the two schemas at read time by matching fields by name: fields that exist only in the writer's schema are dropped, and fields that exist only in the reader's schema are filled from their default values.

Schema evolution rules fall out of these mechanics. In Protobuf, every new field needs a fresh tag number and must not be required, and the tag number of a removed field must never be reused. Renaming a field is safe on the binary wire because names are not encoded, but changing the type behind an existing tag is not. In Avro, you may add or remove only fields that have a default value; a reader whose schema contains a field with no default fails as soon as it meets data written without that field. Getting the rules wrong, for example by reusing a retired Protobuf tag for a field with a different meaning, can cause silent data corruption that shows up weeks later.
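The Avro-style resolution step described above can be sketched without the Avro library. This is a simplified model, not Avro's API: records are plain maps, the reader schema is a map from field name to an optional default, and `SchemaResolutionSketch.resolve` is a hypothetical helper. Real Avro also handles type promotion, aliases, and unions, none of which appear here.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

/** Illustrative only: name-based reader/writer schema resolution in the spirit of Avro. */
final class SchemaResolutionSketch {

    /**
     * @param writerRecord decoded field values, keyed by the writer's field names
     * @param readerSchema reader's field names mapped to an optional default value
     * @return the record as the reader sees it
     */
    static Map<String, Object> resolve(Map<String, Object> writerRecord,
                                       Map<String, Optional<Object>> readerSchema) {
        Map<String, Object> result = new LinkedHashMap<>();
        for (Map.Entry<String, Optional<Object>> field : readerSchema.entrySet()) {
            String name = field.getKey();
            if (writerRecord.containsKey(name)) {
                // Field exists on both sides: take the written value.
                result.put(name, writerRecord.get(name));
            } else if (field.getValue().isPresent()) {
                // Field is new to the reader: fall back to the reader's default.
                result.put(name, field.getValue().get());
            } else {
                // No written value and no default: resolution fails loudly.
                throw new IllegalStateException("no value and no default for field: " + name);
            }
        }
        // Fields present only in writerRecord are never copied, so they are dropped.
        return result;
    }

    public static void main(String[] args) {
        Map<String, Object> written = Map.of("id", 7L, "email", "a@b.c", "legacyUsername", "al");
        Map<String, Optional<Object>> reader = new LinkedHashMap<>();
        reader.put("id", Optional.empty());
        reader.put("email", Optional.empty());
        reader.put("phoneNumber", Optional.of("")); // added later, so it carries a default
        System.out.println(resolve(written, reader));
    }
}
```

The exception branch is the reason the rule is "only add or remove fields that have defaults": a field without a default cannot be filled in when the other side never wrote it.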

Implementation

POJO with a new optional field (schema evolution)
import java.util.Objects;
import java.util.Optional;

/** V1 had {id, email}. V2 adds an optional phoneNumber without breaking V1 readers. */
public final class UserProfile {
    private final long id;
    private final String email;
    private final Optional<String> phoneNumber; // added in V2; V1 clients ignore it

    public UserProfile(long id, String email, Optional<String> phoneNumber) {
        this.id = id;
        this.email = Objects.requireNonNull(email);
        this.phoneNumber = phoneNumber == null ? Optional.empty() : phoneNumber;
    }

    /** Backward-compatible factory for pre-V2 callers. */
    public static UserProfile v1(long id, String email) {
        return new UserProfile(id, email, Optional.empty());
    }

    public long id() { return id; }
    public String email() { return email; }
    public Optional<String> phoneNumber() { return phoneNumber; }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof UserProfile u)) return false;
        return id == u.id && email.equals(u.email) && phoneNumber.equals(u.phoneNumber);
    }

    @Override
    public int hashCode() {
        return Objects.hash(id, email, phoneNumber);
    }
}
Protobuf .proto excerpt (evolvable schema)
syntax = "proto3";

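package user.v2;

// Generated Java classes default to the proto package (user.v2). The outer class
// name is set explicitly so that the import user.v2.UserProto resolves.
option java_outer_classname = "UserProto";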

// Adding fields: always allocate a new tag number, never reuse retired ones.
// Removing fields: mark the tag as reserved so nobody re-uses it accidentally.
message UserProfile {
  int64 id = 1;
  string email = 2;

  // Added in V2. Old clients serialize messages without this field;
  // new servers parse those messages and see an empty phone_number.
  string phone_number = 3;

  // Field 4 was "legacy_username" — retired. Reserve so it cannot be re-used.
  reserved 4;
  reserved "legacy_username";
}
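To show what a schema like this looks like on the wire, the following is a hand-rolled sketch of the two wire types it needs (varint and length-delimited), plus a V1-style reader that knows only fields 1 and 2 and skips the rest. `WireFormatSketch` and its methods are hypothetical names for illustration; real code would use the generated classes. Unlike the real library, this reader discards unknown fields instead of preserving them.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

/** Illustrative only: a tiny subset of the Protobuf wire format (varint and length-delimited). */
final class WireFormatSketch {
    static final int VARINT = 0; // wire type for int64
    static final int LEN = 2;    // wire type for strings

    static void writeVarint(ByteArrayOutputStream out, long value) {
        while ((value & ~0x7FL) != 0) {
            out.write((int) ((value & 0x7F) | 0x80)); // low 7 bits, continuation bit set
            value >>>= 7;
        }
        out.write((int) value);
    }

    static void writeTag(ByteArrayOutputStream out, int fieldNumber, int wireType) {
        writeVarint(out, ((long) fieldNumber << 3) | wireType);
    }

    static void writeString(ByteArrayOutputStream out, int fieldNumber, String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        writeTag(out, fieldNumber, LEN);
        writeVarint(out, utf8.length);
        out.write(utf8, 0, utf8.length);
    }

    /** Writer on the V2 schema. As in proto3, default values (0, "") are not written. */
    static byte[] encodeV2(long id, String email, String phoneNumber) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        if (id != 0) { writeTag(out, 1, VARINT); writeVarint(out, id); }
        if (!email.isEmpty()) writeString(out, 2, email);
        if (!phoneNumber.isEmpty()) writeString(out, 3, phoneNumber);
        return out.toByteArray();
    }

    static long readVarint(byte[] bytes, int[] pos) {
        long result = 0;
        int shift = 0;
        while (true) {
            byte b = bytes[pos[0]++];
            result |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) return result;
            shift += 7;
        }
    }

    /** Reader on the V1 schema: knows fields 1 and 2, skips any other field by wire type. */
    static String decodeV1(byte[] bytes) {
        int[] pos = {0};
        long id = 0;
        String email = "";
        while (pos[0] < bytes.length) {
            long tag = readVarint(bytes, pos);
            int fieldNumber = (int) (tag >>> 3);
            int wireType = (int) (tag & 0x7);
            if (wireType == VARINT) {
                long value = readVarint(bytes, pos);
                if (fieldNumber == 1) id = value; // other varint fields are skipped
            } else if (wireType == LEN) {
                int length = (int) readVarint(bytes, pos);
                if (fieldNumber == 2) email = new String(bytes, pos[0], length, StandardCharsets.UTF_8);
                pos[0] += length; // advance past the payload whether or not it was used
            } else {
                throw new IllegalArgumentException("wire type not handled in this sketch: " + wireType);
            }
        }
        return "id=" + id + " email=" + email;
    }
}
```

Calling `decodeV1(encodeV2(1L, "a@b.c", "555"))` returns `id=1 email=a@b.c`: the wire type in the tag tells the old reader how many bytes to skip for field 3, even though it has never heard of that field.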
Protobuf SerDe: round-trip the POJO
import com.google.protobuf.InvalidProtocolBufferException;
import user.v2.UserProto;
import java.util.Optional;

public final class UserProfileSerde {

    /**
     * Encode POJO to bytes. An absent phoneNumber is simply not written. proto3 also
     * omits an empty string, so absent and "" are indistinguishable on the wire.
     */
    public byte[] encode(UserProfile profile) {
        UserProto.UserProfile.Builder b = UserProto.UserProfile.newBuilder()
            .setId(profile.id())
            .setEmail(profile.email());
        profile.phoneNumber().ifPresent(b::setPhoneNumber);
        return b.build().toByteArray();
    }

    /** Decode bytes to POJO. Works on payloads from V1 (no phone field). */
    public UserProfile decode(byte[] bytes) throws InvalidProtocolBufferException {
        UserProto.UserProfile msg = UserProto.UserProfile.parseFrom(bytes);
        Optional<String> phone = msg.getPhoneNumber().isEmpty()
            ? Optional.empty()
            : Optional.of(msg.getPhoneNumber());
        return new UserProfile(msg.getId(), msg.getEmail(), phone);
    }

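    /**
     * Round-trip sanity check. Caveat: Optional.of("") decodes as Optional.empty(),
     * so a profile with an empty (but present) phone number does not round-trip.
     */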
    public boolean roundTrip(UserProfile original) throws InvalidProtocolBufferException {
        byte[] wire = encode(original);
        UserProfile decoded = decode(wire);
        return decoded.equals(original);
    }
}

Complexity

Key design decisions & trade-offs

Common pitfalls

Interview follow-ups
