Kafka Avro Schema: Backward Compatibility Explained for Developers

If you are working with Apache Kafka and someone mentions "Avro schema" or "Schema Registry," this guide is for you. We will cover what Avro is, why Kafka teams use it, what backward compatibility actually means in practice, and what changes to a schema will break your pipeline versus what is safe.

What is Apache Avro?

Avro is a data serialization framework that originated in the Apache Hadoop project. It encodes data into a compact binary format that leaves out field names, so a reader needs the schema to decode it. In Kafka, that schema is stored separately (usually in a Schema Registry) and each message carries only a small reference to it. This is the opposite of JSON, which is self-describing: every message carries its own field names.

The tradeoffs look like this:

| Property | JSON in Kafka | Avro in Kafka |
|---|---|---|
| Message size | Large: field names repeated in every message | Small: binary, schema stored separately |
| Schema enforcement | None: any JSON passes | Strong: rejects messages that do not match the schema |
| Schema evolution | Manual, no tooling | Managed: Schema Registry enforces compatibility |
| Human-readable | Yes | No: binary on the wire |
| Cross-language support | Universal | Good: Java, Python, Go, C#, and more |

Kafka teams adopt Avro when they want schema enforcement, smaller message sizes at scale, and a formal contract between producers and consumers.
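To make the size difference concrete, here is a minimal sketch of Avro's binary encoding written by hand in Python and applied to the UserCreated schema shown below. It follows the encoding rules in the Avro specification: zigzag variable-length integers for int and long, a length prefix plus UTF-8 bytes for strings, the symbol index for enums, and record fields written back to back in schema order with no names. The function names and sample values are illustrative assumptions; a real producer should use an Avro library rather than hand-rolled encoding.

```python
import json


def encode_long(n: int) -> bytes:
    """Avro int/long: zigzag-map the signed value, then emit 7 bits per byte."""
    z = (n << 1) ^ (n >> 63)  # zigzag: 0->0, -1->1, 1->2, -2->3, ...
    out = bytearray()
    while z > 0x7F:
        out.append((z & 0x7F) | 0x80)  # high bit set: more bytes follow
        z >>= 7
    out.append(z)
    return bytes(out)


def encode_string(s: str) -> bytes:
    """Avro string: byte length encoded as a long, then the UTF-8 bytes."""
    data = s.encode("utf-8")
    return encode_long(len(data)) + data


PLAN_SYMBOLS = ["FREE", "PRO", "ENTERPRISE"]  # symbols of the PlanType enum


def encode_user_created(event: dict) -> bytes:
    """Fields in schema order, with no field names and no delimiters."""
    return (
        encode_string(event["userId"])
        + encode_string(event["email"])
        + encode_long(event["createdAt"])  # timestamp-millis is stored as a long
        + encode_long(PLAN_SYMBOLS.index(event["plan"]))  # enum = symbol index
    )


event = {
    "userId": "3f2b8c1e-9a4d-4e7b-8c2a-1d5e6f7a8b9c",
    "email": "ada@example.com",
    "createdAt": 1700000000000,
    "plan": "PRO",
}

avro_bytes = encode_user_created(event)
json_bytes = json.dumps(event).encode("utf-8")
print(f"Avro: {len(avro_bytes)} bytes, JSON: {len(json_bytes)} bytes")
# Avro: 60 bytes, JSON: 121 bytes
```

Most of the Avro bytes are the two string values themselves; the field names and punctuation that JSON repeats in every message are simply absent.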

What an Avro schema looks like

An Avro schema is itself a JSON document. Here is a complete schema for a user event:

{
  "type": "record",
  "name": "UserCreated",
  "namespace": "com.example.events",
  "doc": "Fired when a new user account is created.",
  "fields": [
    {
      "name": "userId",
      "type": "string",
      "doc": "UUID of the new user account."
    },
    {
      "name": "email",
      "type": "string"
    },
    {
      "name": "createdAt",
      "type": { "type": "long", "logicalType": "timestamp-millis" },
      "doc": "Unix timestamp in milliseconds."
    },
    {
      "name": "plan",
      "type": {
        "type": "enum",
        "name": "PlanType",
        "symbols": ["FREE", "PRO", "ENTERPRISE"]
      },
      "default": "FREE"
    }
  ]
}

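Key things to notice: name and namespace together give the record its full name, com.example.events.UserCreated; doc attributes document the record and its fields; createdAt is an ordinary long annotated with the timestamp-millis logical type; and plan is an enum whose default, FREE, is used when a reader needs a value the writer did not supply.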

Unions — the nullable field pattern

In Avro, a field is not nullable by default. To allow null, you use a union type. The conventional pattern is to put null first and set it as the default:

{
  "name": "middleName",
  "type": ["null", "string"],
  "default": null,
  "doc": "Optional middle name. null if not provided."
}

The order in a union matters: the default value must match the first type in the array. If you write ["string", "null"] with "default": null, Avro will reject the schema.
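If you want to catch this mistake before a parser or the registry does, the check is easy to script. The sketch below uses hypothetical helpers (check_union_default and default_matches are not part of any Avro library) that only understand primitive branch types; they encode the rule described above that a union field's default must match the first branch.

```python
import json


def default_matches(avro_type, value) -> bool:
    """Does a JSON default value fit a primitive Avro type? (named types not handled)"""
    if avro_type == "null":
        return value is None
    if avro_type == "boolean":
        return isinstance(value, bool)
    if avro_type in ("int", "long"):
        return isinstance(value, int) and not isinstance(value, bool)
    if avro_type in ("float", "double"):
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    if avro_type in ("string", "bytes"):
        return isinstance(value, str)  # bytes defaults are written as JSON strings
    raise ValueError(f"this sketch only handles primitive types, got {avro_type!r}")


def check_union_default(field: dict) -> None:
    """Raise ValueError if a union field's default doesn't match its first branch."""
    branches = field["type"]
    if isinstance(branches, list) and "default" in field:
        first = branches[0]
        if not default_matches(first, field["default"]):
            raise ValueError(
                f"field {field['name']!r}: default {field['default']!r} "
                f"does not match first union branch {first!r}"
            )


ok = json.loads('{"name": "middleName", "type": ["null", "string"], "default": null}')
bad = json.loads('{"name": "middleName", "type": ["string", "null"], "default": null}')
check_union_default(ok)  # passes silently
try:
    check_union_default(bad)
except ValueError as err:
    print(err)
```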

What is schema evolution?

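Schema evolution is the process of changing an Avro schema over time without breaking existing producers or consumers. In a Kafka system, producers and consumers are deployed independently: different teams, different release cycles. A change to the schema must be handled carefully so that consumers on the new schema can still read messages written with the old one and, where your rollout requires it, consumers still on the old schema can read messages written with the new one.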

This is where backward, forward, and full compatibility come in.

Backward compatibility (most common)

Definition: A new schema version can read data written with the old schema version.

Typical scenario: You update consumers first (they get the new schema), then producers gradually roll out. During rollout, consumers receive messages encoded with the old schema and must handle them correctly.

What is safe (backward compatible)

// SAFE: Add a new field with a default value
// Old messages won't have this field — the default fills it in
{
  "name": "referralCode",
  "type": ["null", "string"],
  "default": null
}

// SAFE: Remove a field (with or without a default)
// The new reader schema no longer declares it, so its value in old messages is skipped
// (whether the field had a default only matters for FORWARD compatibility)

What breaks backward compatibility

// BREAKING: Add a new required field (no default)
// Old messages don't have this field — deserialization fails
{
  "name": "phoneNumber",
  "type": "string"
  // No "default": new consumers have nothing to fill in when reading old messages
}

// BREAKING: Change a field's type (most cases)
// "userId" was "string", changed to "int" — deserialization fails

Forward compatibility

Definition: An old schema version can read data written with a new schema version.

Typical scenario: Producers are updated first and start sending new fields. Old consumers (not yet updated) receive these messages and must not crash — they just ignore unknown fields.

What is safe (forward compatible)

// SAFE: Add a new field (with or without default)
// Old consumers simply ignore the unknown field

// SAFE: Remove a field with a default
// Old consumers that expect the field can fall back to the default

What breaks forward compatibility

// BREAKING: Remove a field that old consumers require (no default)
// Old readers still declare the field, find no value in new data, and have no default to use

// BREAKING: Change a field type
// Old code expects "userId" to be a string, new data sends an int

Full compatibility

Full compatibility requires both backward AND forward compatibility simultaneously. This is the strictest mode and the hardest to maintain. In practice, the only completely safe operations under full compatibility are adding a field that has a default and removing a field that has a default.

Any other change (renaming a field, changing its type, or adding or removing a field without a default) breaks at least one direction.

Compatibility modes side-by-side

| Change | Backward | Forward | Full |
|---|---|---|---|
| Add field with default | ✓ Safe | ✓ Safe | ✓ Safe |
| Add field without default | ✗ Breaks | ✓ Safe | ✗ Breaks |
| Remove field with default | ✓ Safe | ✓ Safe | ✓ Safe |
| Remove field without default | ✓ Safe* | ✗ Breaks | ✗ Breaks |
| Rename a field | ✗ Breaks | ✗ Breaks | ✗ Breaks |
| Change field type | ✗ Breaks | ✗ Breaks | ✗ Breaks |
| Widen type (int → long) | ✓ Safe | ✗ Breaks | ✗ Breaks |

*Backward safe only if new consumers don't need the removed field.
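The table can be reproduced with a deliberately simplified model of Avro schema resolution. In this hypothetical sketch, can_read(reader, writer) succeeds if every reader field either exists in the writer with the same or a promotable type, or has a default; fields only the writer has are skipped. Backward means the new schema reads old data, forward means the reverse, and full means both. It handles only flat records with primitive field types and ignores aliases, enums, unions, and nested records, so treat it as a teaching aid rather than a substitute for the registry's own compatibility check.

```python
# Simplified model of Avro schema resolution for flat records with primitive fields.
# Promotions Avro allows: writer type -> reader types that can read it.
PROMOTIONS = {
    "int": {"long", "float", "double"},
    "long": {"float", "double"},
    "float": {"double"},
    "string": {"bytes"},
    "bytes": {"string"},
}


def type_readable(reader_type, writer_type) -> bool:
    """Same type, or a primitive promotion from the writer's type to the reader's."""
    if reader_type == writer_type:
        return True
    if isinstance(reader_type, str) and isinstance(writer_type, str):
        return reader_type in PROMOTIONS.get(writer_type, set())
    return False  # complex types are out of scope for this sketch


def can_read(reader: list, writer: list) -> bool:
    """Can a reader with these fields decode data written with the writer's fields?"""
    writer_by_name = {f["name"]: f for f in writer}
    for field in reader:
        written = writer_by_name.get(field["name"])
        if written is None:
            if "default" not in field:
                return False  # absent from the data and nothing to fill it with
        elif not type_readable(field["type"], written["type"]):
            return False
    return True  # fields only the writer has are skipped by the reader


def backward(old: list, new: list) -> bool:
    return can_read(reader=new, writer=old)  # new consumers, old data


def forward(old: list, new: list) -> bool:
    return can_read(reader=old, writer=new)  # old consumers, new data


def full(old: list, new: list) -> bool:
    return backward(old, new) and forward(old, new)


base = [{"name": "orderId", "type": "string"}, {"name": "qty", "type": "int"}]
widened = [{"name": "orderId", "type": "string"}, {"name": "qty", "type": "long"}]
print(backward(base, widened), forward(base, widened), full(base, widened))
# True False False
```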

Kafka Schema Registry

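The Schema Registry (Confluent or AWS Glue SR) is a service that stores versioned schemas and enforces compatibility rules before a new schema is accepted. Before publishing, a producer's serializer registers its schema (or looks up one that is already registered), and the registry returns a schema ID. By default, Confluent's serializers embed that ID in the first five bytes of every message value (a magic byte followed by a 4-byte schema ID), not in a Kafka record header, so consumers can fetch the exact schema each message was written with.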
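The framing is easy to picture in code. This is a minimal sketch of the Confluent wire format only (a zero magic byte, then the schema ID as a 4-byte big-endian integer, then the Avro binary payload); AWS Glue Schema Registry uses a different header layout. frame and unframe are hypothetical helper names, and the payload bytes in the example are arbitrary.

```python
import struct

MAGIC_BYTE = 0  # Confluent wire format version marker


def frame(schema_id: int, avro_payload: bytes) -> bytes:
    """Prefix Avro-encoded bytes with the magic byte and a 4-byte big-endian schema ID."""
    return struct.pack(">BI", MAGIC_BYTE, schema_id) + avro_payload


def unframe(message_value: bytes) -> tuple:
    """Split a framed message value into (schema_id, avro_payload)."""
    if len(message_value) < 5:
        raise ValueError("too short to contain the 5-byte header")
    magic, schema_id = struct.unpack(">BI", message_value[:5])
    if magic != MAGIC_BYTE:
        raise ValueError(f"unexpected magic byte: {magic}")
    return schema_id, message_value[5:]


framed = frame(1, b"\x02\x04")  # arbitrary example payload
print(framed.hex())  # 00000000010204
print(unframe(framed))  # (1, b'\x02\x04')
```

A consumer does the equivalent of unframe on each message value, fetches (and caches) the writer schema for that ID, and then decodes the remaining bytes.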

# Register a schema (Confluent REST API)
curl -X POST \
  http://localhost:8081/subjects/user-created-value/versions \
  -H 'Content-Type: application/vnd.schemaregistry.v1+json' \
  -d '{
    "schema": "{\"type\":\"record\",\"name\":\"UserCreated\",\"fields\":[...]}"
  }'

# Response: schema accepted. "id" is the registry-wide schema ID,
# not the subject's version number.
{ "id": 1 }

# If the new schema breaks compatibility, the API returns HTTP 409 with a body like:
{ "error_code": 409, "message": "Schema being registered is incompatible with an earlier schema" }

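Compatibility is configured per subject. With the default naming strategy, each topic has one subject for keys and one for values (for example, user-created-value). Confluent Schema Registry supports BACKWARD (the default), FORWARD, FULL, and NONE, plus BACKWARD_TRANSITIVE, FORWARD_TRANSITIVE, and FULL_TRANSITIVE, which check a new schema against every earlier version instead of only the latest one.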
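Besides registering schemas, Confluent's REST API can test a candidate schema against a subject without registering it (POST /compatibility/subjects/{subject}/versions/latest) and set a subject's compatibility level (PUT /config/{subject}). The sketch below only builds those requests with Python's standard library so you can see their shape; the registry URL and helper names are assumptions, and nothing is sent until a request is passed to urllib.request.urlopen. Note that the schema travels as a JSON-encoded string inside the JSON body, which is why the curl example above escapes its quotes.

```python
import json
import urllib.request

REGISTRY_URL = "http://localhost:8081"  # assumption: a local Confluent Schema Registry
CONTENT_TYPE = "application/vnd.schemaregistry.v1+json"


def compatibility_check_request(subject: str, schema: dict) -> urllib.request.Request:
    """Build a POST that tests `schema` against the subject's latest version."""
    # The schema itself is serialized to a string, then wrapped in the JSON body.
    body = json.dumps({"schema": json.dumps(schema)}).encode("utf-8")
    return urllib.request.Request(
        f"{REGISTRY_URL}/compatibility/subjects/{subject}/versions/latest",
        data=body,
        headers={"Content-Type": CONTENT_TYPE},
        method="POST",
    )


def set_compatibility_request(subject: str, level: str) -> urllib.request.Request:
    """Build a PUT that sets a per-subject level such as "BACKWARD_TRANSITIVE"."""
    body = json.dumps({"compatibility": level}).encode("utf-8")
    return urllib.request.Request(
        f"{REGISTRY_URL}/config/{subject}",
        data=body,
        headers={"Content-Type": CONTENT_TYPE},
        method="PUT",
    )


# Against a running registry, something like:
# with urllib.request.urlopen(compatibility_check_request("user-created-value", schema)) as resp:
#     print(json.load(resp))  # a body such as {"is_compatible": true}
```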

Renaming a field safely with aliases

You cannot rename a field directly without breaking compatibility. The safe pattern uses the aliases array to map the old name to the new one. Avro readers will try the new name first, then fall back to the alias when reading old data.

// Old schema had: "name": "userId"
// New schema renames it to "accountId" using aliases

{
  "name": "accountId",
  "type": "string",
  "aliases": ["userId"],   // new readers map old data's "userId" to "accountId"
  "doc": "Renamed from userId in v3."
}

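Note: aliases are only consulted in the reader's schema. They make this rename backward compatible (new consumers can read old data), but old consumers still look for userId, will not find it in new messages, and fail unless their schema gives userId a default, so the change is not forward or full compatible. Alias handling also varies across client libraries and Schema Registry versions, so test this in a staging environment before relying on it in production.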
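Alias lookup happens entirely on the reader side, which is why the rename only helps new consumers. Here is a hypothetical sketch (find_writer_field is an illustrative name, not a library function) of how a reader matches its field to a field in the writer's schema: by its own name first, then by each of its own aliases. Aliases declared in the writer's schema play no part.

```python
def find_writer_field(reader_field: dict, writer_fields: list):
    """Match a reader field by its own name, then by its own aliases."""
    by_name = {f["name"]: f for f in writer_fields}
    for candidate in [reader_field["name"], *reader_field.get("aliases", [])]:
        if candidate in by_name:
            return by_name[candidate]
    return None  # no match: the reader must use its default, or fail


old_fields = [{"name": "userId", "type": "string"}]
new_fields = [{"name": "accountId", "type": "string", "aliases": ["userId"]}]

# New consumer (reader = new schema) reading old data: the alias finds "userId".
print(find_writer_field(new_fields[0], old_fields)["name"])  # userId
# Old consumer (reader = old schema) reading new data: writer aliases are ignored.
print(find_writer_field(old_fields[0], new_fields))  # None
```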

A real schema evolution example

Starting schema (v1):

{
  "type": "record",
  "name": "OrderPlaced",
  "namespace": "com.example.orders",
  "fields": [
    { "name": "orderId",    "type": "string" },
    { "name": "customerId", "type": "string" },
    { "name": "totalCents", "type": "long" }
  ]
}

v2 — Add currency support (backward safe):

{
  "type": "record",
  "name": "OrderPlaced",
  "namespace": "com.example.orders",
  "fields": [
    { "name": "orderId",    "type": "string" },
    { "name": "customerId", "type": "string" },
    { "name": "totalCents", "type": "long" },
    {
      "name": "currency",
      "type": { "type": "enum", "name": "Currency", "symbols": ["USD", "EUR", "GBP"] },
      "default": "USD",
      "doc": "Added in v2. Old messages default to USD."
    }
  ]
}

v3 — Add optional discount (backward safe):

{
  "name": "discountCents",
  "type": ["null", "long"],
  "default": null,
  "doc": "Discount applied in cents. null if no discount."
}

Each version added fields with defaults, so the changes work in both directions. Old consumers still on v1 ignore currency and discountCents when reading newer messages, and new consumers reading v1 messages get currency = "USD" and discountCents = null automatically.
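The same walkthrough can be sketched as a simplified version of Avro's record resolution. This hypothetical resolve function works on already-decoded Python dicts rather than binary data, and it skips type promotion, aliases, and nested types. It shows only the two rules that matter here: reader fields missing from the data take their default, and data fields the reader does not declare are dropped.

```python
# Simplified record resolution on already-decoded dicts (not binary Avro).
V1_FIELDS = [
    {"name": "orderId", "type": "string"},
    {"name": "customerId", "type": "string"},
    {"name": "totalCents", "type": "long"},
]
V3_FIELDS = V1_FIELDS + [
    {"name": "currency",
     "type": {"type": "enum", "name": "Currency", "symbols": ["USD", "EUR", "GBP"]},
     "default": "USD"},
    {"name": "discountCents", "type": ["null", "long"], "default": None},
]

_MISSING = object()  # distinguishes "field absent" from a real null value


def resolve(datum: dict, reader_fields: list) -> dict:
    """Project a decoded record onto the reader's field list."""
    result = {}
    for field in reader_fields:
        value = datum.get(field["name"], _MISSING)
        if value is _MISSING:
            if "default" not in field:
                raise ValueError(f"no value and no default for {field['name']!r}")
            value = field["default"]
        result[field["name"]] = value
    return result  # fields the reader doesn't declare are dropped


v1_message = {"orderId": "o-1", "customerId": "c-9", "totalCents": 2599}
print(resolve(v1_message, V3_FIELDS))  # new consumer reading a v1 message
# {'orderId': 'o-1', 'customerId': 'c-9', 'totalCents': 2599, 'currency': 'USD', 'discountCents': None}

v3_message = {**v1_message, "currency": "EUR", "discountCents": 500}
print(resolve(v3_message, V1_FIELDS))  # old consumer reading a v3 message
# {'orderId': 'o-1', 'customerId': 'c-9', 'totalCents': 2599}
```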

Working with Avro schemas using JSON tools

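Because Avro schemas are JSON documents, you can use standard JSON tools to format, diff, and validate them. Keep in mind that a JSON validator only confirms the document is well-formed JSON; it does not check Avro-specific rules such as a union default matching the first branch.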

Common mistakes and how to avoid them

1. Adding a required field to an existing schema

This is the most common breaking change. If you add a field without a default, old messages lack the field entirely. New consumers fail to deserialize them. The fix: always add a default, even if it is null via a union type.

2. Changing a field's type

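Changing "type": "string" to "type": "int" is always breaking. Old messages contain a string value, new consumers try to read it as an integer, and deserialization fails. Avro's schema resolution allows only a few type promotions (int → long, float, or double; long → float or double; float → double; string ↔ bytes), and the numeric ones work in one direction only: a reader expecting long can read data written as int, but not the reverse.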

3. Removing a field without checking its default

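Removing a field with no default breaks forward compatibility. Old consumers still declare that field in their reader schema, so when they read a new message that lacks it there is no default to fall back on, and deserialization typically fails with a missing-field error. The exact behavior depends on the language library.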

4. Using NONE compatibility mode in production

Setting NONE skips all compatibility checks. Any schema is accepted regardless of breaking changes. This is sometimes used in development for speed, but must never reach production for topics with multiple consumers.

5. Not testing deserialization of old messages

Even if the Schema Registry accepts the new schema, test that a message serialized with the old schema can be deserialized with the new one in your consumer code. Library behavior varies — some raise exceptions, others silently apply defaults. Always verify both directions in staging.

Frequently Asked Questions

What is an Avro schema in Kafka?

An Avro schema is a JSON document that describes the structure of messages on a Kafka topic. Producers serialize messages to binary using the schema; consumers deserialize them using the schema each message was written with (looked up by ID, usually from a Schema Registry) and, optionally, their own reader schema. The result is smaller messages and a formal contract between producer and consumer teams.

What is backward compatibility in Avro?

Backward compatibility means a new schema version can read data written with the old version. In practice: add new fields with defaults, and remove fields as needed (the new reader simply skips them). Never add a new field without a default: old messages won't contain it, and new consumers will fail to deserialize them.

What is the difference between backward and forward compatibility?

Backward: new readers can read old data. Forward: old readers can read new data. Full: both simultaneously. Backward is the most common requirement because you typically deploy consumers before producers. Forward is needed when producers are updated first.

Can you rename a field in an Avro schema?

Not directly: renaming breaks compatibility because Avro matches fields by name. The safe approach is to add an aliases array to the new field pointing to the old name. New readers find the field under the new name, and old data is mapped using the alias. This makes the rename backward compatible only, since old readers do not know about the alias. Test this in staging before relying on it.

What happens if you change a field type in Avro?

Most type changes break compatibility: old data contains one binary encoding, new readers expect another, and deserialization fails. Avro only allows a few safe type promotions: int → long, float, or double; long → float or double; float → double; and string ↔ bytes. For any other type change, add a new field with the new type and deprecate the old field over time.

About the author

Pasindu Ishan is a software developer based in Sri Lanka. He builds developer tools at JSON Dev Tools.