If you are working with Apache Kafka and someone mentions "Avro schema" or "Schema Registry," this guide is for you. We will cover what Avro is, why Kafka teams use it, what backward compatibility actually means in practice, and what changes to a schema will break your pipeline versus what is safe.
What is Apache Avro?
Avro is a data serialization framework that originated in the Apache Hadoop project. It encodes data in a compact binary format and keeps the schema separate from each message (in Kafka, usually in a Schema Registry). JSON takes the opposite approach: it is self-describing, so every message carries its own field names.
The tradeoffs look like this:
| Property | JSON in Kafka | Avro in Kafka |
|---|---|---|
| Message size | Large — field names repeated every message | Small — binary, schema stored separately |
| Schema enforcement | None — any JSON passes | Strong — rejects messages that do not match schema |
| Schema evolution | Manual, no tooling | Managed — Schema Registry enforces compatibility |
| Human-readable | Yes | No — binary on the wire |
| Cross-language support | Universal | Good — Java, Python, Go, C#, and more |
Kafka teams adopt Avro when they want schema enforcement, smaller message sizes at scale, and a formal contract between producers and consumers.
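To make the size difference concrete, here is a minimal Python sketch that hand-encodes one event in Avro's binary layout (zigzag varints for long values, length-prefixed UTF-8 for strings) and compares it with compact JSON. The event values and helper names are illustrative assumptions, and a real pipeline would use an Avro library rather than this hand-rolled encoder.

```python
import json


def zigzag_varint(n: int) -> bytes:
    """Encode a signed 64-bit integer the way Avro encodes int and long values."""
    z = (n << 1) ^ (n >> 63)  # zigzag: small magnitudes map to small unsigned values
    out = bytearray()
    while True:
        byte = z & 0x7F
        z >>= 7
        if z:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def avro_string(s: str) -> bytes:
    """Avro strings are a long length prefix followed by the UTF-8 bytes."""
    data = s.encode("utf-8")
    return zigzag_varint(len(data)) + data


# Hypothetical event with three of the fields from a user-created record.
event = {"userId": "u-1001", "email": "ada@example.com", "createdAt": 1700000000000}

json_bytes = json.dumps(event, separators=(",", ":")).encode("utf-8")

# An Avro record is just its field values concatenated in schema order.
# No field names appear on the wire.
avro_bytes = (
    avro_string(event["userId"])
    + avro_string(event["email"])
    + zigzag_varint(event["createdAt"])
)

print("JSON bytes:", len(json_bytes))
print("Avro bytes:", len(avro_bytes))
```

The gap grows with the number of fields and the length of their names, because JSON repeats every name in every message while Avro relies on the schema to say which value is which.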
What an Avro schema looks like
An Avro schema is itself a JSON document. Here is a complete schema for a user event:
{
"type": "record",
"name": "UserCreated",
"namespace": "com.example.events",
"doc": "Fired when a new user account is created.",
"fields": [
{
"name": "userId",
"type": "string",
"doc": "UUID of the new user account."
},
{
"name": "email",
"type": "string"
},
{
"name": "createdAt",
"type": { "type": "long", "logicalType": "timestamp-millis" },
"doc": "Unix timestamp in milliseconds."
},
{
"name": "plan",
"type": {
"type": "enum",
"name": "PlanType",
"symbols": ["FREE", "PRO", "ENTERPRISE"]
},
"default": "FREE"
}
]
}
Key things to notice:
- type: "record" is the top-level type for structured data (equivalent to a JSON object).
- namespace helps avoid naming conflicts across teams.
- Every field has a name and a type. Types can be primitives (string, long, int, boolean, float, double, bytes, null), complex types (record, array, map, enum, fixed), or unions.
- Fields with a default are optional during deserialization, which is critical for schema evolution.
Unions — the nullable field pattern
In Avro, a field is not nullable by default. To allow null, you use a union type. The conventional pattern is to put null first and set it as the default:
{
"name": "middleName",
"type": ["null", "string"],
"default": null,
"doc": "Optional middle name. null if not provided."
}
The order in a union matters: the default value should match the first type in the array. If you write ["string", "null"] with "default": null, parsers that validate defaults strictly will reject the schema.
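The first-branch rule is easy to check before a schema ever reaches a parser. The following Python sketch is a lint-style check under simplifying assumptions: default_matches_first_branch is a hypothetical helper, it only understands primitive first branches, and it passes anything it cannot judge.

```python
def default_matches_first_branch(field: dict) -> bool:
    """Return True if a union field's default fits the first branch of the union."""
    ftype = field["type"]
    if not isinstance(ftype, list) or "default" not in field:
        return True  # not a union, or no default to check

    def is_number(v):
        # bool is a subclass of int in Python, so exclude it explicitly
        return isinstance(v, (int, float)) and not isinstance(v, bool)

    checks = {
        "null": lambda v: v is None,
        "string": lambda v: isinstance(v, str),
        "boolean": lambda v: isinstance(v, bool),
        "int": lambda v: isinstance(v, int) and not isinstance(v, bool),
        "long": lambda v: isinstance(v, int) and not isinstance(v, bool),
        "float": is_number,
        "double": is_number,
    }
    first = ftype[0]
    check = checks.get(first) if isinstance(first, str) else None
    # Complex or named first branches are not checked in this sketch.
    return check(field["default"]) if check else True


good = {"name": "middleName", "type": ["null", "string"], "default": None}
bad = {"name": "middleName", "type": ["string", "null"], "default": None}

print(default_matches_first_branch(good))
print(default_matches_first_branch(bad))
```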
What is schema evolution?
Schema evolution is the process of changing an Avro schema over time without breaking existing producers or consumers. In a Kafka system, producers and consumers are deployed independently — different teams, different release cycles. A change to the schema must be handled carefully so that:
- Old consumers can still read new messages from updated producers.
- New consumers can still read old messages that are already in Kafka (retention can be days or weeks).
This is where backward, forward, and full compatibility come in.
Backward compatibility (most common)
Definition: A new schema version can read data written with the old schema version.
Typical scenario: You update consumers first (they get the new schema), then producers gradually roll out. During rollout, consumers receive messages encoded with the old schema and must handle them correctly.
What is safe (backward compatible)
// SAFE: Add a new field with a default value
// Old messages won't have this field — the default fills it in
{
"name": "referralCode",
"type": ["null", "string"],
"default": null
}
// SAFE: Remove a field (with or without a default)
// New consumers no longer list the field in their schema, so they skip it in old messages
// (removing a field without a default is NOT forward compatible, though)
What breaks backward compatibility
// BREAKING: Add a new required field (no default)
// Old messages don't have this field — deserialization fails
{
"name": "phoneNumber",
"type": "string"
// No "default": new consumers cannot fill in the field when reading old messages
}
// BREAKING: Rename a required field without an alias
// Avro matches fields by name, so the new name looks like a new field with no default
// BREAKING: Change a field's type (most cases)
// "userId" was "string", changed to "int": deserialization fails
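The rule behind both lists is how a reader resolves data against its own schema: fields it knows and finds are copied, fields it knows but does not find come from the default, and fields it does not know are skipped. A simplified Python model of that rule follows; it works on already-decoded dictionaries rather than Avro binary, and read_with_schema is a hypothetical helper, not a library function.

```python
def read_with_schema(record: dict, reader_fields: list) -> dict:
    """Project a decoded record onto the reader's field list."""
    result = {}
    for field in reader_fields:
        name = field["name"]
        if name in record:
            result[name] = record[name]          # field present in the data
        elif "default" in field:
            result[name] = field["default"]      # absent: fall back to the default
        else:
            raise ValueError(f"field {name!r} missing from data and has no default")
    return result                                # fields unknown to the reader are dropped


# A message written before the schema change.
old_message = {"userId": "u-1", "email": "a@example.com"}

base_fields = [
    {"name": "userId", "type": "string"},
    {"name": "email", "type": "string"},
]

# New reader schema that adds a field WITH a default: old data still resolves.
safe_reader = base_fields + [
    {"name": "referralCode", "type": ["null", "string"], "default": None}
]
print(read_with_schema(old_message, safe_reader))

# New reader schema that adds a field WITHOUT a default: old data cannot resolve.
breaking_reader = base_fields + [{"name": "phoneNumber", "type": "string"}]
try:
    read_with_schema(old_message, breaking_reader)
except ValueError as exc:
    print("breaking:", exc)
```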
Forward compatibility
Definition: An old schema version can read data written with a new schema version.
Typical scenario: Producers are updated first and start sending new fields. Old consumers (not yet updated) receive these messages and must not crash — they just ignore unknown fields.
What is safe (forward compatible)
// SAFE: Add a new field (with or without default)
// Old consumers simply ignore the unknown field
// SAFE: Remove a field with a default
// Old consumers that expect the field can fall back to the default
What breaks forward compatibility
// BREAKING: Remove a field that old consumers require (no default)
// The old reader schema still lists the field, finds no value in the new data, and has no default to fall back on
// BREAKING: Change a field type
// Old code expects "userId" to be a string, new data sends an int
Full compatibility
Full compatibility requires both backward AND forward compatibility simultaneously. This is the strictest mode and the hardest to maintain. In practice, the only completely safe operations under full compatibility are:
- Adding a new field with a default value.
- Removing a field that has a default value.
Any other change — renaming, type changes, removing required fields — breaks at least one direction.
Compatibility modes side-by-side
| Change | Backward | Forward | Full |
|---|---|---|---|
| Add field with default | ✓ Safe | ✓ Safe | ✓ Safe |
| Add field without default | ✗ Breaks | ✓ Safe | ✗ Breaks |
| Remove field with default | ✓ Safe | ✓ Safe | ✓ Safe |
| Remove field without default | ✓ Safe* | ✗ Breaks | ✗ Breaks |
| Rename a field | ✗ Breaks | ✗ Breaks | ✗ Breaks |
| Change field type | ✗ Breaks | ✗ Breaks | ✗ Breaks |
| Widen type (int → long) | ✓ Safe | ✗ Breaks | ✗ Breaks |
*Backward safe only if new consumers don't need the removed field.
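The rows of this table can be reproduced with a small checker. The Python sketch below is a deliberately reduced model of a compatibility check: it looks only at top-level fields, treats any complex type change as breaking, and ignores aliases. The function names and the sample fields are assumptions made for illustration.

```python
# Writer type -> reader types that can still read it (Avro's promotion rules).
PROMOTIONS = {
    "int": {"long", "float", "double"},
    "long": {"float", "double"},
    "float": {"double"},
    "string": {"bytes"},
    "bytes": {"string"},
}


def type_readable(writer_type, reader_type) -> bool:
    if writer_type == reader_type:
        return True
    if isinstance(writer_type, str) and isinstance(reader_type, str):
        return reader_type in PROMOTIONS.get(writer_type, set())
    return False  # complex type changes are treated as breaking in this sketch


def can_read(reader_fields: list, writer_fields: list) -> bool:
    """True if a reader with reader_fields can resolve data written with writer_fields."""
    written = {f["name"]: f for f in writer_fields}
    for field in reader_fields:
        if field["name"] in written:
            if not type_readable(written[field["name"]]["type"], field["type"]):
                return False
        elif "default" not in field:
            return False  # reader needs a value the writer never wrote
    return True


def classify(old_fields: list, new_fields: list) -> dict:
    backward = can_read(new_fields, old_fields)  # new reader, old data
    forward = can_read(old_fields, new_fields)   # old reader, new data
    return {"backward": backward, "forward": forward, "full": backward and forward}


v1 = [{"name": "orderId", "type": "string"}, {"name": "qty", "type": "int"}]

add_with_default = v1 + [{"name": "note", "type": ["null", "string"], "default": None}]
add_without_default = v1 + [{"name": "note", "type": "string"}]
remove_without_default = [v1[0]]
widen_int_to_long = [v1[0], {"name": "qty", "type": "long"}]

for label, new in [
    ("add with default", add_with_default),
    ("add without default", add_without_default),
    ("remove without default", remove_without_default),
    ("widen int to long", widen_int_to_long),
]:
    print(label, classify(v1, new))
```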
Kafka Schema Registry
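The Schema Registry (Confluent's, or an equivalent such as the AWS Glue Schema Registry) is a service that stores versioned schemas and enforces compatibility rules before a new schema is accepted. A producer's serializer registers or looks up its schema before publishing, and the registry responds with a schema ID. With Confluent's serializers, that ID is by default written at the start of the message payload (a magic byte followed by the 4-byte ID), not in a Kafka record header.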
# Register a schema (Confluent REST API)
curl -X POST \
http://localhost:8081/subjects/user-created-value/versions \
-H 'Content-Type: application/vnd.schemaregistry.v1+json' \
-d '{
"schema": "{\"type\":\"record\",\"name\":\"UserCreated\",\"fields\":[...]}"
}'
# Response: schema accepted; "id" is the registry-wide schema ID, not the subject's version number
{ "id": 1 }
# If the new schema breaks compatibility, the API returns:
{ "error_code": 409, "message": "Schema being registered is incompatible with an earlier schema" }
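The awkward part of the request above is that the schema travels as a JSON string inside a JSON object, which is easy to get wrong when escaping by hand. Here is a minimal Python sketch that builds the same request with the standard library, using the URL, subject, and content type from the curl call. build_register_request is a hypothetical helper, and the two-field schema is a shortened stand-in for the real one.

```python
import json
import urllib.request


def build_register_request(base_url: str, subject: str, schema: dict) -> urllib.request.Request:
    """Build (but do not send) a schema registration request."""
    # The inner json.dumps turns the schema into a string; the outer one escapes it.
    body = json.dumps({"schema": json.dumps(schema)}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/subjects/{subject}/versions",
        data=body,
        headers={"Content-Type": "application/vnd.schemaregistry.v1+json"},
        method="POST",
    )


schema = {
    "type": "record",
    "name": "UserCreated",
    "fields": [
        {"name": "userId", "type": "string"},
        {"name": "email", "type": "string"},
    ],
}

request = build_register_request("http://localhost:8081", "user-created-value", schema)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))

# Sending is left out so the sketch runs without a registry:
#   with urllib.request.urlopen(request) as response:
#       print(json.load(response))
# urlopen raises urllib.error.HTTPError for a 409 (incompatible schema) response.
```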
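The registry supports four basic compatibility settings per subject (by default, one subject per topic key or value, such as user-created-value):
- BACKWARD (default): consumers using the new schema can read data written with the previous schema.
- FORWARD: consumers using the previous schema can read data written with the new schema.
- FULL: both directions.
- NONE: no compatibility checks. Dangerous in production.
BACKWARD, FORWARD, and FULL each also have a _TRANSITIVE variant that checks the new schema against all earlier versions instead of only the latest one.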
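The schema ID that the registry returns travels with every message, which is how a consumer knows which schema version wrote it. The Python sketch below shows the framing used by Confluent's serializers by default: one magic byte (0), then the schema ID as a 4-byte big-endian integer, then the Avro binary payload. Other registries, such as AWS Glue, use a different layout. The frame and unframe helpers and the payload bytes are illustrative assumptions.

```python
import struct

MAGIC_BYTE = 0  # marks the Confluent wire format


def frame(schema_id: int, avro_payload: bytes) -> bytes:
    """Prefix an Avro payload with the magic byte and the 4-byte schema ID."""
    return struct.pack(">bI", MAGIC_BYTE, schema_id) + avro_payload


def unframe(message: bytes) -> tuple[int, bytes]:
    """Split a framed message into (schema_id, avro_payload)."""
    if len(message) < 5:
        raise ValueError("message too short to contain a schema ID")
    magic, schema_id = struct.unpack(">bI", message[:5])
    if magic != MAGIC_BYTE:
        raise ValueError("unexpected magic byte: not in this wire format")
    return schema_id, message[5:]


payload = b"\x0cu-1001"          # placeholder bytes standing in for an encoded record
message = frame(1, payload)      # 1 is the ID from the registration response

schema_id, body = unframe(message)
print(schema_id, body == payload)
```

A consumer would use the extracted ID to fetch the writer's schema from the registry (typically cached after the first lookup) and then decode the remaining bytes against it.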
Renaming a field safely with aliases
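You cannot rename a field directly without breaking compatibility. The safe pattern uses the aliases array on the renamed field to map the old name to the new one. A reader using the new schema looks for the new name first, then falls back to the alias when reading old data. Aliases live in the reader's schema, so they help new readers with old data but do nothing for old readers receiving new data.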
// Old schema had: "name": "userId"
// New schema renames it to "accountId" using aliases
{
"name": "accountId",
"type": "string",
"aliases": ["userId"], // new readers map the writer's "userId" field onto "accountId"
"doc": "Renamed from userId in v3."
}
Note: aliases are resolved on the reader side, so they only help in the backward direction; under FORWARD or FULL mode a rename is still likely to be rejected. Not all Schema Registry versions or client libraries handle aliases transparently, so test this in a staging environment before relying on it in production.
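How a reader applies an alias can be sketched as a name lookup with fallbacks. The Python below is a simplified model over decoded dictionaries, not real Avro decoding, and resolve_with_aliases is a hypothetical helper. It also shows why the pattern is one-directional: the old reader has no alias to follow.

```python
def resolve_with_aliases(record: dict, reader_fields: list) -> dict:
    """Map writer field names onto reader field names, trying aliases as fallbacks."""
    result = {}
    for field in reader_fields:
        candidates = [field["name"], *field.get("aliases", [])]
        for key in candidates:
            if key in record:
                result[field["name"]] = record[key]
                break
        else:
            # No candidate name found in the data: use the default or fail.
            if "default" not in field:
                raise ValueError(f"no value for {field['name']!r} and no default")
            result[field["name"]] = field["default"]
    return result


new_reader = [{"name": "accountId", "type": "string", "aliases": ["userId"]}]
old_reader = [{"name": "userId", "type": "string"}]

old_data = {"userId": "u-1"}
new_data = {"accountId": "u-1"}

# New reader, old data: the alias bridges the rename.
print(resolve_with_aliases(old_data, new_reader))

# Old reader, new data: nothing maps "accountId" back to "userId".
try:
    resolve_with_aliases(new_data, old_reader)
except ValueError as exc:
    print("old reader fails:", exc)
```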
A real schema evolution example
Starting schema (v1):
{
"type": "record",
"name": "OrderPlaced",
"namespace": "com.example.orders",
"fields": [
{ "name": "orderId", "type": "string" },
{ "name": "customerId", "type": "string" },
{ "name": "totalCents", "type": "long" }
]
}
v2 — Add currency support (backward safe):
{
"type": "record",
"name": "OrderPlaced",
"namespace": "com.example.orders",
"fields": [
{ "name": "orderId", "type": "string" },
{ "name": "customerId", "type": "string" },
{ "name": "totalCents", "type": "long" },
{
"name": "currency",
"type": { "type": "enum", "name": "Currency", "symbols": ["USD", "EUR", "GBP"] },
"default": "USD",
"doc": "Added in v2. Old messages default to USD."
}
]
}
v3 — Add optional discount (backward safe):
{
"name": "discountCents",
"type": ["null", "long"],
"default": null,
"doc": "Discount applied in cents. null if no discount."
}
Each version added fields with defaults. Old consumers reading v2 or v3 messages still work because they ignore fields that are not in their schema. New consumers reading v1 messages get currency = "USD" and discountCents = null automatically from the defaults.
Working with Avro schemas using JSON tools
Because Avro schemas are JSON documents, you can use standard JSON tools to work with them:
- Format a schema for readability — paste a minified schema into the JSON Formatter to pretty-print it with proper indentation before committing to version control.
- Generate a schema from a sample — if you have a sample JSON message, use the JSON Schema Generator to auto-generate a draft JSON Schema, then adapt it to Avro format by replacing JSON Schema types with Avro types.
- Validate schema syntax — the JSON Schema Validator can validate that an Avro schema document is structurally correct JSON before you submit it to the Schema Registry.
- Compare schema versions — paste two versions of a schema into the JSON Diff tool to see exactly what changed between v1 and v2.
Common mistakes and how to avoid them
1. Adding a required field to an existing schema
This is the most common breaking change. If you add a field without a default, old messages lack the field entirely. New consumers fail to deserialize them. The fix: always add a default, even if it is null via a union type.
2. Changing a field's type
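Changing "type": "string" to "type": "int" is always breaking. Old messages contain a string value, new consumers try to read it as an integer, and deserialization fails. The only type changes Avro resolves are promotions: int → long, float, or double; long → float or double; float → double; and string ↔ bytes. Numeric promotions work in one direction only (a reader expecting int cannot read long), so widening a type is backward compatible but not forward compatible.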
3. Removing a field without checking its default
Removing a field with no default breaks forward compatibility. Old consumers still list that field in their reader schema; with no value in the new data and no default to fall back on, they get null or a deserialization error, depending on the language library.
4. Using NONE compatibility mode in production
Setting NONE skips all compatibility checks. Any schema is accepted regardless of breaking changes. This is sometimes used in development for speed, but must never reach production for topics with multiple consumers.
5. Not testing deserialization of old messages
Even if the Schema Registry accepts the new schema, test that a message serialized with the old schema can be deserialized with the new one in your consumer code. Library behavior varies — some raise exceptions, others silently apply defaults. Always verify both directions in staging.
Frequently Asked Questions
What is an Avro schema in Kafka?
An Avro schema is a JSON document that describes the structure of messages on a Kafka topic. Producers serialize messages to binary using the schema; consumers deserialize them. The consumer looks up the schema each message was written with, usually in a Schema Registry, and resolves it against the schema version it expects. The result is smaller messages and a formal contract between producer and consumer teams.
What is backward compatibility in Avro?
Backward compatibility means a new schema version can read data written with the old version. In practice: add new fields with defaults, and you can remove fields that already have defaults. Never add a new field without a default — old messages won't contain it, and new consumers will fail to deserialize them.
What is the difference between backward and forward compatibility?
Backward: new readers can read old data. Forward: old readers can read new data. Full: both simultaneously. Backward is the most common requirement because you typically deploy consumers before producers. Forward is needed when producers are updated first.
Can you rename a field in an Avro schema?
Not directly — renaming breaks compatibility because Avro matches fields by name. The safe approach is to add an aliases array to the new field pointing to the old name. New readers find the field under the new name; old data is mapped using the alias. Test this in staging before relying on it.
What happens if you change a field type in Avro?
Most type changes break compatibility: old data contains one binary encoding, new readers expect another, and deserialization fails. Avro only allows these type promotions: int → long, float, or double; long → float or double; float → double; and string ↔ bytes. For any other type change, add a new field with the new type and deprecate the old field over time.