Serialization
Also known as: Marshalling, Encoding
Understand This First
- Data Structure – serialization converts data structures into a portable format.
- Data Model – the data model determines what gets serialized.
Context
Data inside a running program lives in Data Structures (objects, structs, arrays) that only make sense to that specific program in that specific language on that specific machine. The moment you need to send data over a network, save it to a file, store it in a database, or pass it to another process, you must convert those in-memory structures into a sequence of bytes or text that can travel and be reconstructed on the other side. That conversion is serialization. The reverse, converting bytes back into in-memory structures, is deserialization. This is an architectural pattern because it governs every boundary where data enters or leaves a process.
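The round trip above can be sketched with Python's standard json module (the data here is illustrative):

```python
import json

# In-memory structure: meaningful only to this Python process.
user = {"name": "Alice", "roles": ["admin", "editor"], "active": True}

# Serialization: convert the structure into portable text.
wire = json.dumps(user)

# Deserialization: reconstruct an equivalent structure on the other side.
restored = json.loads(wire)
assert restored == user
```

The string in `wire` can cross any boundary the original dictionary could not: a socket, a file, a queue, or a process written in another language.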
Problem
How do you convert a program’s in-memory data into a portable format that other programs, other machines, or future versions of the same program can reconstruct?
In-memory data structures are tied to a specific language, runtime, and memory layout. A Python dictionary and a Java HashMap might represent the same information, but their internal representations are completely different. Without serialization, data can’t cross any boundary: not a network socket, not a file, not even the gap between two programs on the same machine.
Forces
- Human-readable formats (JSON, YAML, XML) are easy to inspect and debug but verbose and slow to parse.
- Binary formats (Protocol Buffers, MessagePack, CBOR) are compact and fast but opaque. You can’t read them in a text editor.
- The format must handle the data types you actually use: dates, nested objects, arrays, nulls, large numbers.
- Serialization must be paired with deserialization, and the two must agree on the format. Otherwise data is lost or corrupted.
- Versioning matters: the format must tolerate changes as the data model evolves over time.
Solution
Choose a serialization format based on your requirements, then use it consistently across the boundary.
JSON is the most common choice for web APIs and configuration files. It is human-readable, universally supported, and good enough for most purposes. Its main limitations are lack of a date type, no comments, and verbosity for large payloads.
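The missing date type is worth seeing concretely. A minimal sketch (the event data is illustrative) showing the common workaround of encoding dates as ISO 8601 strings:

```python
import json
from datetime import datetime, timezone

event = {"name": "deploy", "at": datetime(2024, 1, 15, tzinfo=timezone.utc)}

# json.dumps raises TypeError for datetime values: JSON has no date type.
try:
    json.dumps(event)
except TypeError:
    pass

# Workaround: serialize the date as an ISO 8601 string, parse it back on read.
wire = json.dumps({**event, "at": event["at"].isoformat()})
restored = json.loads(wire)
restored["at"] = datetime.fromisoformat(restored["at"])
assert restored == event
```

Both sides must agree on this convention, which is exactly the kind of rule a schema would make explicit.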
Protocol Buffers (protobuf) and similar binary formats are the choice when performance matters — microservice-to-microservice communication, high-throughput data pipelines, or bandwidth-constrained environments. They require a Schema (Serialization) defined upfront, which also serves as documentation and enables code generation.
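As a sketch of what such an upfront schema looks like (the message and field names here are illustrative, not from any particular system):

```protobuf
// preferences.proto -- illustrative proto3 schema
syntax = "proto3";

message UserPreferences {
  string name = 1;      // field numbers, not names, travel on the wire
  string theme = 2;
  int32 font_size = 3;
}
```

Running `protoc` over this file generates serialization code for each target language, so every producer and consumer works from the same definition.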
CBOR and MessagePack are binary formats that closely mirror JSON’s data model but are more compact and faster to parse. They are useful when you want JSON’s flexibility with better performance.
Whatever format you choose, use a well-tested library rather than writing serialization code by hand. Hand-written serializers are a rich source of bugs (off-by-one errors, missing escaping, incorrect handling of special characters) that established libraries have already solved.
How It Plays Out
A web application receives a form submission as JSON, deserializes it into an in-memory object, processes it, serializes the result as JSON, and sends it back to the browser. This serialize-deserialize cycle happens on every request. The developer never writes serialization code by hand — the web framework handles it using a JSON library.
An AI agent asked to “save user preferences to a file” might produce code that writes a custom text format: name=Alice;theme=dark;fontSize=14. This works initially but becomes fragile as the data grows more complex (what if a value contains a semicolon?). Instructing the agent to “serialize as JSON” produces code that handles edge cases correctly because the JSON library already deals with escaping, nesting, and special characters.
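The semicolon problem is easy to demonstrate. A minimal sketch (the preference values are illustrative) contrasting the ad hoc format with the JSON library:

```python
import json

# A value containing the delimiter breaks the ad hoc format...
prefs = {"name": "Alice", "signature": "ship it; then iterate", "fontSize": 14}
broken = ";".join(f"{k}={v}" for k, v in prefs.items())
# Naive split on ";" now yields a bogus fourth field.
assert len(broken.split(";")) != len(prefs)

# ...while the JSON library quotes and escapes correctly.
assert json.loads(json.dumps(prefs)) == prefs
```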
When working with AI agents, always specify the serialization format explicitly. “Serialize as JSON” or “use Protocol Buffers with this schema” prevents agents from inventing ad hoc formats that will break as the data evolves.
“Save user preferences to a JSON file. Don’t invent a custom format — use the standard JSON library so we get proper escaping and nested structure support for free.”
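A prompt like that might plausibly yield code along these lines (filename and preference fields are illustrative; a temporary directory stands in for a real config path):

```python
import json
import tempfile
from pathlib import Path

prefs = {"name": "Alice", "theme": "dark", "fontSize": 14}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "preferences.json"  # illustrative location
    # Standard library handles escaping and nesting; indent=2 keeps it readable.
    path.write_text(json.dumps(prefs, indent=2))
    loaded = json.loads(path.read_text())

assert loaded == prefs
```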
Consequences
Serialization makes data portable. It can travel across networks, persist to disk, and be consumed by programs written in any language. A well-chosen format and a standard library handle edge cases (escaping, encoding, nested structures) that would be painful to get right by hand.
The costs include the CPU time for serialization and deserialization (usually negligible for JSON, significant for very high-throughput systems), the need to choose and commit to a format early, and the complexity of versioning. When the data model changes (a field is added, renamed, or removed), the serialization format must accommodate the change without breaking existing consumers. This is where a Schema (Serialization) provides real value: it defines the rules for forward and backward compatibility.
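One common compatibility tactic can be sketched in a few lines (field names and defaults are illustrative): a newer reader supplies defaults for fields that older writers never emitted.

```python
import json

# A v1 producer wrote this payload; v2 of the data model added "theme".
v1_payload = json.dumps({"name": "Alice", "fontSize": 14})

def load_prefs(raw: str) -> dict:
    data = json.loads(raw)
    # Backward compatibility: default any field that older writers omit.
    data.setdefault("theme", "light")
    return data

prefs = load_prefs(v1_payload)
assert prefs["theme"] == "light"
```

Schema-driven formats like protobuf bake this rule in (unset fields get defaults); with plain JSON, the reader has to enforce it itself.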
Related Patterns
- Uses / Depends on: Data Structure — serialization converts data structures into a portable format.
- Enables: Schema (Serialization) — a serialization schema governs the format and compatibility rules.
- Contrasts with: Schema (Database) — database schemas define storage shape; serialization defines transmission shape.
- Enables: Idempotency — deterministic serialization can help with deduplication and caching.
- Uses / Depends on: Data Model — the data model determines what gets serialized.