Data Model

Concept

A foundational idea to recognize and understand.

“All models are wrong, but some are useful.” — George Box

Understand This First

  • Requirement – the data model reflects what the system is required to know.

Context

Before you can store, transmit, or display information, you need to decide what information matters. A data model is the conceptual blueprint: which things exist, what properties they have, and how they relate to each other. It sits at the architectural level, above any particular database or programming language but below product-level decisions about what the system does.

If you’re building a bookstore application, the data model says there are books, authors, and orders. It says a book has a title and a price. It says an author can write many books. It doesn’t say whether you store this in PostgreSQL or a JSON file; that comes later. The data model captures meaning. Everything else captures mechanism.

Problem

How do you agree on what a system “knows about” before getting tangled in storage formats, code structures, and API designs?

Without a shared data model, different parts of the system evolve different ideas about what a “user” or an “order” contains. Fields get added in one place and forgotten in another. Conversations between developers (or between a human and an AI agent) become confusing because the same word means different things in different contexts.

Forces

  • You want the model to be complete enough to support current features, but simple enough to understand at a glance.
  • Real-world entities are messy; software models need clean boundaries.
  • The model must be stable enough to build on, yet flexible enough to evolve as requirements change.
  • Different stakeholders (designers, developers, business people) need to share the same vocabulary.

Solution

Define your data model explicitly and early. Identify the core entities (the nouns your system cares about), their attributes (the properties of each entity), and the relationships between them (how entities connect). Write it down, whether as a diagram, a list, or even a conversation, before you start coding.
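As one illustration of writing the model down, the bookstore entities from the Context section can be kept as plain data instead of tables or classes. This is a minimal sketch under stated assumptions. The names `ENTITIES`, `RELATIONSHIPS`, and `check_model` are hypothetical. The Author and Order attributes are left empty because the text does not specify them, and the "Order includes Book" relationship is an assumption. The only check the sketch performs is that every relationship refers to a defined entity.

```python
# Hypothetical, storage-agnostic record of the bookstore data model.
# Nothing here says how or where the data is stored.

# Entities and their attributes.
ENTITIES = {
    "Book": ["title", "price"],
    "Author": [],  # attributes not specified in the text; still to be decided
    "Order": [],   # attributes not specified in the text; still to be decided
}

# Relationships as (source entity, verb, target entity) triples.
RELATIONSHIPS = [
    ("Author", "writes", "Book"),   # an Author can write many Books
    ("Order", "includes", "Book"),  # assumption: not stated in the text
]


def check_model(entities, relationships):
    """Return a list of problems: relationships that mention undefined entities."""
    problems = []
    for source, verb, target in relationships:
        for name in (source, target):
            if name not in entities:
                problems.append(f"{source} {verb} {target}: unknown entity {name!r}")
    return problems


print(check_model(ENTITIES, RELATIONSHIPS))  # []
```

A misspelled entity in a relationship, such as `("Autor", "writes", "Book")`, would be reported as `Autor writes Book: unknown entity 'Autor'`. The model remains a reviewable list of nouns and connections, which is the level of detail this section argues for.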

A good data model acts as a shared language. When a product manager says “customer” and a developer says “user,” the data model settles the question: is it one concept or two? What fields does it carry? This clarity pays off especially when directing an AI agent, because the agent can only generate correct code if it shares your understanding of the domain.

Keep the model at the right level of abstraction. You’re not designing database tables yet (that’s a Schema). You’re not choosing data types in code (that’s a Data Structure). You’re answering the question: what does this system know about the world?

How It Plays Out

A team building a recipe-sharing app sits down and lists the entities: Recipe, Ingredient, User, Rating. They sketch the relationships: a User creates Recipes, a Recipe has Ingredients, a User can leave a Rating on a Recipe. This ten-minute exercise can prevent weeks of confusion later.

When directing an AI agent to build a feature, starting with the data model keeps the agent on track. Instead of saying “build me a recipe app,” you say: “Here is the data model: Recipe has a title, description, list of Ingredients, and an author (User). Generate the database schema and API endpoints for this model.” The agent now has concrete nouns and relationships to work from, so the code it produces is far more likely to be internally consistent.

Tip

When you ask an AI agent to help design a system, ask it to produce the data model first. Review that before letting it generate any code. Catching a wrong entity or missing relationship at the model level is far cheaper than fixing it in code.

Example Prompt

“Before writing any code, design the data model for this recipe app. List the entities (Recipe, Ingredient, User, Rating), their fields, and the relationships between them. I’ll review the model before you generate the schema.”

Consequences

A clear data model gives every participant, human or AI, a shared vocabulary. It reduces miscommunication and makes code reviews faster because there’s a reference point for “what should exist.” It also makes it easier to evaluate whether a proposed change is small (adding an attribute) or large (introducing a new entity).
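If the model is written down as data, the small-versus-large distinction can be checked mechanically. The sketch below assumes, hypothetically, that each model version is a dict mapping entity names to attribute lists; `classify_changes` is an illustrative helper, not a standard tool. It classifies only the two cases named above (new attributes and new entities) and ignores relationships, removals, and renames.

```python
# Hypothetical sketch: compare two versions of a data model and label
# additions as "small" (new attribute) or "large" (new entity).


def classify_changes(old, new):
    """Label additions between two model versions as 'small' or 'large'.

    Each model is a dict mapping entity name -> list of attribute names.
    Removals, renames, and relationship changes are out of scope here.
    """
    changes = []
    # Entities present only in the new version are large changes.
    for entity in new.keys() - old.keys():
        changes.append(("large", f"new entity {entity}"))
    # Attributes added to an existing entity are small changes.
    for entity in new.keys() & old.keys():
        for attribute in set(new[entity]) - set(old[entity]):
            changes.append(("small", f"{entity} gains attribute {attribute}"))
    return sorted(changes)


# "prep_time" is a hypothetical attribute used only for illustration.
v1 = {"Recipe": ["title", "description"], "User": []}
v2 = {"Recipe": ["title", "description", "prep_time"], "User": [], "Rating": []}
print(classify_changes(v1, v2))
# [('large', 'new entity Rating'), ('small', 'Recipe gains attribute prep_time')]
```

Sorting the result keeps the output deterministic, so the same pair of model versions always produces the same report for review.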

The cost is that data models take effort to maintain. As the product evolves, the model must evolve too, and an outdated model is worse than no model because it actively misleads. Models can also force premature decisions if applied too rigidly; sometimes you need to build a prototype before you know what the right entities are.

Related Patterns

  • Enables: Schema (Database) — a database schema is the data model made concrete in storage.
  • Enables: Schema (Serialization) — a serialization schema is the data model made concrete on the wire.
  • Enables: Data Structure — in-memory structures implement pieces of the data model in code.
  • Uses / Depends on: Requirement — the data model reflects what the system is required to know.
  • Refined by: Data Normalization / Denormalization — normalization refines how the model is physically organized.
  • Refined by: Domain Model — a domain model captures the broader business concepts and rules that a data model implements in storage.