Source of Truth

Pattern

A reusable solution you can apply to your work.

Also known as: Single Source of Truth (SSOT), Authoritative Source

Understand This First

State – a source of truth is the authoritative location for specific state.
Database – the source of truth typically lives in a database.

Any system of meaningful size stores the same information in multiple places. A user’s email address might appear in the authentication database, the email service’s subscriber list, and the analytics platform. This is often unavoidable. But when those copies disagree (and they will), you need to know which one is right. The source of truth is the authoritative location where a given fact is defined and maintained. This is an architectural pattern because it determines how the system resolves contradictions.

Problem

When the same piece of information exists in multiple places and those places disagree, which one do you trust?

Without a designated source of truth, disagreements become permanent. One service says the user’s name is “Jane Smith.” Another says “Jane S. Smith.” A third says “J. Smith.” Nobody knows which is correct because nobody decided where the authoritative version lives. Updates get applied to whichever copy is convenient, and the system slowly drifts into incoherence.

Forces

Performance and availability push you to copy data closer to where it is needed (caching, replication, denormalization).
Every copy is a potential source of stale or conflicting information.
Different teams or services may each assume they own a piece of data.
Users expect the system to behave as if there is one coherent truth, even when the internals are distributed.

Solution

For every important piece of information, explicitly designate one system, one table, or one service as the source of truth. All other locations that hold that information are derived — they are caches, replicas, or projections that are populated from the source and refreshed on some schedule or trigger.

The rules are simple. Writes go to the source. If you need to change a user’s email, you change it in the source of truth. Reads prefer the source unless performance requires a cache, in which case the cache is understood to be potentially stale. Conflicts resolve in favor of the source. If the cache says one thing and the source says another, the source wins.
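The three rules can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed implementation: `UserStore`, `EmailCache`, and `reconcile` are hypothetical names, and in-memory dictionaries stand in for a real database and cache.

```python
from dataclasses import dataclass, field


@dataclass
class UserStore:
    """Hypothetical source of truth: the only place emails are written."""
    _emails: dict[str, str] = field(default_factory=dict)

    def set_email(self, user_id: str, email: str) -> None:
        # Rule 1: writes go to the source.
        self._emails[user_id] = email

    def get_email(self, user_id: str) -> str:
        return self._emails[user_id]


@dataclass
class EmailCache:
    """Hypothetical derived copy: filled from the source, possibly stale."""
    source: UserStore
    _copies: dict[str, str] = field(default_factory=dict)

    def get_email(self, user_id: str) -> str:
        # Rule 2: reads may use the cache, accepting that it can lag behind.
        if user_id not in self._copies:
            self._copies[user_id] = self.source.get_email(user_id)
        return self._copies[user_id]

    def reconcile(self, user_id: str) -> bool:
        # Rule 3: on disagreement the source wins and the copy is overwritten.
        # Returns True if the cached value was missing or differed.
        truth = self.source.get_email(user_id)
        was_stale = self._copies.get(user_id) != truth
        self._copies[user_id] = truth
        return was_stale
```

Note that the cache exposes no setter of its own. The only way to change an email is through `UserStore.set_email`, so there is no second write path to drift.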

Document your sources of truth. A simple table (“user profile: users table in the auth database; product catalog: the products service; pricing: the pricing table in the billing database”) prevents months of confusion.
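The same table can also live in code, where tooling can consult it. The sketch below is one possible shape, assuming a plain dictionary is enough; `SOURCES_OF_TRUTH` and `source_for` are hypothetical names, and the entries simply repeat the example above.

```python
# Hypothetical registry: each kind of data maps to its one authoritative home.
SOURCES_OF_TRUTH = {
    "user profile": "auth database, users table",
    "product catalog": "products service",
    "pricing": "billing database, pricing table",
}


def source_for(data_type: str) -> str:
    """Return the designated source, or fail loudly if none was ever chosen."""
    try:
        return SOURCES_OF_TRUTH[data_type]
    except KeyError:
        raise LookupError(
            f"no source of truth designated for {data_type!r}"
        ) from None
```

Failing loudly on an unknown data type is a deliberate choice here: a missing entry means nobody has decided where that fact lives, which is the situation the pattern exists to prevent.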

How It Plays Out

A company runs a marketing email platform and a customer support tool, both of which store customer email addresses. A customer updates their email through the support tool, but the marketing platform still has the old address. Emails bounce. The fix is to designate the authentication database as the source of truth for email addresses, route every email change to it, and have both the marketing platform and the support tool sync from it.
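A one-way sync of this kind might look like the following sketch. It assumes each system's email data can be viewed as a mapping from customer ID to address; `sync_from_source` is a hypothetical helper, and a real integration would go through each tool's own API.

```python
def sync_from_source(source: dict[str, str], derived: dict[str, str]) -> list[str]:
    """Make `derived` match `source`; return the customer IDs that changed."""
    # Entries that are missing from the copy or disagree with the source.
    changed = [cid for cid, email in source.items() if derived.get(cid) != email]
    for cid in changed:
        derived[cid] = source[cid]
    # Drop entries the source no longer has, so a copy cannot outlive the fact.
    for cid in list(derived.keys() - source.keys()):
        del derived[cid]
        changed.append(cid)
    return changed
```

Data flows in one direction only. The function never writes to `source`, so running it against the marketing platform and the support tool in turn cannot introduce a disagreement, only remove one.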

In an agentic workflow, the source of truth problem shows up constantly. An AI agent generating code might create a configuration value in both a config file and a constants module. Later, someone changes the config file but not the constants module. The system breaks in a way that is baffling until you realize there were two “sources” and they disagreed. Instructing the agent to “define this value in exactly one place and reference it everywhere else” is applying the source of truth pattern.
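In code, "one place" can be as small as the sketch below. The setting name `max_retries`, the inline JSON string standing in for a config file, and the helper functions are all assumptions made for illustration.

```python
import json

# Stand-in for the contents of a hypothetical config file.
CONFIG_TEXT = '{"max_retries": 3}'

# The value is loaded once, from the one place it is defined.
CONFIG = json.loads(CONFIG_TEXT)


def max_retries() -> int:
    # Referenced, not redefined: no second MAX_RETRIES literal exists elsewhere.
    return CONFIG["max_retries"]


def should_retry(attempt: int) -> bool:
    # Callers depend on the accessor, so a config change reaches them all.
    return attempt < max_retries()
```

A constants module that repeated the literal `3` would work until the day the config changed. Routing every reader through `max_retries()` removes that second copy.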

Tip

When directing an AI agent to build a system with multiple data stores (a database, a cache, a search index), explicitly state which store is the source of truth for each type of data. This prevents the agent from creating update paths that bypass the authoritative source.

Example Prompt

“The customer email address must be defined in exactly one place: the auth database. The marketing service and the support tool should both read from there. Don’t create a second copy of the email in either system.”

Consequences

A designated source of truth makes conflicts resolvable and debugging tractable. When data looks wrong, you know exactly where to check. It also simplifies synchronization: every derived copy has a clear upstream to refresh from.

The cost is that funneling all writes through one system can create a bottleneck or a single point of failure. It also means accepting that derived copies may be temporarily out of date, which requires the rest of the system to tolerate staleness gracefully. The discipline of always writing to the source is easy to state but hard to maintain across a growing team, especially when a shortcut “just this once” creates a second write path.

Sources

Andy Hunt and Dave Thomas’s The Pragmatic Programmer (Addison-Wesley, 1999; 20th Anniversary 2nd ed. 2019) framed the underlying principle as DRY — “every piece of knowledge must have a single, unambiguous, authoritative representation within a system” — and the authors later clarified that DRY is about duplication of knowledge, not lines of code. Source of truth is the architectural application of that principle to data.
Bill Inmon’s Building the Data Warehouse (Wiley, 1992) established the data warehouse as the integrated, non-volatile repository that consolidates operational data into a “single version of the truth” — the lineage from which the modern phrase “single source of truth” descends. The phrase itself emerged communally from the data warehousing and master-data-management communities through the 1990s; no single coiner is on record.
E. F. Codd’s “A Relational Model of Data for Large Shared Data Banks” (Communications of the ACM, 1970) introduced the relational model and the idea of normal form, the starting point of the normalization theory that gives the source-of-truth pattern its formal grounding: redundancy is the enemy of consistency, and concentrating each fact in one place is the fix.
