Test Oracle

Pattern

A reusable solution you can apply to your work.

Context

You have a Test that runs your code and produces an output. Now you need to decide: is that output correct? The thing that answers this question is called an oracle. This is a tactical pattern that sits at the heart of every testing strategy.

Without an oracle, a test is just a program that runs code. It can tell you the code didn’t crash, but it can’t tell you the code did the right thing.

Problem

Knowing whether software produced the right answer is often harder than producing the answer in the first place. For simple functions (add two numbers, sort a list) the expected output is obvious. But for complex systems (a recommendation engine, a layout algorithm, a natural language response) defining “correct” is genuinely difficult. How do you establish a reliable source of truth for your tests?

Forces

Simple oracles (hardcoded expected values) are easy to write but only cover specific cases.
Complex systems produce outputs that are hard to verify precisely.
Some behaviors have multiple valid outputs, making exact comparison impossible.
The oracle itself can be wrong, creating false confidence.
Maintaining oracles adds cost as the system evolves.

Solution

Choose a source of truth appropriate to what you’re testing. The most common oracles, from simplest to most sophisticated:

Expected values. You hardcode the correct output for specific inputs. This is the bread and butter of unit testing: assert add(2, 3) == 5. Simple, clear, and fragile if the expected behavior changes.

Reference implementations. You compare your code’s output against a trusted alternative: a known-good library, a previous version, or a deliberately simple (but slow) implementation. This works well for algorithmic code where correctness is well-defined.

Property checks. Instead of checking for an exact value, you check that the output satisfies certain properties. “The sorted list has the same elements as the input” and “each element is less than or equal to the next” together define correctness for sorting without hardcoding any specific output.

Human judgment. For subjective or complex outputs (UI rendering, generated text, design choices) a human reviews the result and decides whether it’s acceptable. This doesn’t scale, but it’s sometimes the only honest oracle.

How It Plays Out

A team building a search engine can’t hardcode expected results for every query. Instead, they use property-based oracles: every returned result must contain the search term, results must be sorted by relevance score, and the top result must score above a threshold. These properties hold for any query, so the tests work even as the index changes.

In agentic coding, the oracle problem becomes acute. When an AI agent generates code, you need to verify the output. If you have a test suite with clear oracles (expected values, property checks, reference outputs) the agent can run the tests and self-correct. But if the only oracle is “a human reads the code and decides if it looks right,” the agent can’t iterate autonomously. Investing in machine-checkable oracles is what makes agentic workflows scalable.

Tip

When you can’t define an exact oracle, define properties. “The output is valid JSON,” “the response is under 200ms,” “the total matches the sum of the line items” — partial oracles still catch real bugs.

Example Prompt

“The search results can’t be hardcoded, so write property-based tests instead. Every returned result must contain the search term, results must be sorted by score descending, and the top result’s score must exceed 0.5.”

Consequences

A well-chosen oracle makes tests trustworthy. When a test fails, you know something is genuinely wrong, not just different. This trust is what makes a test suite valuable.

The risk is oracle rot: the oracle itself becomes outdated or wrong, and tests pass even when the code is broken. This is especially dangerous with hardcoded expected values that someone copy-pasted without verifying. Review your oracles as carefully as you review your code.

Enables: Test — every test needs an oracle.
Uses: Invariant — invariants are a form of property-based oracle.
Contrasts with: Observability — observability helps you understand behavior in production, while oracles verify behavior in testing.
Refined by: Fixture — fixtures provide the controlled inputs that make oracles deterministic.
Enables: Test-Driven Development — TDD depends on oracles to verify each step.

Keyboard shortcuts

Encyclopedia of Agentic Coding Patterns