Verification Loop

Pattern

A reusable solution you can apply to your work.

Understand This First

  • Agent – the verification loop is the agent’s primary quality assurance mechanism.
  • Tool – the agent needs tools to run tests and read results.

Context

At the agentic level, the verification loop is the cycle of change, test, inspect, and iterate that makes agentic coding reliable. It’s the mechanism by which an agent confirms that its changes actually work, not through confidence, but through evidence.

The verification loop is what separates agentic coding from “generate and hope.” A model generates plausible code, but plausible isn’t correct. The loop closes the gap by running tests, checking output, and feeding results back to the agent for correction.

Problem

How do you ensure that agent-generated changes actually work, when the agent’s default output is optimized for plausibility rather than correctness?

An agent that writes code without verifying it is like a developer who never runs their tests. The code might be right. It often is. But when it isn’t, the errors compound: the next change builds on a broken foundation, and the agent doesn’t notice because it isn’t checking.

Forces

  • Agent confidence doesn’t correlate with correctness. The model sounds equally sure about right and wrong code.
  • Fast iteration is one of the agent’s strengths, making verify-and-retry cheap.
  • Test infrastructure must exist for verification to work. The loop is only as good as the checks it runs.
  • Verification scope must be calibrated. Running the full test suite after every small change is wasteful; running nothing is reckless.

Solution

Build verification into the agent’s workflow as a mandatory step, not an optional one. The basic loop is:

  1. Change. The agent modifies code based on the task or the previous iteration’s feedback.
  2. Test. The agent runs relevant tests, linters, type checks, or other automated checks.
  3. Inspect. The agent reads the results. If everything passes, the task may be complete. If something fails, the agent analyzes the failure.
  4. Iterate. The agent uses the failure information to make a corrective change and returns to step 2.

Steps 2-4 are what the agent does naturally when given access to test tools and trained to use them. Most capable agents, when told “fix this and make sure the tests pass,” will automatically run tests, read failures, and iterate. Your job is to ensure the infrastructure exists and the agent knows how to invoke it.
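The four steps above can be sketched as a small driver loop. This is a minimal illustration, not a real harness: `apply_change` is a hypothetical stand-in for the agent's edit step, and the test command is whatever check your project actually runs.

```python
import subprocess

def run_checks(cmd):
    """Run a verification command; return (passed, combined output)."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.returncode == 0, result.stdout + result.stderr

def verification_loop(apply_change, test_cmd, max_iterations=5):
    """Drive the change -> test -> inspect -> iterate cycle.

    `apply_change(feedback)` stands in for the agent's edit step: it
    receives the previous failure output (None on the first pass) and
    modifies the code under test.
    """
    feedback = None
    for attempt in range(1, max_iterations + 1):
        apply_change(feedback)                 # 1. Change
        passed, output = run_checks(test_cmd)  # 2. Test
        if passed:                             # 3. Inspect: all green
            return True, attempt
        feedback = output                      # 4. Iterate on the failure
    return False, max_iterations
```

The iteration cap matters: an agent that cannot converge should surface the failure to a human rather than loop forever.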

Verification works at multiple granularities. Unit tests catch functional errors quickly. Type checkers catch structural errors. Linters catch style violations and common mistakes. Integration tests catch issues at boundaries. A good verification loop uses the fastest checks first and escalates to slower, broader checks as the change stabilizes.
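One way to encode "fastest checks first" is an ordered pipeline that stops at the first failure, so the agent gets the cheapest useful signal. The tool commands below are illustrative assumptions, not prescriptions; substitute whatever your project runs.

```python
import subprocess

# Cheap checks run first so failures surface quickly; expensive suites
# run only once the fast ones pass. Commands are examples -- swap in
# your project's own tools.
CHECKS = [
    ("type check",  ["mypy", "src/"]),                    # structural errors
    ("lint",        ["ruff", "check", "src/"]),           # style, common mistakes
    ("unit tests",  ["pytest", "tests/unit"]),            # functional errors
    ("integration", ["pytest", "tests/integration"]),     # boundary issues
]

def verify(checks=CHECKS):
    """Run checks in escalating order; stop at the first failure.

    Returns (failed_check_name, output) -- or (None, "") if all pass --
    so the caller can feed the failure details back to the agent.
    """
    for name, cmd in checks:
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode != 0:
            return name, result.stdout + result.stderr
    return None, ""
```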

Warning

Don’t trust agent-generated tests as your only verification. An agent can write code and tests that agree with each other while both being wrong. Use existing tests, human-written tests, and manual inspection as anchors. See Smell (AI Smell) for more on this failure mode.

How It Plays Out

An agent is asked to add input validation to an API endpoint. It writes the validation logic, runs the existing test suite, and discovers that two tests fail because they were sending invalid input that the old code silently accepted. The agent examines the tests, determines they should be updated to send valid input, makes the corrections, reruns the suite, and all tests pass. Without the verification loop, the validation would have shipped alongside broken tests.
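A minimal sketch of that scenario, with hypothetical names: the new validation rejects input the old code silently accepted, so the existing test is corrected to send valid input rather than the validation being weakened.

```python
def register(payload: dict) -> dict:
    """Hypothetical endpoint handler with the new input validation."""
    email = payload.get("email", "")
    if "@" not in email:
        raise ValueError("invalid email")
    return {"status": "created", "email": email}

# The old test sent {"email": ""} and asserted success, which the old
# code silently accepted. After validation landed, the test was updated
# to send valid input instead:
def test_register_accepts_valid_input():
    assert register({"email": "ada@example.com"})["status"] == "created"
```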

A developer configures their agent’s harness to automatically run type checks after every file save. The agent writes a function that returns string | null but the caller expects string. The type checker catches the mismatch immediately, and the agent adds a null check before moving on. The bug never reaches a test; it was caught at the fastest verification level.
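The same class of mismatch in Python terms (names here are invented for illustration): a checker such as mypy flags `str | None` flowing into a context that expects `str`, and the fix is the null check the agent adds.

```python
from __future__ import annotations

def find_username(user_id: int) -> str | None:
    """May return None when the user does not exist."""
    users = {1: "ada"}
    return users.get(user_id)

def greeting(user_id: int) -> str:
    name = find_username(user_id)
    # Without this check, a type checker reports that `str | None`
    # is not acceptable where `str` is required -- the bug is caught
    # before any test runs.
    if name is None:
        return "Hello, guest"
    return f"Hello, {name}"
```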

Example Prompt

“Add input validation to the /register endpoint. After writing the code, run the full test suite. If any test fails, read the failure output and fix the issue. Repeat until all tests pass.”

Consequences

The verification loop makes agentic coding reliable. It catches errors while the agent still has the context to fix them, reducing the chance that broken code reaches code review or production. It also builds a healthy habit: treat agent output as a hypothesis to be tested, not a fact to be trusted.

The cost is infrastructure. You need tests, linters, type checkers, and a way for the agent to invoke them. Projects with weak test coverage get less benefit from the verification loop because there are fewer checks to run. This creates a virtuous cycle: the more you invest in test infrastructure, the more productive your agents become.

  • Depends on: Agent — the verification loop is the agent’s primary quality assurance mechanism.
  • Depends on: Tool — the agent needs tools to run tests and read results.
  • Uses: Plan Mode — planning produces expectations that verification can check against.
  • Enables: Eval — evals are verification loops applied to the agent’s overall performance.
  • Refined by: Human in the Loop — some verification steps require human judgment.
  • Uses: Smell (AI Smell) — AI smell detection is a form of verification that automated tools can’t yet perform.