Code Mode

Pattern

A reusable solution you can apply to your work.

Instead of showing an agent every tool’s schema and having it emit JSON calls one step at a time, give it a small API and let it write code that calls those tools inside a sandbox.

Also known as: Code-Mode MCP, Code Execution with MCP, Tools as Code.

Understand This First

  • MCP (Model Context Protocol) – the tool-exchange protocol that Code Mode restructures.
  • Tool – the callable capability being wrapped.
  • Sandbox – where the model’s generated code actually runs.
  • Context Window – the bounded working memory the pattern conserves.
  • Context Rot – the failure mode Code Mode mitigates at scale.

Context

A modern agent can connect to hundreds or thousands of tools through MCP servers. Each tool comes with a name, a description, and an input schema, and the agent’s harness loads these definitions into the context window so the model knows what is available. For small tool sets this is fine. For an enterprise surface with a few thousand endpoints, it doesn’t stay fine for long.

The classic MCP loop works like a phone call: the agent picks one tool, emits a JSON call, waits for the full response to come back through the model, reads it, picks the next tool. Every intermediate result passes through the context window. Every decision costs a round trip. When the model needs to join five API responses, filter the result, and keep only the three rows that matter, it must ferry all of that data through its own brain.
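
To make the round-trip cost concrete, here is a minimal sketch of that phone-call loop, with mock tools standing in for MCP servers. All names (`mockTools`, `runClassicLoop`) and payload sizes are hypothetical; real harnesses differ.

```typescript
type ToolCall = { tool: string; args?: unknown };

// Mock tool layer standing in for MCP servers; payload sizes are made up.
const mockTools: Record<string, (args?: unknown) => unknown> = {
  "orders.list": () =>
    Array.from({ length: 100 }, (_, i) => ({ id: i, total: i })),
  "customers.list": () =>
    Array.from({ length: 100 }, (_, i) => ({ id: i, name: `c${i}` })),
};

// One tool per turn: the model emits a JSON call, and the *entire* payload
// is serialized back into the transcript, i.e. into the context window.
function runClassicLoop(plan: ToolCall[]): {
  transcript: string[];
  contextChars: number;
} {
  const transcript: string[] = [];
  for (const call of plan) {
    transcript.push(JSON.stringify(call)); // the call itself
    transcript.push(JSON.stringify(mockTools[call.tool](call.args))); // the full response
  }
  return {
    transcript,
    contextChars: transcript.reduce((n, m) => n + m.length, 0),
  };
}
```

Two calls already put thousands of characters of raw payload into the transcript, before the model has joined or filtered anything.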

Code Mode sits at the boundary between the harness and the tool layer. It asks a different question: what if the agent wrote a short program instead of a sequence of JSON calls? That’s the whole idea.

Problem

How do you give an agent access to a large surface of tools without drowning it in schemas, without piping every intermediate result back through the model, and without losing the ability to compose multiple calls into a single coherent step?

The classic tool-use pattern breaks down at scale. Thousands of tool schemas eat a huge fraction of the context window before the agent has done any work. Raw API responses piped back through the model turn a 150,000-token payload into 150,000 tokens of context rot. And a single logical action — fetch orders, fetch customers, join them, filter by date, return the top three — costs five full round trips through the model, each with its own opportunity for the agent to wander off.

Forces

  • Context economics. Every tool schema and every intermediate response competes for space with the agent’s actual working memory. Schemas alone can cost over a million tokens on realistic enterprise surfaces.
  • Model skill asymmetry. Modern models are markedly better at writing code than at composing long chains of step-by-step JSON tool calls. Training corpora have more code than tool-call transcripts.
  • Composition and filtering. Most useful work is not a single tool call. It is fetch, join, filter, reduce. Forcing that through one-call-per-turn is expensive and brittle.
  • Safety and auditability. Running model-written code is a different risk profile than running discrete, pre-audited tool calls. The sandbox becomes load-bearing.
  • Discoverability. If the agent cannot see every tool’s schema up front, it needs another way to find out what is available when it needs it.

Solution

Expose tools to the agent as a small programming-language API (typically TypeScript), and give the model two operations: one to search for available tools, and one to execute a block of code against them inside an isolated sandbox. The model produces a short program. The harness runs it. Intermediate data stays in the sandbox. Only the distilled result returns to the context window.

Concretely, the harness provides two tools in the classic MCP sense:

  1. search(query): returns a compact list of relevant tool signatures, on demand. The model does not need every schema up front; it looks up what it needs when it needs it.
  2. execute(code): runs a TypeScript snippet inside a locked-down runtime. The snippet calls tool functions directly, chains their results, filters and joins in memory, and returns a value.

The model writes something like:

// Fetch once, then join and filter entirely inside the sandbox.
const orders = await tools.orders.list({ since: "2026-04-01" });
const customers = await tools.customers.batchGet(
  orders.map(o => o.customerId)
);
// Only the final three enriched rows leave the sandbox.
return orders
  .map(o => ({ ...o, customer: customers[o.customerId] }))
  .filter(o => o.total > 100)
  .slice(0, 3);

That snippet runs once. The 10,000-row orders list and the 10,000-row customer list never touch the context window. Only the three-row result does.
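
On the harness side, the search operation can be as simple as a keyword match over a registry of compact signatures. A minimal sketch, with hypothetical tool names and no claim to match any vendor's implementation:

```typescript
type ToolSig = { name: string; signature: string; doc: string };

// Hypothetical registry; a real harness would generate this from MCP schemas.
const registry: ToolSig[] = [
  { name: "orders.list", signature: "(opts: { since?: string }) => Order[]", doc: "List orders, newest first" },
  { name: "customers.batchGet", signature: "(ids: string[]) => Record<string, Customer>", doc: "Fetch customers by id" },
  { name: "invoices.overdue", signature: "() => Invoice[]", doc: "List overdue invoices" },
];

// Naive keyword scoring; a production harness might use BM25 or embeddings.
// Returns only the handful of signatures the model asked about.
function search(query: string, limit = 5): ToolSig[] {
  const terms = query.toLowerCase().split(/\s+/);
  return registry
    .map(sig => ({
      sig,
      score: terms.filter(t =>
        (sig.name + " " + sig.doc).toLowerCase().includes(t)
      ).length,
    }))
    .filter(r => r.score > 0)
    .sort((a, b) => b.score - a.score)
    .slice(0, limit)
    .map(r => r.sig);
}
```

The point is the shape, not the ranking algorithm: the model pays for a few short signatures per query instead of every schema up front.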

The sandbox is the load-bearing part of the design. Generated code is arbitrary code, and if it can escape its runtime it can reach anything the harness can reach. The usual ingredients (process isolation, no filesystem access, no ambient network, strict timeouts, capability-scoped APIs) are not optional here. They are the pattern.
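
Two of those ingredients, capability scoping and per-call timeouts, can be sketched as follows. This is illustrative only: it narrows what generated code can reach, but it is no substitute for real process isolation.

```typescript
type ToolFn = (args: unknown) => Promise<unknown>;

// Build the only tools object the generated code ever sees: an allowlisted
// subset of the full surface, with a hard timeout on every call.
// Names (scopeTools, ToolFn) are hypothetical.
function scopeTools(
  all: Record<string, ToolFn>,
  allowed: string[],
  timeoutMs: number,
): Record<string, ToolFn> {
  const scoped: Record<string, ToolFn> = {};
  for (const name of allowed) {
    const fn = all[name];
    if (!fn) continue;
    scoped[name] = (args) =>
      Promise.race([
        fn(args),
        // Reject if the underlying tool hangs past the budget.
        new Promise<never>((_, reject) =>
          setTimeout(() => reject(new Error(`${name} timed out`)), timeoutMs),
        ),
      ]);
  }
  return scoped; // tools outside the allowlist simply do not exist here
}
```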

Tip

When you adopt Code Mode, start by putting just one or two tools behind the sandbox and keeping the rest on the classic MCP path. Watch what the agent writes. The generated code is a useful signal about whether your API shapes are sensible or whether the model is fighting them.

How It Plays Out

A small team runs a customer-support agent against an internal platform with about 2,400 endpoints exposed through MCP. The classic loop works for simple tickets and falls over the moment the agent needs to cross-reference accounts, invoices, and usage logs. They move to Code Mode: the agent now calls search("invoices overdue"), gets back three relevant tool signatures, writes a fifteen-line TypeScript block that joins the three data sets, and returns a short summary. The daily token bill drops by roughly 80% on the multi-step tickets, and response latency falls because the model stops narrating every intermediate step.

Elsewhere, a different team tries the same move and discovers a subtler benefit. Their agent used to get lost in long tool chains; a mistake in step two would quietly poison steps three through seven. With Code Mode, the agent writes the whole plan at once, in code, and the sandbox either returns a clean value or throws an error the agent can actually read. Debugging becomes “read this stack trace” instead of “reconstruct what the agent was thinking six turns ago.” That’s a real change in how the team spends its time.

Warning

The sandbox is the whole security story. An agent that can write code has every capability the runtime grants it: network access, environment variables, filesystem handles. Don’t let Code Mode graduate from a prototype to a production surface until you’ve decided, explicitly and in writing, what the sandbox can and can’t touch.

Consequences

Benefits.

  • Token usage drops sharply on complex tasks, often by more than half, and sometimes by 80% or more when the work is genuinely multi-step.
  • The agent composes rather than narrates. A join, a filter, and a reduction become one step instead of five.
  • Intermediate data stays out of the context window, which protects against context rot on long-running tasks.
  • The generated code is inspectable. A human reviewer can read a fifteen-line program much faster than a seven-turn JSON call trace.

Liabilities.

  • The sandbox carries the whole security story. If generated code escapes its runtime, the agent has free run of whatever the runtime can reach.
  • Per-tool approval policies become harder. When five tools are called inside one execute(), the traditional approval policy that gates each call individually doesn’t cleanly apply.
  • Failure modes shift. Instead of a bad tool call, you now face runtime errors, timeouts, non-terminating loops, and the occasional syntax mistake.
  • Observability changes shape. Intermediate tool calls inside execute() still need logging, but they happen in a different process; your tracing story needs to cover both the model turn and the sandbox run.
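
The observability liability can be narrowed by wrapping the sandbox-side tools object so that every call made inside execute() is recorded and can be attached to the same trace as the model turn. A minimal synchronous sketch; names (CallRecord, withCallLog) are hypothetical, and async tools would need logging on settlement rather than on invocation:

```typescript
type CallRecord = { tool: string; args: unknown; ms: number };

// Wrap each tool so invocations are appended to a shared log that the
// harness can emit alongside the model-turn trace.
function withCallLog<T extends Record<string, (args: any) => any>>(
  tools: T,
  log: CallRecord[],
): T {
  const wrapped = {} as Record<string, (args: any) => any>;
  for (const [name, fn] of Object.entries(tools)) {
    wrapped[name] = (args: any) => {
      const t0 = Date.now();
      const result = fn(args); // synchronous for brevity
      log.push({ tool: name, args, ms: Date.now() - t0 });
      return result;
    };
  }
  return wrapped as T;
}
```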

Sources

Cloudflare introduced the name in “Code Mode: the better way to use MCP” (September 2025), which argued the architectural case and reported the search-and-execute design. Five months later, “Code Mode: give agents an entire API in 1,000 tokens” (February 2026) refined the architecture against their own 2,500-endpoint MCP surface, reporting a 99.9% token reduction (1.17 million tokens for the raw schemas down to roughly 1,000 tokens for the equivalent code-mode API). A separate Cloudflare demo by Rita Kozlov in December 2025 showed roughly 32% token savings on a single Google Calendar event and 81% on a 31-event batch; those are useful smaller-scale numbers, but distinct from the 2,500-endpoint headline.

Anthropic’s engineering note “Code execution with MCP: building more efficient AI agents” (November 2025) makes the same structural argument from a model-provider vantage point, framing code execution as the natural next step for agents wiring together large tool sets. The chronology runs Cloudflare September 2025, Anthropic November 2025, then Cloudflare February 2026.

By March 2026 the pattern had moved past “experimental architecture.” Cloudflare shipped Code Mode integration into MCP server portals on March 26, 2026, enabled by default. The portal collapses every upstream MCP server’s tool surface into a single code tool that runs in an isolated Dynamic Worker, keeping credentials and environment variables out of the model context. That release marks Code Mode’s transition from a demonstrated architecture to a default enterprise deployment shape.

The broader vocabulary (search-and-execute, sandbox-bounded tool composition, TypeScript as the agent’s working surface) has been picked up across the agentic tooling community through 2026, including the universal-tool-calling-protocol project, which ships a library that adapts MCP and UTCP tools into code-mode form for harnesses outside Cloudflare’s stack.

Further Reading

  • Cloudflare, “Code Mode: the better way to use MCP” (https://blog.cloudflare.com/code-mode/) – September 2025, the original framing with architectural diagrams.
  • Cloudflare, “Code Mode: give agents an entire API in 1,000 tokens” (https://blog.cloudflare.com/code-mode-mcp/) – February 2026, the 2,500-endpoint case with the 99.9% reduction figure.
  • Cloudflare changelog, “MCP server portals now support Code Mode” (https://developers.cloudflare.com/changelog/post/2026-03-26-mcp-portal-code-mode/) – March 2026, the productionization step that wires Code Mode into MCP server portals by default.
  • Anthropic, “Code execution with MCP: building more efficient AI agents” (https://www.anthropic.com/engineering/code-execution-with-mcp) – November 2025, the model-provider perspective on why code execution scales where JSON tool calls do not.
  • universal-tool-calling-protocol/code-mode on GitHub (https://github.com/universal-tool-calling-protocol/code-mode) – a portable implementation that works outside Cloudflare’s runtime.