Input Validation

Pattern

A reusable solution you can apply to your work.

Context

This is a tactical pattern. Every point on your attack surface where data enters the system is a potential entry point for an attack. Input validation is the practice of checking whether that data is acceptable before doing anything with it. It’s one of the most basic defenses in software security, and one of the most effective.

In agentic workflows, input validation applies to every piece of data an AI agent processes: user messages, file contents, API responses, and web page text. An agent that acts on unvalidated input is open to prompt injection and other manipulation.

Problem

Systems receive data from many sources: users, APIs, files, databases, other services, AI agents. Not all of this data is well-formed, and some of it is deliberately malicious. SQL injection, cross-site scripting, buffer overflows, command injection, and path traversal attacks all exploit the same root cause: the system accepted and acted on input it should have rejected. How do you prevent bad data from causing harm?

Forces

Strict validation prevents attacks but may reject legitimate edge-case input.
Permissive validation is user-friendly but creates exploitable gaps.
Validation rules differ by context. A string that’s safe in HTML may be dangerous in SQL.
Validating everything is tedious, and developers skip it under time pressure.
Input arrives in many forms: strings, numbers, JSON, XML, binary, files. Each requires different checks.

Solution

Validate all input at every trust boundary before acting on it. Follow these principles:

Validate on the server side. Client-side validation is for user experience; server-side validation is for security. Never trust the client to enforce constraints.

Use allowlists over denylists. Define what is acceptable (a string of 1-100 alphanumeric characters) rather than trying to enumerate everything that’s dangerous (no angle brackets, no semicolons, no quotes…). Allowlists are smaller, simpler, and harder to bypass.
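As a minimal sketch of the allowlist idea, the rule "1-100 alphanumeric characters" can be written as a single pattern. The names ALNUM_1_TO_100 and is_acceptable are hypothetical, and restricting the rule to ASCII letters and digits is an assumption made for this illustration.

```python
import re

# Hypothetical allowlist: 1-100 ASCII letters or digits, nothing else.
ALNUM_1_TO_100 = re.compile(r"[A-Za-z0-9]{1,100}")


def is_acceptable(value: str) -> bool:
    """Return True only if the entire string matches the allowlist."""
    # fullmatch anchors both ends, so a trailing newline or extra
    # characters cannot slip past the check.
    return ALNUM_1_TO_100.fullmatch(value) is not None
```

The pattern never mentions angle brackets, quotes, or semicolons. They are rejected because they are not on the list, which is what makes an allowlist harder to bypass than a denylist.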

Validate for the context. A username has different valid characters than a search query, which has different valid characters than a file path. Validate each input according to how it will be used.
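The sketch below illustrates how one string can be acceptable in one context and dangerous in another. The functions, the 200-character limit, and the UPLOAD_ROOT directory are all hypothetical choices for this example, not rules from any particular framework.

```python
from pathlib import Path

# Hypothetical upload directory; substitute your application's own.
UPLOAD_ROOT = Path("/srv/app/uploads")


def is_valid_search_query(value: str) -> bool:
    """Search context: any printable text of 1-200 characters is fine."""
    return 1 <= len(value) <= 200 and value.isprintable()


def resolve_upload_path(value: str, root: Path = UPLOAD_ROOT) -> Path:
    """File-path context: the resolved path must stay inside root."""
    base = root.resolve()
    candidate = (base / value).resolve()
    # Path.is_relative_to requires Python 3.9 or later.
    if not candidate.is_relative_to(base):
        raise ValueError("path escapes the upload directory")
    return candidate
```

The string ../../etc/passwd passes the search-query check, since it is only text to search for, but resolve_upload_path rejects it because the resolved path falls outside the upload directory.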

Validate type, length, range, and format. Is it the expected data type? Is it within acceptable length bounds? Does it fall within a valid range? Does it match the expected format (e.g., email, date, UUID)?
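The four checks can be sketched with the Python standard library alone. The field names (quantity, order ID, ship date) and their limits are assumptions invented for this example.

```python
import re
import uuid
from datetime import date

ISO_DATE = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")


def parse_quantity(raw: str) -> int:
    """Length, type, and range: a whole number from 1 to 1000."""
    if not 1 <= len(raw) <= 4:
        raise ValueError("quantity must be 1 to 4 characters")
    if not (raw.isascii() and raw.isdigit()):
        raise ValueError("quantity must contain only digits")
    quantity = int(raw)
    if not 1 <= quantity <= 1000:
        raise ValueError("quantity must be between 1 and 1000")
    return quantity


def parse_order_id(raw: str) -> uuid.UUID:
    """Format: a UUID in canonical hyphenated form."""
    parsed = uuid.UUID(raw)  # raises ValueError on malformed input
    # uuid.UUID also accepts braces and unhyphenated hex, so compare
    # against the canonical text to accept one form only.
    if str(parsed) != raw.lower():
        raise ValueError("order id must be a canonical UUID")
    return parsed


def parse_ship_date(raw: str) -> date:
    """Format: YYYY-MM-DD naming a real calendar date."""
    if ISO_DATE.fullmatch(raw) is None:
        raise ValueError("date must look like YYYY-MM-DD")
    return date.fromisoformat(raw)  # raises ValueError for 2024-02-30
```

Each function returns a typed value rather than the raw string, so code further in works with an int, a UUID, or a date and never with unchecked text.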

Reject and log invalid input. Don’t try to “clean” malicious input and use it anyway. Reject it, return a clear error, and log the attempt for monitoring.
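A rough sketch of reject-and-log follows. The logger name, the InvalidInput exception, the username rule, and the source_ip parameter are all hypothetical; a real service would likely record more context and map the exception to an HTTP 400 response.

```python
import logging
import re

logger = logging.getLogger("input_validation")

# Hypothetical rule: 3-32 lowercase letters, digits, or underscores.
USERNAME = re.compile(r"[a-z0-9_]{3,32}")


class InvalidInput(ValueError):
    """Raised when input fails validation; its message is safe to show."""


def require_username(raw: str, source_ip: str) -> str:
    """Return raw unchanged if valid; otherwise log and reject."""
    if USERNAME.fullmatch(raw) is None:
        # %r escapes control characters so hostile input cannot forge
        # log lines; the slice bounds how much is written.
        logger.warning("rejected username from %s: %r", source_ip, raw[:64])
        raise InvalidInput("username must be 3-32 characters: a-z, 0-9, or _")
    return raw
```

The function never returns a repaired version of bad input. A caller either receives exactly what the user sent or receives an exception.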

Validate deeply. If you accept JSON, validate not just that it’s valid JSON but that the structure, field names, types, and values match your expectations. A well-formed JSON payload can still contain a SQL injection in a string field.
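One way to sketch deep validation without a schema library is shown here. The two-field profile shape in EXPECTED_FIELDS and the value limits are assumptions for illustration; a real project might use a schema validator instead.

```python
import json

# Hypothetical schema: exactly these fields, with these types.
EXPECTED_FIELDS = {"name": str, "age": int}


def parse_profile(body: str) -> dict:
    """Parse a JSON body, then check structure, names, types, and values."""
    data = json.loads(body)  # json.JSONDecodeError is a ValueError
    if not isinstance(data, dict):
        raise ValueError("top level must be a JSON object")
    if set(data) != set(EXPECTED_FIELDS):
        raise ValueError("missing or unexpected fields")
    for field, expected_type in EXPECTED_FIELDS.items():
        value = data[field]
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected_type):
            raise ValueError(f"{field} has the wrong type")
    name = data["name"]
    if not 1 <= len(name) <= 100:
        raise ValueError("name must be 1 to 100 characters")
    if not all(ch.isalpha() or ch == " " for ch in name):
        raise ValueError("name may contain only letters and spaces")
    if not 0 <= data["age"] <= 150:
        raise ValueError("age must be between 0 and 150")
    return data
```

A payload such as {"name": "'; DROP TABLE users; --", "age": 36} parses as valid JSON with the right field names and types, and is rejected only at the value check on name.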

How It Plays Out

A web application accepts a search query parameter. Without validation, an attacker submits '; DROP TABLE users; -- and the application concatenates it into a SQL statement, which deletes the users table. With validation, the input is rejected. With parameterized queries, which are the stronger defense, the input is treated as a literal string and is harmless.
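The scenario can be reproduced in a few lines with Python's built-in sqlite3 module. The users table and the search_users function are hypothetical stand-ins for the application's real schema and query code.

```python
import sqlite3


def search_users(conn: sqlite3.Connection, term: str) -> list:
    """Search by name, passing the user's text as a bound parameter."""
    # The ? placeholder sends term to the database as data, so it is
    # never parsed as SQL, whatever characters it contains.
    cursor = conn.execute(
        "SELECT name FROM users WHERE name LIKE ?",
        (f"%{term}%",),
    )
    return [row[0] for row in cursor.fetchall()]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])

hostile = "'; DROP TABLE users; --"
print(search_users(conn, hostile))  # prints [] and the table survives
print(search_users(conn, "ali"))    # prints ['alice']
```

Parameter binding protects the query itself. Validation still has a role, for example in limiting the length of the search term or restricting which characters it may contain.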

An AI agent is asked to process a CSV file uploaded by a user. The CSV contains a cell with the value =SYSTEM("rm -rf /"). If the agent passes this to a spreadsheet tool without validation, the formula could execute. Input validation here means checking that cell values match expected data types (numbers, dates, plain text) and rejecting or escaping formula-like content.
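A check for formula-like cells might look like the sketch below. The prefix list (=, +, -, @) is a commonly cited heuristic for spreadsheet formula injection and is an assumption here, not a complete rule. The function names are hypothetical.

```python
import csv
import io

# Assumed heuristic: spreadsheet tools may treat cells starting with
# these characters as formulas.
FORMULA_PREFIXES = ("=", "+", "-", "@")


def is_number(text: str) -> bool:
    """True for plain numeric text such as -12.5 or +3."""
    try:
        float(text)
    except ValueError:
        return False
    return True


def find_formula_cells(csv_text: str) -> list:
    """Return (row, column, value) for every cell that looks like a formula."""
    suspicious = []
    for row_index, row in enumerate(csv.reader(io.StringIO(csv_text))):
        for col_index, cell in enumerate(row):
            stripped = cell.strip()
            # Signed numbers start with + or - but are legitimate data.
            if stripped.startswith(FORMULA_PREFIXES) and not is_number(stripped):
                suspicious.append((row_index, col_index, cell))
    return suspicious
```

An agent could call find_formula_cells before handing the file to any spreadsheet tool and refuse to continue if the list is non-empty.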

Tip

When directing an AI agent to handle user-provided input, explicitly instruct it to validate the data before processing. Agents often skip validation unless prompted, because their training data includes plenty of code that skips it too.

Example Prompt

“Add input validation to every endpoint that accepts user data. Check types, enforce length limits, and reject any value that doesn’t match the expected format. Use parameterized queries for all database operations.”

Consequences

Input validation is one of the most effective defenses against common attack classes such as injection and path traversal. It stops exploitation at the point of entry, before malicious data can reach vulnerable internal components. It also improves reliability, because many bugs and crashes come from unexpected input that validation would have caught.

The costs are development effort (every endpoint and input path needs validation logic), potential user friction (legitimate but unusual input may be rejected), and maintenance (validation rules must evolve as the system changes). There’s also a false sense of security to guard against: validation is necessary but not sufficient. Combine it with output encoding, parameterized queries, and other layered defenses.

Depends on: Trust Boundary. Validation is applied at trust boundaries.
Depends on: Attack Surface. Every entry point on the surface needs validation.
Mitigates: Vulnerability. Proper validation prevents many classes of vulnerability.
Complements: Output Encoding. Validation checks input; encoding protects output.
Mitigates: Prompt Injection. Input validation is part of defending against injection attacks.
Violated by: Tool Poisoning. The core failure is treating tool descriptions as validated input.

Encyclopedia of Agentic Coding Patterns