Output Encoding
Also known as: Output Escaping, Context-Sensitive Encoding
Context
This is a tactical pattern that complements input validation. While input validation checks data when it arrives, output encoding makes sure data is rendered safely when it leaves: when it gets inserted into HTML, SQL, shell commands, URLs, or any other context where special characters have meaning.
In agentic coding workflows, output encoding matters whenever an AI agent generates content that will be interpreted by another system. If an agent produces HTML, constructs a shell command, or builds a database query, the output must be encoded correctly for its destination context.
Problem
Data that’s perfectly safe in one context can be dangerous in another. A user’s display name containing <script>alert('xss')</script> is harmless in a log file but executes as code when rendered in a web page. A filename containing a semicolon is fine on most file systems but can trigger command injection when it is spliced unquoted into a shell command line. The same bytes mean different things in different contexts. How do you make sure data is always treated as data, never as commands or structure, regardless of where it ends up?
Forces
- Each output context (HTML, SQL, shell, URL, JSON, CSV) has its own special characters and encoding rules.
- Developers must remember to encode at every output point. Forgetting even once creates a vulnerability.
- Double-encoding (encoding something that’s already encoded) produces garbled output.
- Some frameworks handle encoding automatically; others leave it entirely to the developer.
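To see why double-encoding garbles output, here is a small Python illustration using the standard library's html.escape (the sample string is arbitrary). Escaping once produces entities that a browser turns back into the literal text. Escaping the already-escaped string turns each & into &amp;, so the reader sees the entity names themselves.

```python
import html

raw = "<b>"

once = html.escape(raw)    # "&lt;b&gt;": a browser displays the literal text <b>
twice = html.escape(once)  # "&amp;lt;b&amp;gt;": a browser displays &lt;b&gt;

print(once)
print(twice)
```

The usual remedy is to keep values raw inside the program and encode exactly once, at the output boundary.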
Solution
Apply context-appropriate encoding at the point where data is inserted into output. The principle: encode for the destination, not the source.
- HTML context: Encode <, >, &, ", and ' as HTML entities. Most template engines do this automatically. Make sure auto-escaping is enabled, and never bypass it without a clear reason.
- SQL context: Use parameterized queries or prepared statements. Never concatenate user data into SQL strings. The database driver handles the encoding.
- Shell context: Avoid passing user data to shell commands entirely. If you can’t avoid it, use the language’s built-in shell escaping functions or pass data as arguments to an exec-style call that bypasses the shell interpreter.
- URL context: Percent-encode special characters when inserting data into URLs.
- JSON context: Use a proper JSON serializer rather than string concatenation.
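A minimal Python sketch of "encode for the destination" using only the standard library: one untrusted value goes through a different encoder for each context in the list above. The sample value and the example.com URLs are illustrative assumptions. Note that urlencode uses quote_plus (a space becomes +), the form-encoding convention for query strings, while quote(..., safe="") is the usual choice for a single path segment.

```python
import html
import json
from urllib.parse import quote, urlencode

# One untrusted value, four destinations, four different encodings.
value = 'Tom & "Jerry" <b>'

# HTML text or attribute value: &, <, >, " and ' become entities.
html_safe = html.escape(value, quote=True)

# URL query string: urlencode percent-encodes each value (spaces become "+").
query_url = "https://example.com/search?" + urlencode({"q": value})

# URL path segment: quote with safe="" so even "/" would be encoded.
path_url = "https://example.com/users/" + quote(value, safe="")

# JSON: the serializer escapes quotes and backslashes for us.
payload = json.dumps({"name": value})

print(html_safe)  # Tom &amp; &quot;Jerry&quot; &lt;b&gt;
print(query_url)  # https://example.com/search?q=Tom+%26+%22Jerry%22+%3Cb%3E
print(path_url)   # https://example.com/users/Tom%20%26%20%22Jerry%22%20%3Cb%3E
print(payload)    # {"name": "Tom & \"Jerry\" <b>"}
```

The same input produces four different outputs, and none of them is interchangeable with another: the HTML-escaped string is wrong inside a URL, and the percent-encoded string is wrong inside JSON.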
The common thread: never construct structured output (HTML, SQL, commands, URLs) by concatenating raw strings. Use the tools your language and framework provide for safe construction.
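The SQL case in practice: a sketch using Python's built-in sqlite3 module with an in-memory database. The users table and both find_user helpers are hypothetical. The unsafe variant is included only to show what concatenation does with a classic injection payload. With the ? placeholder, the same payload is just an unusual name that matches no rows.

```python
import sqlite3

def find_user(conn: sqlite3.Connection, name: str) -> list:
    # Safe: "?" is a placeholder; the driver binds `name` as a value,
    # so quotes and SQL keywords inside it are never parsed as SQL.
    return conn.execute("SELECT id, name FROM users WHERE name = ?", (name,)).fetchall()

def find_user_unsafe(conn: sqlite3.Connection, name: str) -> list:
    # Anti-pattern, shown only for contrast: the input becomes part of the SQL text.
    return conn.execute("SELECT id, name FROM users WHERE name = '" + name + "'").fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])

payload = "x' OR '1'='1"
print(find_user(conn, "alice"))              # [(1, 'alice')]
print(find_user(conn, payload))              # []
print(len(find_user_unsafe(conn, payload)))  # 2: the WHERE clause is always true
```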
How It Plays Out
A web application displays user comments on a page. One user submits a comment containing <img src=x onerror=alert(document.cookie)>. If the application inserts this comment into the HTML without encoding, every visitor’s browser executes the script, potentially leaking session cookies. With proper HTML encoding, the comment displays as literal text, visible but harmless.
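A minimal sketch of the encoding step for the comment example, calling Python's html.escape directly. The render_comment function and its markup are hypothetical. In a real application, a template engine with auto-escaping would normally perform this step. The author value lands inside an attribute, which is why the quote characters must be escaped too.

```python
import html

def render_comment(author: str, body: str) -> str:
    # Escape each untrusted value once, at the moment it enters the HTML.
    # html.escape's default quote=True also escapes " and ', which keeps the
    # author value from closing the data-author attribute early.
    return (
        f'<div class="comment" data-author="{html.escape(author)}">'
        f"<p>{html.escape(body)}</p></div>"
    )

snippet = render_comment("mallory", "<img src=x onerror=alert(document.cookie)>")
print(snippet)
# <div class="comment" data-author="mallory"><p>&lt;img src=x onerror=alert(document.cookie)&gt;</p></div>
```

The browser shows the attacker's markup as text because the < and > it would need to open a tag have become &lt; and &gt;.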
An AI agent generates a shell command to rename a file based on user input. The user provides the filename my file; rm -rf /. If the agent builds the command by string concatenation, the outcome depends entirely on quoting. Unquoted (mv old my file; rm -rf /), the shell treats the semicolon as a command separator and runs rm -rf / as a second command. Wrapped in double quotes, the semicolon is safe, but a filename that itself contains a double quote or $(...) can still break out. Using a safe API like Python’s subprocess.run(["mv", "old", user_filename]) avoids shell interpretation entirely. The filename is treated as a single argument, no matter what characters it contains.
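The subprocess approach from this example, sketched in Python. rename_file and shell_command_for are hypothetical helpers, and the demo assumes a POSIX system with mv on the PATH. The demo filename contains a semicolon but no slash, since a slash would be read as a directory separator. shlex.quote is the standard library's shell-escaping function for the rare case where a shell string truly cannot be avoided; it targets POSIX shells only.

```python
import os
import shlex
import subprocess
import tempfile

def rename_file(old: str, new: str) -> None:
    # Arguments are passed to the OS as an argv list; no shell parses them,
    # so ";", spaces, quotes and $(...) in `new` are just filename characters.
    # "--" stops mv from reading a name that starts with "-" as an option.
    subprocess.run(["mv", "--", old, new], check=True)

def shell_command_for(old: str, new: str) -> str:
    # Fallback only: if a shell string is unavoidable, quote every argument.
    return f"mv -- {shlex.quote(old)} {shlex.quote(new)}"

# Demo in a throwaway directory with a hostile-looking (but harmless) name.
with tempfile.TemporaryDirectory() as workdir:
    old_path = os.path.join(workdir, "old.txt")
    open(old_path, "w").close()
    new_path = os.path.join(workdir, "my file; echo pwned.txt")
    rename_file(old_path, new_path)
    renamed = os.path.exists(new_path) and not os.path.exists(old_path)

print(renamed)                                    # True
print(shell_command_for("old", "my file; rm -rf /"))  # mv -- old 'my file; rm -rf /'
```

The list form is preferable because there is nothing to get wrong: no quoting rules apply when no shell is involved.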
When reviewing AI-generated code, check how it constructs HTML, SQL, shell commands, and URLs. Agents frequently use string concatenation because it’s simpler. Ask the agent to use parameterized queries, template engines with auto-escaping, or subprocess calls that bypass the shell.
“Review the code that constructs shell commands from user input. Replace any string concatenation with subprocess calls that pass arguments as a list, so filenames with special characters are treated as data, not as shell syntax.”
Consequences
Proper output encoding eliminates entire classes of vulnerabilities: cross-site scripting (XSS), SQL injection, command injection, and header injection. It works as a defense even when input validation is imperfect. If the data is encoded correctly for its destination at the point of output, it can’t be interpreted as code or structure.
The costs are modest but real: developers must know which encoding to apply in which context, and must apply it consistently. Framework defaults help a lot. Using a template engine with auto-escaping enabled is far safer than constructing HTML strings by hand. The most common failure isn’t that encoding is difficult; it’s that someone forgets to do it.
Related Patterns
- Complements: Input Validation. Validation filters input; encoding protects output. Both are needed.
- Depends on: Trust Boundary. Encoding is applied when data crosses into a new context.
- Prevents: Vulnerability. Proper encoding prevents XSS, SQL injection, and command injection.
- Uses: Attack Surface. Every output point on the surface needs appropriate encoding.