Toxic Data Flow
When trusted and untrusted data mix without boundaries, letting tainted inputs contaminate the agent's decisions
What is toxic data flow?
Toxic data flow is a systemic vulnerability rather than a specific attack. It occurs when a skill architecture allows untrusted data (user input, external API responses, file contents) to flow into trusted contexts (system prompts, tool parameters, decision logic) without sanitization or validation. The "toxicity" is the contamination of trust boundaries.
In practice, this looks like: a skill reads untrusted input, passes it through several processing stages without validation, and eventually uses it in a security-sensitive operation. Each stage assumes the previous stage validated the data. None of them actually did. This is the same class of vulnerability that causes SQL injection, XSS, and path traversal in traditional applications, but applied to the agent's cognitive process.
Aguara's three toxic flow rules detect skills where untrusted data reaches trusted contexts without passing through validation or sanitization. This includes tool outputs feeding directly into other tool inputs, user-provided data being interpolated into system-level prompts, and external content influencing the agent's tool-selection logic.
Why this matters for AI agents
AI agents are especially vulnerable to toxic data flows because the boundary between "data" and "instructions" is blurry by design. When an agent processes tool output, it interprets that output as part of its context. If the tool output contains instructions (injected by an attacker upstream), the agent might follow them. There is no type system, no compiler, no runtime check that distinguishes "data to process" from "instructions to follow."
In a traditional application, you can trace data flows through code and verify that every trust boundary has a validation check. In an agent, data flows through natural language. The agent reads output from Tool A, reasons about it, and calls Tool B. The "reasoning" step is opaque. You cannot inspect it to verify that tainted data was properly handled.
Toxic flows also emerge from tool chaining. Individually, each tool might be safe. But when Tool A's output becomes Tool B's input (through the agent's context), vulnerabilities that do not exist in isolation appear at the seams. Aguara flags these inter-tool contamination paths because they are invisible when you evaluate skills one at a time.
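One way to make these invisible seams visible is explicit taint propagation: label each value with its provenance and refuse to let tainted values reach a trusted sink. The sketch below is illustrative, not part of Aguara; `Labeled`, `run_tool`, and `require_trusted` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Labeled:
    """A value plus a taint flag, so provenance survives tool chaining."""
    value: str
    tainted: bool


def run_tool(tool, arg: Labeled) -> Labeled:
    # The output of any tool fed tainted input is itself tainted:
    # Tool A's contamination flows through to Tool B's input.
    return Labeled(tool(arg.value), tainted=arg.tainted)


def require_trusted(arg: Labeled) -> str:
    # A trusted sink (system prompt, shell command, ...) refuses tainted data.
    if arg.tainted:
        raise PermissionError("tainted data reached a trusted sink")
    return arg.value
```

Real agent contexts are natural language rather than typed values, so this only works where tool plumbing is under your control, but it demonstrates why the contamination path, not either tool alone, is the vulnerability.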
Real-world examples
A web scraping tool returns raw HTML from an untrusted website. The agent passes this HTML to a code generation tool as context. The HTML contains hidden comments with instructions that cause the code generator to include a backdoor in its output. Neither tool is malicious individually. The vulnerability exists only in their combination.
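A partial mitigation for this specific case is to strip HTML comments from scraped content before it reaches downstream tools. This is a hedged sketch only: comments are one carrier among many (hidden CSS text, alt attributes, zero-width characters), so stripping them reduces but does not eliminate the risk.

```python
import re

# Matches <!-- ... --> comments, including multi-line ones.
HTML_COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)


def strip_hidden_comments(html: str) -> str:
    """Remove HTML comments, a common carrier for injected instructions,
    before passing scraped content to a downstream tool."""
    return HTML_COMMENT.sub("", html)
```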
A skill reads user input and interpolates it directly into a system-level prompt template: "You are a helpful assistant that specializes in {user_topic}." An attacker provides a topic value that contains additional instructions, overriding the system prompt's intent. The skill does not validate or escape the user input before interpolation.
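The fix for this pattern is to validate the value against a strict allowlist before interpolation. A minimal sketch, assuming a hypothetical `build_system_prompt` helper and an allowlist that only accepts short alphanumeric topic names:

```python
import re

SYSTEM_TEMPLATE = "You are a helpful assistant that specializes in {user_topic}."

# Hypothetical allowlist: short alphanumeric topic names only. Newlines,
# role markers, and long payloads all fail this pattern.
TOPIC_PATTERN = re.compile(r"^[A-Za-z0-9 \-]{1,40}$")


def build_system_prompt(user_topic: str) -> str:
    # Reject anything that could smuggle extra instructions into the
    # system-level prompt rather than trying to escape it.
    if not TOPIC_PATTERN.fullmatch(user_topic):
        raise ValueError("user_topic failed validation; refusing to interpolate")
    return SYSTEM_TEMPLATE.format(user_topic=user_topic)
```

Rejecting invalid input outright is usually safer than escaping here, because there is no well-defined escaping syntax for natural-language prompts.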
A translation skill receives text from a document reader tool. The document contains instructions in a non-obvious language that, when translated, produce commands the agent interprets as actionable directives. The translation output flows into the agent context without any trust boundary check.
How to protect against it
Skill authors should treat every input as untrusted, regardless of where it comes from. Validate and sanitize data at every trust boundary. If your tool receives input from the agent (which may have been influenced by other tools), validate it against a strict schema before using it in any security-sensitive operation.
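As an illustration of schema validation at a trust boundary, here is a sketch for a hypothetical file-access tool: the agent-supplied payload must match an exact key set, an allowlisted mode, and a path that cannot escape the workspace root. The names (`validate_tool_input`, `workspace`) are assumptions, not part of any real API.

```python
import os

ALLOWED_MODES = {"read", "list"}


def validate_tool_input(payload: dict) -> dict:
    """Validate agent-supplied input against a strict schema before use."""
    # Exact key set: unexpected keys are rejected, not ignored.
    if set(payload) != {"path", "mode"}:
        raise ValueError("payload must contain exactly 'path' and 'mode'")
    if payload["mode"] not in ALLOWED_MODES:
        raise ValueError(f"mode {payload['mode']!r} not allowed")
    # Normalize the path and ensure it stays inside the workspace root,
    # defeating "../" traversal in agent-influenced input.
    resolved = os.path.normpath(os.path.join("workspace", payload["path"]))
    if not resolved.startswith("workspace" + os.sep):
        raise ValueError("path escapes workspace")
    return {"path": resolved, "mode": payload["mode"]}
```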
Design your tools to be explicit about trust levels. If a tool returns data from an untrusted source, include metadata indicating that the content is untrusted. For example, wrap external content in a structure like {"source": "external", "trusted": false, "content": "..."}. This gives downstream consumers (including the agent) a chance to handle it appropriately.
As a user building agent workflows, avoid giving agents access to both tools that read from untrusted sources and tools that write to sensitive targets in the same session. If your agent needs to process untrusted web content and also has access to your email, filesystem, or cloud infrastructure, you are one toxic flow away from a compromise. Separate these capabilities into distinct agent sessions with different permission sets.
Aguara detection rules (3)
Skill can read private data (credentials, SSH keys, env vars) AND write to public channels (Slack, Discord, email). This combination enables data exfiltration.
Skill can read private data AND execute arbitrary code. This combination enables credential theft via dynamic code.
Skill has destructive capabilities AND can execute arbitrary code. This combination enables ransomware-like attacks.
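The three rules above all follow the same shape: a dangerous pair of capability classes present in one skill. This is not Aguara's implementation, but a minimal sketch of the idea, with hypothetical capability names:

```python
# Hypothetical capability classes; real scanners infer these from skill
# manifests and code, not from hand-written sets.
PRIVATE_READ = {"read_env", "read_ssh_keys", "read_credentials"}
PUBLIC_WRITE = {"send_slack", "send_discord", "send_email"}
CODE_EXEC = {"eval_code", "exec_shell"}
DESTRUCTIVE = {"delete_files", "encrypt_files"}

# Each rule pairs two capability classes with the attack it enables.
TOXIC_PAIRS = [
    (PRIVATE_READ, PUBLIC_WRITE, "data exfiltration"),
    (PRIVATE_READ, CODE_EXEC, "credential theft via dynamic code"),
    (DESTRUCTIVE, CODE_EXEC, "ransomware-like attack"),
]


def toxic_flows(capabilities: set) -> list:
    """Return the risks enabled by this skill's capability combination."""
    return [risk for a, b, risk in TOXIC_PAIRS
            if capabilities & a and capabilities & b]
```

Note that each capability alone triggers nothing; only the combination is flagged, which is exactly why per-tool review misses these flows.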
Want to check if your skills have toxic data flow issues?
Scan now (free, runs in your browser)