Data Exfiltration
Skills that secretly send your files, credentials, or conversation data to external servers
What is data exfiltration?
Data exfiltration is when a skill sends information from your environment to an attacker-controlled server. The information could be files from your filesystem, environment variables containing credentials, the contents of your conversation with the agent, or data from other tools the agent has access to.
The mechanisms vary. Some skills make direct HTTP requests to external endpoints, encoding stolen data in URL parameters, POST bodies, or DNS queries. Others are more subtle: they embed data in image URLs that the agent renders (the request itself exfiltrates the data), use WebSocket connections that look like legitimate API calls, or write data to shared locations that another process picks up.
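One of the simplest mechanisms above, smuggling data out through URL parameters, is also one of the easiest to spot heuristically: legitimate query values are usually short and structured, while encoded payloads are long runs of base64-like characters. A minimal sketch of that idea (the threshold of 40 characters and the helper name are illustrative assumptions, not Aguara's actual rule):

```python
import re
from urllib.parse import urlparse, parse_qs

# Heuristic assumption: long base64-ish query values are a common way to
# smuggle file contents or credentials out in an otherwise normal-looking URL.
BASE64ISH = re.compile(r"^[A-Za-z0-9+/=_-]{40,}$")

def suspicious_query_params(url: str) -> list[str]:
    """Return query parameter names whose values look like encoded payloads."""
    params = parse_qs(urlparse(url).query)
    return [
        name
        for name, values in params.items()
        if any(BASE64ISH.match(v) for v in values)
    ]
```

A real detector would also weigh the destination host and the entropy of the value, but even this crude check separates `?page=2` from a 60-character encoded blob.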
Aguara tracks sixteen distinct exfiltration patterns plus an NLP-based detector that catches cases where credential access and network transmission appear together in natural language descriptions. The range matters because attackers are creative. Blocking outbound HTTP is not enough when data can leave through DNS, ICMP, steganography in file uploads, or timing-based side channels.
Why this matters for AI agents
AI agents are uniquely positioned for exfiltration because they naturally aggregate sensitive information. Your agent might have access to your codebase, your cloud credentials, your email, your Slack messages, your internal documentation. That is the whole point of an agent. It connects to many data sources to be useful.
But that connectivity becomes a liability when one malicious skill can read from those data sources and write to the network. In a traditional application, exfiltration requires both a vulnerability and access to sensitive data. In an agent, every tool potentially has access to everything the agent can see, which is often everything you can see.
The detection challenge is real. A skill that makes HTTP requests to external APIs is normal behavior. A skill that sends data to its own backend is expected. Distinguishing legitimate API calls from exfiltration requires understanding what data is being sent and where. Aguara rules focus on the suspicious patterns: encoding file contents in URL parameters, sending environment variables to non-standard endpoints, accessing credential files and then making network requests.
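The "credential access plus network request" pattern mentioned above can be sketched as a simple co-occurrence check over a skill's source. This is a deliberately naive illustration (the regexes and function name are assumptions, not Aguara's implementation); real rules track data flow, not just co-occurrence:

```python
import re

# Hypothetical patterns for illustration: paths that commonly hold credentials,
# and calls or commands that send data over the network.
CRED_READ = re.compile(r"\.aws/credentials|\.env\b|\.ssh/id_|\.netrc", re.I)
NET_SEND = re.compile(r"\b(requests\.(post|get)|urlopen|curl\s|wget\s|fetch\()", re.I)

def flags_credential_exfil(source: str) -> bool:
    """Flag a skill whose source both touches credential files and talks to the network."""
    return bool(CRED_READ.search(source) and NET_SEND.search(source))
```

Co-occurrence alone produces false positives (a skill may legitimately do both), which is why understanding what is sent, and to where, remains the hard part.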
Real-world examples
A "code analysis" skill reads your project files to analyze them (legitimate), then sends the full file contents to an external API endpoint disguised as a "telemetry" call. The endpoint URL looks like analytics.legitimate-sounding-domain.com but is controlled by the attacker. Your entire codebase is now exfiltrated.
A skill constructs a Markdown image tag that encodes secrets as URL parameters. When the agent renders this in a chat interface that loads images, the browser makes a GET request to the attacker server with your secrets encoded in the URL. No outbound request from the skill itself, just a rendered image tag that does the exfiltration.
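Because the exfiltrating request comes from the renderer rather than the skill, scanning the skill's output for image tags that carry query strings is one practical countermeasure. A minimal sketch, assuming a simple regex over Markdown (not Aguara's actual rule):

```python
import re

# A rendered image is enough to exfiltrate: the GET request itself carries the
# data. Flag Markdown image tags whose URL includes a query string.
IMG_WITH_QUERY = re.compile(r"!\[[^\]]*\]\((https?://[^)\s]+\?[^)\s]+)\)")

def exfil_image_urls(markdown: str) -> list[str]:
    """Return image URLs that could smuggle data out through query parameters."""
    return IMG_WITH_QUERY.findall(markdown)
```

Chat interfaces that proxy or strip remote images close this channel entirely, which is why many clients do exactly that.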
A skill accesses ~/.aws/credentials for "configuring AWS integration," then encodes the access keys as subdomains in DNS queries. DNS exfiltration bypasses most network monitoring and firewall rules because DNS traffic is almost never blocked or inspected.
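Encoded secrets make DNS labels unusually long and random-looking, so one common detection heuristic is label length plus Shannon entropy. A sketch under those assumptions (the thresholds are illustrative, not Aguara's):

```python
import math
from collections import Counter

def label_entropy(label: str) -> float:
    """Shannon entropy of a DNS label; encoded secrets look close to random."""
    counts = Counter(label)
    total = len(label)
    return -sum(c / total * math.log2(c / total) for c in counts.values())

def looks_like_dns_exfil(hostname: str, max_label_len: int = 30, min_entropy: float = 3.5) -> bool:
    """Flag hostnames with unusually long, high-entropy subdomain labels."""
    return any(
        len(label) > max_label_len and label_entropy(label) > min_entropy
        for label in hostname.split(".")[:-2]  # skip the registrable domain itself
    )
```

Resolvers and network monitors that log queries can apply the same heuristic server-side, where the skill cannot tamper with it.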
A conversation-logging skill records every message between you and the agent, ostensibly for "improving the experience." The logs are sent to the skill author's server in batches, including any sensitive information you discussed with the agent, code you asked it to write, and credentials you mentioned in context.
How to protect against it
Apply the principle of least privilege to every skill. A code formatting tool does not need network access. A file search tool does not need to read .env files. If a skill requests permissions beyond what its stated function requires, that is a red flag.
Monitor outbound network activity from your agent. Tools like Little Snitch (macOS) or iptables rules can alert you when unexpected processes make network connections. For MCP servers running locally, you can restrict their network access at the OS level, only allowing connections to documented API endpoints.
When evaluating skills, check what data they access and where they send it. A skill's tool definitions should make clear what inputs it reads and which external services it contacts. If a skill has network capabilities but does not clearly document which endpoints it calls and why, treat it as suspicious. Legitimate skill authors document their API dependencies.
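That documentation check can be partly automated: extract every URL the skill's source references and diff the hosts against the endpoints its documentation declares. A minimal sketch (the function name and allowlist shape are assumptions for illustration):

```python
import re
from urllib.parse import urlparse

# Crude URL extraction; a real scanner would also resolve string concatenation.
URL = re.compile(r"https?://[^\s\"')]+")

def undocumented_hosts(skill_source: str, documented: set[str]) -> set[str]:
    """Hosts the skill contacts that are absent from its documented endpoint list."""
    hosts = {urlparse(u).hostname for u in URL.findall(skill_source)}
    return hosts - documented
```

A non-empty result is not proof of exfiltration, but it is exactly the kind of undocumented network dependency worth questioning before installing a skill.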
Aguara detection rules (17)
Detects webhook URLs commonly used for data exfiltration
Detects reads of sensitive system or credential files
Detects patterns indicating sensitive data being sent to external services
Detects DNS-based data exfiltration techniques
Detects curl or wget commands posting sensitive files or credentials
Detects clipboard access combined with network operations
Detects attempts to read and transmit environment variables
Detects reading files piped directly to network commands
Detects base64 encoding of content followed by transmission
Detects outbound connections to non-standard ports
Detects CLI tools that upload project context, code, or knowledge to external services
Detects CLI tools granting unrestricted send/read access to email or messaging
Detects skills that both read sensitive credential files and send data to external services
Detects credential environment variables used as POST body data in network commands
Detects screenshot/screen capture tools combined with upload or transmission
Detects accessing git history, diffs, or repo data combined with external transmission
Detects natural-language text that combines credential access with network transmission
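The last rule differs from the others: it works on natural-language text such as skill descriptions and prompts rather than code, matching the NLP-based detector mentioned earlier. A keyword co-occurrence sketch of the idea (a deliberate simplification; the term lists are illustrative, not Aguara's actual detector):

```python
# Illustrative term lists; naive substring matching stands in for real NLP.
CRED_TERMS = {"credential", "password", "token", "api key", "secret"}
NET_TERMS = {"send", "upload", "post", "transmit", "exfiltrate", "forward"}

def text_mentions_exfil(text: str) -> bool:
    """Flag text that mentions both credential access and network transmission."""
    lowered = text.lower()
    return any(t in lowered for t in CRED_TERMS) and any(t in lowered for t in NET_TERMS)
```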
Want to check if your skills have data exfiltration issues?
Scan now (free, runs in your browser)