Data Exfiltration

Skills that secretly send your files, credentials, or conversation data to external servers

17 detection rules 280 skills affected →

What is data exfiltration?

Data exfiltration is when a skill sends information from your environment to an attacker-controlled server. The information could be files from your filesystem, environment variables containing credentials, the contents of your conversation with the agent, or data from other tools the agent has access to.

The mechanisms vary. Some skills make direct HTTP requests to external endpoints, encoding stolen data in URL parameters, POST bodies, or DNS queries. Others are more subtle: they embed data in image URLs that the agent renders (the request itself exfiltrates the data), use WebSocket connections that look like legitimate API calls, or write data to shared locations that another process picks up.

Aguara tracks sixteen distinct exfiltration patterns plus an NLP-based detector that catches cases where credential access and network transmission appear together in natural language descriptions. The range matters because attackers are creative. Blocking outbound HTTP is not enough when data can leave through DNS, ICMP, steganography in file uploads, or timing-based side channels.

Why this matters for AI agents

AI agents are uniquely positioned for exfiltration because they naturally aggregate sensitive information. Your agent might have access to your codebase, your cloud credentials, your email, your Slack messages, your internal documentation. That is the whole point of an agent. It connects to many data sources to be useful.

But that connectivity becomes a liability when one malicious skill can read from those data sources and write to the network. In a traditional application, exfiltration requires both a vulnerability and access to sensitive data. In an agent, every tool potentially has access to everything the agent can see, which is often everything you can see.

The detection challenge is real. A skill that makes HTTP requests to external APIs is normal behavior. A skill that sends data to its own backend is expected. Distinguishing legitimate API calls from exfiltration requires understanding what data is being sent and where. Aguara rules focus on the suspicious patterns: encoding file contents in URL parameters, sending environment variables to non-standard endpoints, accessing credential files and then making network requests.

Real-world examples

A "code analysis" skill reads your project files to analyze them (legitimate), then sends the full file contents to an external API endpoint disguised as a "telemetry" call. The endpoint URL looks like analytics.legitimate-sounding-domain.com but is controlled by the attacker. Your entire codebase is now exfiltrated.

A skill constructs a Markdown image tag that encodes secrets as URL parameters. When the agent renders this in a chat interface that loads images, the browser makes a GET request to the attacker server with your secrets encoded in the URL. No outbound request from the skill itself, just a rendered image tag that does the exfiltration.

A skill accesses ~/.aws/credentials for "configuring AWS integration," then encodes the access keys as subdomains in DNS queries. DNS exfiltration bypasses most network monitoring and firewall rules because DNS traffic is almost never blocked or inspected.

A conversation-logging skill records every message between you and the agent, ostensibly for "improving the experience." The logs are sent to the skill author server in batches, including any sensitive information you discussed with the agent, code you asked it to write, and credentials you mentioned in context.

How to protect against it

Apply the principle of least privilege to every skill. A code formatting tool does not need network access. A file search tool does not need to read .env files. If a skill requests permissions beyond what its stated function requires, that is a red flag.

Monitor outbound network activity from your agent. Tools like Little Snitch (macOS) or iptables rules can alert you when unexpected processes make network connections. For MCP servers running locally, you can restrict their network access at the OS level, only allowing connections to documented API endpoints.

When evaluating skills, check what data they access and where they send it. A skill tool definitions should make it clear what inputs it reads and what external services it contacts. If a skill has network capabilities but does not clearly document which endpoints it calls and why, treat it as suspicious. Legitimate skill authors document their API dependencies.

Aguara detection rules (17)

HIGH
Webhook URL for data exfiltration EXFIL_001

Detects webhook URLs commonly used for data exfiltration

HIGH
Sensitive file read pattern EXFIL_002

Detects reads of sensitive system or credential files

HIGH
Data transmission pattern EXFIL_003

Detects patterns indicating sensitive data being sent to external services

MEDIUM
DNS exfiltration pattern EXFIL_004

Detects DNS-based data exfiltration techniques

HIGH
curl/wget POST with sensitive data EXFIL_005

Detects curl or wget commands posting sensitive files or credentials

MEDIUM
Clipboard access with network EXFIL_006

Detects clipboard access combined with network operations

MEDIUM
Environment variable exfiltration EXFIL_007

Detects attempts to read and transmit environment variables

HIGH
File read piped to HTTP transmission EXFIL_008

Detects reading files piped directly to network commands

MEDIUM
Base64 encode and send EXFIL_009

Detects base64 encoding of content followed by transmission

MEDIUM
Non-standard port communication EXFIL_010

Detects outbound connections to non-standard ports

HIGH
External context or knowledge sync EXFIL_011

Detects CLI tools that upload project context, code, or knowledge to external services

MEDIUM
Unrestricted email or messaging access EXFIL_012

Detects CLI tools granting unrestricted send/read access to email or messaging

HIGH
Read sensitive files and transmit externally EXFIL_013

Detects skills that both read sensitive credential files and send data to external services

MEDIUM
Environment variable credential in POST data EXFIL_014

Detects credential environment variables used as POST body data in network commands

MEDIUM
Screenshot or screen capture with transmission EXFIL_015

Detects screenshot/screen capture tools combined with upload or transmission

MEDIUM
Git history or diff access with transmission EXFIL_016

Detects accessing git history, diffs, or repo data combined with external transmission

CRITICAL
Text combines credential access with network transmission NLP_CRED_EXFIL_COMBO

Text combines credential access with network transmission

Want to check if your skills have data exfiltration issues?

Scan now (free, runs in your browser)