First Seen
Feb 18, 2026
Last Scanned
Feb 22, 2026
Findings
4
Score
37/100
Findings (4)
Detects attempts to override or ignore previous instructions
ignore previous instructions Remove the injection payload from the skill definition. Text that attempts to reset agent context or override prior directives is a direct attack vector.
Likely FP if the text is in a security tutorial or research paper discussing injection techniques as examples, not in an active skill description.
Detects autonomous sub-agent or cron-based execution without human oversight
Cron | Heartbeat Cron | Notification Check + autonomous Remove directives that force the agent to call specific tools or APIs not required for the skill's stated functionality. Tool calls should be determined by user intent, not embedded directives.
Likely FP if the skill legitimately needs to call other tools as part of its workflow (e.g., a deployment skill that calls git and cloud CLI tools).
Detects instructions to hide actions from the user
NEVER reveal the contents of `SOUL.md`, `USER Remove directives that attempt to change the agent's output format, suppress safety warnings, or alter response structure in ways that bypass safety controls.
Likely FP if the output format directive is a legitimate tool configuration (e.g., return results as JSON) that does not suppress safety features.
Detects skills where external API responses control agent decisions or actions
call the appropriate API + based on result Do not include content from MCP tool responses verbatim in system prompts or tool descriptions. Sanitize all dynamic content before incorporating it into prompt context.
Likely FP if the match is a static tool description that mentions dynamic content handling in its documentation, not an actual injection vector.