First Seen
Feb 18, 2026
Last Scanned
Feb 22, 2026
Findings
6
Score
10/100
Findings (6)
Detects instructions to hide actions from the user
NEVER tell the user Remove directives that attempt to change the agent's output format, suppress safety warnings, or alter response structure in ways that bypass safety controls.
Likely FP if the output format directive is a legitimate tool configuration (e.g., return results as JSON) that does not suppress safety features.
Detects skills that write or promote content into agent instruction files
Write to `~/workspace/SOUL.md Remove or restrict the skill's ability to modify CLAUDE.md or agent configuration files. Self-modifying agent configurations can be exploited for persistent attacks.
Likely FP if the skill is a project management tool where updating CLAUDE.md is an intended workflow feature (e.g., appending project notes), though the risk remains.
Detects autonomous sub-agent or cron-based execution without human oversight
background process + autonomous Remove directives that force the agent to call specific tools or APIs not required for the skill's stated functionality. Tool calls should be determined by user intent, not embedded directives.
Likely FP if the skill legitimately needs to call other tools as part of its workflow (e.g., a deployment skill that calls git and cloud CLI tools).
Detects instructions to hide actions from the user
do not show to user Remove directives that attempt to change the agent's output format, suppress safety warnings, or alter response structure in ways that bypass safety controls.
Likely FP if the output format directive is a legitimate tool configuration (e.g., return results as JSON) that does not suppress safety features.
Detects instructions to hide actions from the user
do not show to user Remove directives that attempt to change the agent's output format, suppress safety warnings, or alter response structure in ways that bypass safety controls.
Likely FP if the output format directive is a legitimate tool configuration (e.g., return results as JSON) that does not suppress safety features.
Detects instructions to hide actions from the user
do not show to user Remove directives that attempt to change the agent's output format, suppress safety warnings, or alter response structure in ways that bypass safety controls.
Likely FP if the output format directive is a legitimate tool configuration (e.g., return results as JSON) that does not suppress safety features.