First Seen
Feb 18, 2026
Last Scanned
Feb 22, 2026
Findings
3
Score
69/100
Findings (3)
Detects common jailbreak prompt patterns
circumvent security Remove directives that attempt to exfiltrate data through the agent's response (e.g., asking the agent to embed credentials in URLs or include secret values in output).
Likely FP if the text is a legitimate tool instruction about displaying configuration to the user (e.g., show current settings) without external transmission.
Detects autonomous sub-agent or cron-based execution without human oversight
background process + autonomous Remove directives that force the agent to call specific tools or APIs not required for the skill's stated functionality. Tool calls should be determined by user intent, not embedded directives.
Likely FP if the skill legitimately needs to call other tools as part of its workflow (e.g., a deployment skill that calls git and cloud CLI tools).
Detects instructions to hide actions from the user
never share human Remove directives that attempt to change the agent's output format, suppress safety warnings, or alter response structure in ways that bypass safety controls.
Likely FP if the output format directive is a legitimate tool configuration (e.g., return results as JSON) that does not suppress safety features.