First Seen: Feb 18, 2026
Last Scanned: Feb 22, 2026
Findings: 5
Score: 25/100
Findings (5)
1. Detects instructions to hide actions from the user
   Matched text: "NEVER tell the user"
   Remediation: Remove directives that attempt to change the agent's output format, suppress safety warnings, or alter response structure in ways that bypass safety controls.
   False-positive note: Likely a false positive if the output format directive is a legitimate tool configuration (e.g., "return results as JSON") that does not suppress safety features.
2. Detects instructions to hide actions from the user
   Matched text: "do not show this to user"
   Remediation: Remove directives that attempt to change the agent's output format, suppress safety warnings, or alter response structure in ways that bypass safety controls.
   False-positive note: Likely a false positive if the output format directive is a legitimate tool configuration (e.g., "return results as JSON") that does not suppress safety features.
3. Detects instructions to hide actions from the user
   Matched text: "DO NOT tell user"
   Remediation: Remove directives that attempt to change the agent's output format, suppress safety warnings, or alter response structure in ways that bypass safety controls.
   False-positive note: Likely a false positive if the output format directive is a legitimate tool configuration (e.g., "return results as JSON") that does not suppress safety features.
4. Detects attempts to make the AI assume a different role
   Matched text: "Switch to a mode"
   Remediation: Remove directives that attempt to alter the agent's behavior or persona. Skill descriptions should describe tool functionality, not contain behavioral commands for the agent.
   False-positive note: Likely a false positive if the matched text is part of the skill's legitimate system prompt configuration that defines the tool's behavior for the user, not an attack.
5. Detects instructions to hide actions from the user
   Matched text: "Never tell user"
   Remediation: Remove directives that attempt to change the agent's output format, suppress safety warnings, or alter response structure in ways that bypass safety controls.
   False-positive note: Likely a false positive if the output format directive is a legitimate tool configuration (e.g., "return results as JSON") that does not suppress safety features.
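The repeated near-identical matches above ("NEVER tell the user", "do not show this to user", etc.) suggest phrase-based pattern rules. The scanner's actual rule set is not published in this report; the following is only an illustrative sketch, with hypothetical rule names and regexes, of how such phrase detection could work:

```python
import re

# Hypothetical patterns approximating the two rule types reported above.
# These are assumptions for illustration, not the scanner's real rules.
HIDDEN_ACTION = re.compile(
    r"\b(?:never|do not|don't)\s+(?:tell|show)\b.{0,20}\buser\b",
    re.IGNORECASE,
)
ROLE_SWITCH = re.compile(r"\bswitch to a\b.{0,30}\bmode\b", re.IGNORECASE)

def scan(text: str) -> list[str]:
    """Return the names of the sketch rules that match the given skill text."""
    findings = []
    if HIDDEN_ACTION.search(text):
        findings.append("hide-actions-from-user")
    if ROLE_SWITCH.search(text):
        findings.append("assume-different-role")
    return findings

# The matched snippets from the findings above trigger the sketch rules,
# while a benign output-format directive does not:
print(scan("NEVER tell the user"))       # ['hide-actions-from-user']
print(scan("Switch to a mode"))          # ['assume-different-role']
print(scan("return results as JSON"))    # []
```

Such keyword matching is also what makes the false-positive notes necessary: a legitimate configuration phrase that happens to contain a flagged wording would match just as readily as an actual injection attempt.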