First Seen
Feb 18, 2026
Last Scanned
Feb 20, 2026
Findings
2
Score
70/100
Findings (2)
Detects instructions to hide actions from the user
don't show to user
Remove directives that attempt to change the agent's output format, suppress safety warnings, or alter response structure in ways that bypass safety controls.
Likely a false positive if the output format directive is a legitimate tool configuration (e.g., return results as JSON) that does not suppress safety features.
Detects instructions to hide actions from the user
don't tell the story | Summarize into human
Remove directives that attempt to change the agent's output format, suppress safety warnings, or alter response structure in ways that bypass safety controls.
Likely a false positive if the output format directive is a legitimate tool configuration (e.g., return results as JSON) that does not suppress safety features.
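For illustration only, a check of this kind could be sketched as a small pattern match over instruction text. The scanner's actual rule is not shown in this report, so the patterns and the function name `find_hidden_action_directives` below are hypothetical and chosen only to resemble the two matched phrases above.

```python
import re

# Hypothetical patterns; the scanner's real rule set is not shown in this report.
HIDE_FROM_USER_PATTERNS = [
    # "don't / do not / never" + show/tell/reveal/mention ... user/human, on one line
    re.compile(
        r"\b(?:don['’]t|do not|never)\s+(?:show|tell|reveal|mention)\b"
        r"[^.\n]*\b(?:user|human)\b",
        re.IGNORECASE,
    ),
]


def find_hidden_action_directives(text):
    """Return (line_number, matched_text) for each line that matches a pattern."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for pattern in HIDE_FROM_USER_PATTERNS:
            match = pattern.search(line)
            if match:
                findings.append((lineno, match.group(0)))
                break  # report at most one finding per line
    return findings
```

A plain formatting directive such as "Return results as JSON" does not match these patterns, which is the false-positive case the guidance describes. A phrase-level match like this cannot tell whether a directive actually suppresses safety features, so each hit would still need manual review.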