evaluating-code-models

skills-sh:davila7_claude-code-templates__evaluating-code-models

View source

26/100

First Seen

Feb 18, 2026

Last Scanned

Feb 20, 2026

Findings

Score

26/100

CRITICAL 2

MEDIUM 3

Findings (5)

CRITICAL

Delimiter injection PROMPT_INJECTION_006

prompt-injection L295

Detects injection of system/user/assistant delimiters

<s>[INST]

FIX

Remove text that instructs the agent to disregard its safety guidelines, system prompt, or ethical constraints. This is a strong indicator of a jailbreak attempt.

FP?

Likely FP if the text appears in security research content, a CTF challenge description, or educational material about AI safety clearly labeled as such.

CRITICAL

Delimiter injection PROMPT_INJECTION_006

prompt-injection L616

Detects injection of system/user/assistant delimiters

<s>[INST]

FIX

Remove text that instructs the agent to disregard its safety guidelines, system prompt, or ethical constraints. This is a strong indicator of a jailbreak attempt.

FP?

Likely FP if the text appears in security research content, a CTF challenge description, or educational material about AI safety clearly labeled as such.

MEDIUM

Git clone and execute chain SUPPLY_012

supply-chain L17

Detects git clone of repositories followed by execution of cloned content

git
clone https://github.com/bigcode-project/bigcode-evaluation-harness.git + cd
bigcode-evaluation-harness
pip
install

FIX

Review the dependency tree for nested or transitive dependencies that introduce risk. Use tools like npm audit or pip-audit to identify known vulnerabilities in the dependency chain.

FP?

Likely FP if the flagged dependency is a standard, widely-used library with no known vulnerabilities at the time of scanning.

MEDIUM

Docker pull and run untrusted image EXTDL_015

external-download L242

Detects pulling and running Docker images from external registries

docker
pull ghcr.io/bigcode-project/evaluation-harness-multiple

FIX

Pin Docker images to a specific digest (e.g., image@sha256:abc...) instead of using mutable tags like :latest. Use trusted base images from verified publishers.

FP?

Likely FP if the Docker command pulls a well-known official image (e.g., docker pull python:3.11) in setup documentation.

MEDIUM

Docker pull and run untrusted image EXTDL_015

external-download L619

Detects pulling and running Docker images from external registries

docker
pull ghcr.io/bigcode-project/evaluation-harness-multiple

FIX

Pin Docker images to a specific digest (e.g., image@sha256:abc...) instead of using mutable tags like :latest. Use trusted base images from verified publishers.

FP?

Likely FP if the Docker command pulls a well-known official image (e.g., docker pull python:3.11) in setup documentation.