evaluating-code-models

skills-sh:davila7_claude-code-templates__evaluating-code-models

D
26/100

First Seen: Feb 18, 2026
Last Scanned: Feb 20, 2026
Findings: 5
Score: 26/100

CRITICAL 2
MEDIUM 3

Findings (5)

CRITICAL
Delimiter injection
L295

Detects injection of system/user/assistant delimiters

<s>[INST]
FIX

Remove text that instructs the agent to disregard its safety guidelines, system prompt, or ethical constraints. This is a strong indicator of a jailbreak attempt.

FP?

Likely FP if the text appears in security research content, a CTF challenge description, or educational material about AI safety clearly labeled as such.
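
To make the detection concrete, here is a minimal sketch of how a delimiter check of this kind might work. The pattern list and the function name `find_delimiters` are assumptions for illustration; the scanner's actual rule set is not shown in this report.

```python
import re

# Hypothetical pattern list; the scanner's real rules are not shown in this report.
DELIMITER_PATTERNS = [
    r"<s>\s*\[INST\]",   # instruction opener, the token flagged in this finding
    r"\[/INST\]",        # matching instruction closer
    r"<\|im_start\|>",   # ChatML-style turn opener
    r"<\|im_end\|>",     # ChatML-style turn closer
]
_DELIMITER_RE = re.compile("|".join(DELIMITER_PATTERNS))


def find_delimiters(text):
    """Return (line_number, matched_text) for each delimiter-like token in text."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in _DELIMITER_RE.finditer(line):
            hits.append((lineno, match.group(0)))
    return hits
```

A hit only shows that the token is present. Deciding whether it is a false positive still requires reading the surrounding lines of the flagged file.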

CRITICAL
Delimiter injection
L616

Detects injection of system/user/assistant delimiters

<s>[INST]
FIX

Remove text that instructs the agent to disregard its safety guidelines, system prompt, or ethical constraints. This is a strong indicator of a jailbreak attempt.

FP?

Likely FP if the text appears in security research content, a CTF challenge description, or educational material about AI safety clearly labeled as such.

MEDIUM
Git clone and execute chain
L17

Detects git clone of repositories followed by execution of cloned content

git clone https://github.com/bigcode-project/bigcode-evaluation-harness.git + cd bigcode-evaluation-harness pip install
FIX

Review the dependency tree for nested or transitive dependencies that introduce risk. Use tools like npm audit or pip-audit to identify known vulnerabilities in the dependency chain.

FP?

Likely FP if the flagged dependency is a standard, widely-used library with no known vulnerabilities at the time of scanning.
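
The clone-then-execute pattern described in this finding could be approximated as follows. This is a rough sketch under stated assumptions: the set of "execution" commands and the function name `clone_then_execute` are hypothetical, and the scanner's real heuristic may differ.

```python
import re

# Hypothetical heuristics; the scanner's real rule is not shown in this report.
_CLONE_RE = re.compile(r"\bgit\s+clone\s+(\S+)")
_EXEC_RE = re.compile(r"\b(pip\s+install|python\s+\S+\.py|bash\s+\S+|make\b)")


def clone_then_execute(script):
    """Return clone URLs that are followed later in the script by an install or run command."""
    lines = script.splitlines()
    flagged = []
    for i, line in enumerate(lines):
        match = _CLONE_RE.search(line)
        # Flag the clone only if some later line installs or runs something.
        if match and any(_EXEC_RE.search(later) for later in lines[i + 1:]):
            flagged.append(match.group(1))
    return flagged
```

The sketch does not check that the later command actually operates on the cloned directory, so it over-reports by design.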

MEDIUM
Docker pull and run untrusted image
L242

Detects pulling and running Docker images from external registries

docker pull ghcr.io/bigcode-project/evaluation-harness-multiple
FIX

Pin Docker images to a specific digest (e.g., image@sha256:abc...) instead of using mutable tags like :latest. Use trusted base images from verified publishers.

FP?

Likely FP if the Docker command pulls a well-known official image (e.g., docker pull python:3.11) in setup documentation.
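
The digest-pinning advice in the FIX can be checked mechanically. The sketch below is an illustration, not the scanner's implementation; `is_digest_pinned` and `unpinned_pulls` are hypothetical helper names, and the pattern assumes a simple `docker pull IMAGE` form with no flags before the image reference.

```python
import re

# A reference counts as pinned only if it ends in @sha256:<64 hex characters>.
_DIGEST_RE = re.compile(r"@sha256:[0-9a-f]{64}$")
# Assumes the image reference directly follows "docker pull" (no flags in between).
_PULL_RE = re.compile(r"docker\s+pull\s+(\S+)")


def is_digest_pinned(image_ref):
    """Return True if the image reference ends with a full sha256 digest."""
    return bool(_DIGEST_RE.search(image_ref.strip()))


def unpinned_pulls(script):
    """Return image references from `docker pull` commands that lack a digest."""
    return [ref for ref in _PULL_RE.findall(script) if not is_digest_pinned(ref)]
```

Run against the flagged command, `unpinned_pulls` would report `ghcr.io/bigcode-project/evaluation-harness-multiple`, since that reference carries neither a tag nor a digest.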

MEDIUM
Docker pull and run untrusted image
L619

Detects pulling and running Docker images from external registries

docker pull ghcr.io/bigcode-project/evaluation-harness-multiple
FIX

Pin Docker images to a specific digest (e.g., image@sha256:abc...) instead of using mutable tags like :latest. Use trusted base images from verified publishers.

FP?

Likely FP if the Docker command pulls a well-known official image (e.g., docker pull python:3.11) in setup documentation.