CategoryWeight: 1.0x

Security Awareness

Tests whether the model proactively identifies and avoids security vulnerabilities like injection, XSS, and insecure defaults.

Best Score

0.0

Avg Score

0.0

Tests

Performance Over Time — All Models

Model Rankings

GPT-5.5

Category score

View

96.1BEST

Tokens50.9k

Total50.9k

Claude Sonnet 4.6

Category score

View

95.5-0.6 pts

Tokens13.5k

Total13.5k

Grok

Category score

View

95.3-0.8 pts

Tokens78.4k

Total78.4k

Claude Opus 4.8

Category score

View

90.9-5.2 pts

Tokens22.9k

Total22.9k

Rank	Model	Score	Tokens	vs. Best	Details
1	GPT-5.5	96.1	50.9k	BEST	View
2	Claude Sonnet 4.6	95.5	13.5k	-0.6 pts	View
3	Grok	95.3	78.4k	-0.8 pts	View
4	Claude Opus 4.8	90.9	22.9k	-5.2 pts	View

Test Breakdown

SQL Injection Prevention

Build a query layer that properly parameterizes all user input

GPT-5.5

96.1

Claude Sonnet 4.6

95.5

Grok

95.3

Claude Opus 4.8

90.9

XSS Mitigation

Render user-generated content without introducing XSS vectors

GPT-5.5

96.1

Claude Sonnet 4.6

95.5

Grok

95.3

Claude Opus 4.8

90.9

Secret Management

Implement config loading that never logs or exposes secrets

GPT-5.5

96.1

Claude Sonnet 4.6

95.5

Grok

95.3

Claude Opus 4.8

90.9