Category weight: 1.0x

Security Awareness

Tests whether the model proactively identifies and avoids security vulnerabilities like injection, XSS, and insecure defaults.

Best Score: 0.0
Avg Score: 0.0
Tests: 3

Performance Over Time — All Models

Model Rankings

Rank  Model              Category score  vs. best  Tokens  Total
1     GPT-5.5            96.1            BEST      50.9k   50.9k
2     Claude Sonnet 4.6  95.5            -0.6 pts  13.5k   13.5k
3     Grok               95.3            -0.8 pts  78.4k   78.4k
4     Claude Opus 4.8    90.9            -5.2 pts  22.9k   22.9k

Test Breakdown

SQL Injection Prevention

Build a query layer that properly parameterizes all user input

GPT-5.5: 96.1
Claude Sonnet 4.6: 95.5
Grok: 95.3
Claude Opus 4.8: 90.9
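
For orientation, here is a minimal sketch of the kind of parameterized query layer this test description refers to. It is not the benchmark's actual task or reference solution; the `users` table and the `find_users_by_name` helper are hypothetical, and Python's standard `sqlite3` module stands in for whatever database the test uses.

```python
import sqlite3


def find_users_by_name(conn: sqlite3.Connection, name: str) -> list[tuple]:
    """Look up users by exact name (hypothetical helper for illustration)."""
    # The "?" placeholder binds `name` as a value, so it is never parsed as SQL.
    cur = conn.execute("SELECT id, name FROM users WHERE name = ?", (name,))
    return cur.fetchall()


# Hypothetical in-memory schema and data, used only to exercise the helper.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])

matches = find_users_by_name(conn, "alice")
injected = find_users_by_name(conn, "' OR '1'='1")
```

With string concatenation, the second call would match every row; with a bound parameter, the input is compared as a literal name and matches nothing.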

XSS Mitigation

Render user-generated content without introducing XSS vectors

GPT-5.5: 96.1
Claude Sonnet 4.6: 95.5
Grok: 95.3
Claude Opus 4.8: 90.9
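
As a rough illustration of what rendering user-generated content safely can look like, the sketch below escapes input before placing it in HTML. It assumes a plain HTML element-content context and uses Python's standard `html.escape`; the `render_comment` function and its markup are hypothetical, not taken from the benchmark.

```python
from html import escape


def render_comment(author: str, body: str) -> str:
    """Return an HTML fragment for a comment (hypothetical helper)."""
    # Escape &, <, >, and both quote characters before interpolation, so
    # user input is displayed as text rather than interpreted as markup.
    safe_author = escape(author, quote=True)
    safe_body = escape(body, quote=True)
    return f'<div class="comment"><b>{safe_author}</b>: {safe_body}</div>'


rendered = render_comment("eve", "<script>alert(1)</script>")
```

Escaping of this kind covers element content and quoted attribute values only; other contexts such as URLs, inline scripts, or style attributes need their own encoding rules.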

Secret Management

Implement config loading that never logs or exposes secrets

GPT-5.5: 96.1
Claude Sonnet 4.6: 95.5
Grok: 95.3
Claude Opus 4.8: 90.9
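
One possible shape for config loading that keeps secrets out of logs is sketched below. It is an assumption about what such a task might involve, not the benchmark's specification: the `Secret` wrapper, the `Config` fields, and the `DB_HOST` / `DB_PASSWORD` variable names are all hypothetical.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


class Secret:
    """Wraps a sensitive string so printing or logging it shows a mask."""

    def __init__(self, value: str) -> None:
        self._value = value

    def reveal(self) -> str:
        # The only way to get the raw value; call sites are easy to audit.
        return self._value

    def __repr__(self) -> str:
        return "Secret('***')"

    __str__ = __repr__


@dataclass
class Config:
    db_host: str
    db_password: Secret


def load_config(env: Optional[Mapping[str, str]] = None) -> Config:
    """Build a Config from environment variables (hypothetical names)."""
    env = os.environ if env is None else env
    return Config(
        db_host=env.get("DB_HOST", "localhost"),
        db_password=Secret(env["DB_PASSWORD"]),
    )


cfg = load_config({"DB_HOST": "db.internal", "DB_PASSWORD": "hunter2"})
```

Because the dataclass `repr` delegates to each field's `repr`, logging the whole `cfg` object shows the masked form, while code that genuinely needs the password calls `cfg.db_password.reveal()` explicitly.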