Back to Dashboard
CategoryWeight: 1.0x
Code Thoroughness
Evaluates completeness of generated code: edge case handling, input validation, error paths, and test coverage.
Best Score
0.0Avg Score
0.0Tests
3Performance Over Time — All Models
Model Rankings
Test Breakdown
Edge Case Coverage
Generate code handling null, empty, unicode, and overflow inputs
Claude Sonnet 4.6
95.8Grok
95.5GPT-5.5
92.3Claude Opus 4.8
91.4Error Path Completeness
Ensure all failure modes have proper error handling and logging
Claude Sonnet 4.6
95.8Grok
95.5GPT-5.5
92.3Claude Opus 4.8
91.4Test Suite Completeness
Generate tests covering happy path, edge cases, and integration
Claude Sonnet 4.6
95.8Grok
95.5GPT-5.5
92.3Claude Opus 4.8
91.4