Category weight: 1.0x

Bug Fixes

Identify and fix bugs in existing codebases, including race conditions, off-by-one errors, and logic flaws.

Best Score: 0.0

Avg Score: 0.0

Tests: 3

Performance Over Time — All Models

Model Rankings

1. Claude Sonnet 4.6
Category score: 96.5 (BEST)
Tokens: 9.4k
Total: 9.4k

2. Grok
Category score: 96.1 (-0.4 pts)
Tokens: 55.9k
Total: 55.9k

3. Claude Opus 4.8
Category score: 96.0 (-0.5 pts)
Tokens: 13.1k
Total: 13.1k

4. GPT-5.5
Category score: 92.3 (-4.2 pts)
Tokens: 38.1k
Total: 38.1k

Test Breakdown

Off-by-One Boundary Fix

Fix pagination logic that skips the last page of results

Claude Sonnet 4.6: 96.5
Grok: 96.1
Claude Opus 4.8: 96.0
GPT-5.5: 92.3
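
For readers unfamiliar with this class of bug, the following is a minimal sketch of the kind of off-by-one the test description refers to. It is not the benchmark's actual test code; the function names `page_count` and `get_page` are hypothetical and chosen only for illustration. A common cause of a skipped last page is computing the page count with floor division, which drops a trailing partial page.

```python
def page_count(total_items: int, page_size: int) -> int:
    """Return the number of pages, counting a final partial page."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    # A buggy variant would use total_items // page_size, which drops
    # the last page whenever total_items is not a multiple of page_size.
    # Ceiling division keeps that trailing partial page.
    return (total_items + page_size - 1) // page_size


def get_page(items: list, page: int, page_size: int) -> list:
    """Return the 1-indexed page; a page past the end is an empty list."""
    if page < 1:
        raise ValueError("page must be >= 1")
    start = (page - 1) * page_size
    return items[start:start + page_size]
```

With 25 items and a page size of 10, `page_count(25, 10)` is 3, and the third page holds the last five items.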

Race Condition Detection

Find and fix a subtle race condition in async queue processing

Claude Sonnet 4.6: 96.5
Grok: 96.1
Claude Opus 4.8: 96.0
GPT-5.5: 92.3
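
As an illustration of what a race in async queue processing can look like, here is a hedged sketch using Python's `asyncio`. It is an assumption about the general shape of such a bug, not the benchmark's test case; `worker` and `process_all` are hypothetical names. The race shown is a check-then-act gap: if a worker checks whether a job was already handled, awaits the work, and only then records it, a second worker can pass the same check during the await.

```python
import asyncio


async def worker(queue: asyncio.Queue, seen: set, results: list) -> None:
    while True:
        job_id = await queue.get()
        try:
            # Racy variant: check `job_id in seen`, await the work, and
            # only then add to `seen`. Another worker can pass the same
            # check while this one is suspended at the await.
            if job_id in seen:
                continue
            seen.add(job_id)  # claim the job before the first await
            await asyncio.sleep(0)  # stand-in for real async work
            results.append(job_id)
        finally:
            queue.task_done()


async def process_all(job_ids: list, worker_count: int = 4) -> list:
    queue: asyncio.Queue = asyncio.Queue()
    for job_id in job_ids:
        queue.put_nowait(job_id)
    seen: set = set()
    results: list = []
    workers = [
        asyncio.create_task(worker(queue, seen, results))
        for _ in range(worker_count)
    ]
    await queue.join()  # wait until every queued job is marked done
    for task in workers:
        task.cancel()
    await asyncio.gather(*workers, return_exceptions=True)
    return results
```

Because the check and the claim happen with no await between them, only one worker can claim a given job on a single event loop. Code that shares state across threads would need a lock instead.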

Memory Leak Fix

Identify and patch a memory leak caused by unclosed event listeners

Claude Sonnet 4.6: 96.5
Grok: 96.1
Claude Opus 4.8: 96.0
GPT-5.5: 92.3
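
The sketch below shows one way a listener leak and its fix might look. It is an illustration under stated assumptions, not the benchmark's code: `EventEmitter` and `Widget` are hypothetical classes written for this example. The leak arises when an object registers a callback on a long-lived emitter and never removes it, so the emitter keeps the object alive; the fix is to unregister in a cleanup method.

```python
from typing import Callable, Dict, List


class EventEmitter:
    """Minimal emitter that holds strong references to its callbacks."""

    def __init__(self) -> None:
        self._listeners: Dict[str, List[Callable]] = {}

    def on(self, event: str, callback: Callable) -> None:
        self._listeners.setdefault(event, []).append(callback)

    def off(self, event: str, callback: Callable) -> None:
        callbacks = self._listeners.get(event, [])
        if callback in callbacks:
            callbacks.remove(callback)

    def emit(self, event: str, *args) -> None:
        # Iterate over a copy so callbacks may unregister themselves.
        for callback in list(self._listeners.get(event, [])):
            callback(*args)

    def listener_count(self, event: str) -> int:
        return len(self._listeners.get(event, []))


class Widget:
    def __init__(self, emitter: EventEmitter) -> None:
        self._emitter = emitter
        self.updates = 0
        emitter.on("update", self._on_update)

    def _on_update(self) -> None:
        self.updates += 1

    def close(self) -> None:
        # Without this call the emitter keeps a reference to the bound
        # method, and through it to the widget, for as long as it lives.
        self._emitter.off("update", self._on_update)
```

Calling `close()` on every widget that is no longer needed keeps `listener_count("update")` from growing as widgets are created and discarded.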