Back to Dashboard
CategoryWeight: 1.0x
Instruction Following
Measures how closely the model adheres to explicit constraints in the prompt, including formatting, language, and output structure.
Best Score
0.0Avg Score
0.0Tests
3Performance Over Time — All Models
Model Rankings
Test Breakdown
Structured Output Compliance
Produce JSON matching an exact schema with no extra fields
Claude Sonnet 4.6
100.0GPT-5.5
100.0Grok
100.0Claude Opus 4.8
98.0Constraint Adherence
Follow explicit constraints like max line length and naming conventions
Claude Sonnet 4.6
100.0GPT-5.5
100.0Grok
100.0Claude Opus 4.8
98.0Multi-step Instruction Chain
Execute a 6-step instruction sequence without skipping or reordering
Claude Sonnet 4.6
100.0GPT-5.5
100.0Grok
100.0Claude Opus 4.8
98.0