Top Performing Model

Based on composite benchmark scores

Anthropic

Claude Opus 4.8

Leading today's benchmarks

93.9/100
+0.6% vs. previous day

Performance Timeline

Latest Benchmark Run

Jun 10, 4:44 AM (daily)
Claude Opus 4.8

Composite benchmark summary

#1

Composite

93.9

Token Benchmark

100.0

Total tokens

228.3k

~7.6k/test

Best category: Token Efficiency (100.0)
Worst category: Long Reasoning (70.3)
Claude Sonnet 4.6

Composite benchmark summary

#2

Composite

92.1

Token Benchmark

66.0

Total tokens

346.1k

~11.5k/test

Best category: Instruction Following (100.0)
Worst category: Token Efficiency (66.0)
GPT-5.5

Composite benchmark summary

#3

Composite

88.2

Token Benchmark

48.4

Total tokens

471.7k

~15.7k/test

Best category: Instruction Following (100.0)
Worst category: Token Efficiency (48.4)
Grok

Composite benchmark summary

#4

Composite

87.0

Token Benchmark

27.8

Total tokens

820.9k

~27.4k/test

Best category: Instruction Following (100.0)
Worst category: Token Efficiency (27.8)
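The Token Benchmark values appear to normalize each model's total token usage against the most economical model in the run. Dividing the lowest total (228.3k, Claude Opus 4.8) by each model's total and multiplying by 100 reproduces every displayed value: 100.0, 66.0, 48.4 and 27.8. The dashboard does not state this formula. It is inferred from the numbers shown, so treat the sketch below as an assumption. The names `TOTAL_TOKENS_K` and `token_benchmark_scores` are hypothetical and chosen here for illustration.

```python
# Total tokens per model as displayed on the dashboard, in thousands.
TOTAL_TOKENS_K = {
    "Claude Opus 4.8": 228.3,
    "Claude Sonnet 4.6": 346.1,
    "GPT-5.5": 471.7,
    "Grok": 820.9,
}


def token_benchmark_scores(totals: dict[str, float]) -> dict[str, float]:
    """Score each model as 100 * (fewest tokens used by any model / its own tokens).

    Assumption: this formula is inferred from the displayed scores and is not
    taken from the benchmark's documentation.
    """
    best = min(totals.values())  # the most token-efficient model sets the 100.0 mark
    return {name: round(100 * best / used, 1) for name, used in totals.items()}


if __name__ == "__main__":
    for name, score in token_benchmark_scores(TOTAL_TOKENS_K).items():
        print(f"{name}: {score}")
```

Under this assumed formula the score is relative rather than absolute. A model's Token Benchmark value would drop on a day when a more token-efficient model joins the run, even if its own token usage stayed the same.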