LLM Benchmarks

llm.l.gaz.codes

Compare Models

Select models to compare benchmark results side by side

Compare models at different quantization levels with RAM/VRAM/speed tradeoffs

Benchmark Comparison

Benchmark       Category    Source          gpt-oss-120b
ARC-Challenge   reasoning   HF Leaderboard  79.1%
GSM8K           reasoning   HF Leaderboard  93.0%
HellaSwag       language    HF Leaderboard  89.3%
HumanEval       coding      Aider           74.2%
MBPP            coding      Aider           70.2%
MMLU            knowledge   HF Leaderboard  82.9%
TruthfulQA      safety      HF Leaderboard  66.1%
WinoGrande      language    HF Leaderboard  81.8%
Average                                     79.6%
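
The Average figure is consistent with an unweighted mean of the eight scores listed for gpt-oss-120b. The page does not state how it computes the average, so the sketch below is only a check under that assumption; the variable names are illustrative and not taken from the site.

```python
# Scores for gpt-oss-120b as listed in the comparison (percent).
scores = {
    "ARC-Challenge": 79.1,
    "GSM8K": 93.0,
    "HellaSwag": 89.3,
    "HumanEval": 74.2,
    "MBPP": 70.2,
    "MMLU": 82.9,
    "TruthfulQA": 66.1,
    "WinoGrande": 81.8,
}

# Assumption: the "Average" row is a plain arithmetic mean,
# with every benchmark weighted equally.
average = sum(scores.values()) / len(scores)

print(f"{average:.1f}%")  # prints 79.6%
```

An equal-weight mean treats the two Aider coding benchmarks the same as the six HF Leaderboard ones, so the single number hides the gap between the strongest result (GSM8K) and the weakest (TruthfulQA).
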
Model Specs

Parameters: -
VRAM: -
Est. Speed: -