LLM Benchmarks

llm.l.gaz.codes


gpt-oss-120b

openai/gpt-oss-120b

Publisher	openai
Architecture	gpt-oss
Quantization	MXFP4
Format	gguf
Max Context	131k tokens
Type	llm
Machine	gaz-studio
Capabilities	tool_use

Benchmark Results


knowledge

Benchmark	Score
MMLU (local)	78.5%

reasoning

Benchmark	Score
GSM8K (local)	85.2%
ARC-Challenge (local)	75.4%

coding

Benchmark	Score
HumanEval (local)	72.0%

language

Benchmark	Score
HellaSwag (local)	82.3%

safety

Benchmark	Score
TruthfulQA (local)	65.8%
