ModelBench

Compare LLM performance and value across providers

Best Value

Gemini 1.5 Flash

Value Score: 3367. Best performance per dollar spent: at $0.075 per 1M input tokens, you get 78.9% on HumanEval.
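To make "performance per dollar" concrete, here is a back-of-envelope cost sketch at Gemini 1.5 Flash's listed prices. The workload figures (50,000 requests/day, 800 input and 200 output tokens each) are illustrative assumptions, not ModelBench data.

```python
# Back-of-envelope cost estimate (hypothetical workload; prices from the table below).
INPUT_PRICE = 0.075    # $ per 1M input tokens (Gemini 1.5 Flash)
OUTPUT_PRICE = 0.30    # $ per 1M output tokens (Gemini 1.5 Flash)

requests_per_day = 50_000          # assumed traffic, not ModelBench data
input_tokens_per_request = 800     # assumed
output_tokens_per_request = 200    # assumed

daily_input_mtok = requests_per_day * input_tokens_per_request / 1_000_000
daily_output_mtok = requests_per_day * output_tokens_per_request / 1_000_000
monthly_cost = 30 * (daily_input_mtok * INPUT_PRICE + daily_output_mtok * OUTPUT_PRICE)
print(f"~${monthly_cost:,.2f}/month")  # ~$180.00/month at these volumes
```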

Best Performance

o1

92.4% HumanEval, 91.8% MMLU: top scores across benchmarks, ideal for complex reasoning tasks.

PRAQTOR Edge

Real-World Weighting

We factor in latency, context limits, and reliability — not just benchmark scores.

Cross-Provider Normalization

Compare apples to apples across different pricing structures.

Task-Specific Rankings

Coding vs. summarization vs. chat: pick the right model for each task. (A sketch of how weighting, normalization, and task fit combine appears below.)
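This page doesn't publish the exact PRAQTOR scoring formula, so the following is only a minimal sketch of how a real-world-weighted, price-normalized, task-aware score could look. Every weight, the latency and uptime figures, and the real_world_value helper are illustrative assumptions, not ModelBench's actual method.

```python
from dataclasses import dataclass

@dataclass
class Model:
    name: str
    humaneval: float      # benchmark score, %
    mmlu: float           # benchmark score, %
    input_price: float    # $ per 1M input tokens
    output_price: float   # $ per 1M output tokens
    p50_latency_s: float  # median latency in seconds (assumed measurement)
    uptime: float         # observed reliability, 0..1 (assumed measurement)

def real_world_value(m: Model, task_weights: tuple[float, float] = (0.5, 0.5)) -> float:
    """Illustrative only: quality from task-weighted benchmarks, discounted by
    latency and reliability, divided by a blended price. Not PRAQTOR's formula."""
    quality = task_weights[0] * m.humaneval + task_weights[1] * m.mmlu
    quality *= m.uptime / (1.0 + 0.1 * m.p50_latency_s)           # real-world discount
    blended_price = 0.75 * m.input_price + 0.25 * m.output_price  # input-heavy traffic mix
    return quality / blended_price

flash = Model("Gemini 1.5 Flash", 78.9, 78.9, 0.075, 0.30, 0.4, 0.999)
o1 = Model("o1", 92.4, 91.8, 15.00, 60.00, 9.0, 0.999)
for m in (flash, o1):
    print(f"{m.name}: {real_world_value(m):.1f}")  # the cheap model wins by orders of magnitude
```

The blended price is one simple way to compare providers whose pricing structures weight input and output tokens differently; shifting the blend toward your actual traffic mix changes the rankings.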

| Model | Provider | Features | HumanEval | MMLU | GSM8K | Price (in / out, $ per 1M tokens) | Value Score |
|---|---|---|---|---|---|---|---|
| Gemini 1.5 Flash (Best Value) | Google | 👁️ Vision, ⚡ Functions | 78.9% | 78.9% | 86.2% | $0.075 / $0.30 | 3367 |
| Llama 3.1 8B | Meta (Together) | ⚡ Functions | 72.6% | 73.0% | 84.5% | $0.18 / $0.18 | 3307 |
| Gemini 2.0 Flash | Google | 👁️ Vision, ⚡ Functions | 89.7% | 83.2% | 93.1% | $0.10 / $0.40 | 2749 |
| GPT-4o Mini | OpenAI | 👁️ Vision, ⚡ Functions | 87.0% | 82.0% | 93.2% | $0.15 / $0.60 | 1806 |
| Mixtral 8x7B | Mistral (Together) | ⚡ Functions | 74.0% | 70.6% | 74.4% | $0.60 / $0.60 | 947 |
| Llama 3.1 70B | Meta (Together) | ⚡ Functions | 80.5% | 86.0% | 94.1% | $0.88 / $0.88 | 765 |
| Claude 3.5 Haiku | Anthropic | 👁️ Vision, ⚡ Functions | 88.1% | 75.2% | 91.6% | $0.80 / $4.00 | 274 |
| Gemini 1.5 Pro | Google | 👁️ Vision, ⚡ Functions | 84.1% | 85.9% | 91.7% | $1.25 / $5.00 | 217 |
| Llama 3.1 405B | Meta (Together) | ⚡ Functions | 89.0% | 88.6% | 96.8% | $3.50 / $3.50 | 203 |
| Mistral Large | Mistral | ⚡ Functions | 84.0% | 84.0% | 91.2% | $2.00 / $6.00 | 167 |
| o1 Mini | OpenAI |  | 92.4% | 85.2% | 94.8% | $3.00 / $12.00 | 121 |
| Claude 3.5 Sonnet | Anthropic | 👁️ Vision, ⚡ Functions | 92.0% | 88.3% | 96.4% | $3.00 / $15.00 | 79 |
| GPT-4o | OpenAI | 👁️ Vision, ⚡ Functions | 90.2% | 88.7% | 95.3% | $5.00 / $15.00 | 71 |
| o1 (Top Perf) | OpenAI | 👁️ Vision | 92.4% | 91.8% | 96.4% | $15.00 / $60.00 | 25 |
| Claude 3 Opus | Anthropic | 👁️ Vision, ⚡ Functions | 84.9% | 86.8% | 95.0% | $15.00 / $75.00 | 15 |

PRAQTOR Insight

Gemini 1.5 Flash offers the best value: 78.9% HumanEval at just $0.075/1M input tokens. For maximum performance, o1 leads with 92.4% HumanEval.

💡 Tip: Use tiered routing. Route the roughly 80% of queries that are simple to Gemini 1.5 Flash, and send complex tasks to o1, for the best cost-performance balance.
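Here is a minimal sketch of that tiered-routing tip, assuming a toy keyword-and-length heuristic as the complexity classifier. The model IDs, keyword markers, and length threshold are illustrative; a production router might use a small classifier model instead.

```python
CHEAP_MODEL = "gemini-1.5-flash"  # best-value tier (per the table above)
STRONG_MODEL = "o1"               # top-performance tier

def classify_complexity(prompt: str) -> str:
    """Toy heuristic: long prompts or hard-task keywords go to the strong tier.
    A production router might use a small classifier model instead."""
    hard_markers = ("prove", "refactor", "debug", "multi-step", "plan")
    if len(prompt) > 2000 or any(m in prompt.lower() for m in hard_markers):
        return "complex"
    return "simple"

def route(prompt: str) -> str:
    """Send simple queries to the cheap tier, complex ones to the strong tier."""
    return STRONG_MODEL if classify_complexity(prompt) == "complex" else CHEAP_MODEL

# With ~80% of traffic classifying as simple, ~80% of requests run at
# $0.075/1M input tokens instead of $15/1M.
print(route("Summarize this support ticket in two sentences."))       # gemini-1.5-flash
print(route("Debug this multi-step failure in my agent's planner."))  # o1
```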

How We Use AI

Data Sources: Papers With Code, plus official documentation from OpenAI, Anthropic, Google, and Meta. Benchmark aggregation and Value Score calculation are automated. Raw benchmark scores are never modified; we only add context and recommendations.

Data sources: Papers With Code and official documentation from OpenAI, Anthropic, Google, Meta, and Mistral. Updated December 2024. Pricing in USD per 1M tokens.