ModelBench

Compare LLM performance and value across providers

Best Value

Gemini 1.5 Flash

Value Score: 3367. Best performance per dollar spent: at $0.075 per 1M input tokens, you get 78.9% on HumanEval.
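To make "performance per dollar" concrete, here is a back-of-envelope cost sketch at Gemini 1.5 Flash's listed prices. The workload figures (50,000 requests/day, 800 input and 200 output tokens each) are illustrative assumptions, not ModelBench data.

```python
# Back-of-envelope cost estimate (hypothetical workload; prices from the table below).
INPUT_PRICE = 0.075    # $ per 1M input tokens (Gemini 1.5 Flash)
OUTPUT_PRICE = 0.30    # $ per 1M output tokens (Gemini 1.5 Flash)

requests_per_day = 50_000          # assumed traffic, not ModelBench data
input_tokens_per_request = 800     # assumed
output_tokens_per_request = 200    # assumed

daily_input_mtok = requests_per_day * input_tokens_per_request / 1_000_000
daily_output_mtok = requests_per_day * output_tokens_per_request / 1_000_000
monthly_cost = 30 * (daily_input_mtok * INPUT_PRICE + daily_output_mtok * OUTPUT_PRICE)
print(f"~${monthly_cost:,.2f}/month")  # ~$180.00/month at these volumes
```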

Best Performance

o1

92.4% HumanEval, 91.8% MMLU: top scores across benchmarks, ideal for complex reasoning tasks.

PRAQTOR Edge

Real-World Weighting

We factor in latency, context limits, and reliability — not just benchmark scores.

Cross-Provider Normalization

Compare apples to apples across different pricing structures.

Task-Specific Rankings

Coding vs. summarization vs. chat: pick the right model for each task. (A sketch of how weighting, normalization, and task fit combine appears below.)
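This page doesn't publish the exact PRAQTOR scoring formula, so the following is only a minimal sketch of how a real-world-weighted, price-normalized, task-aware score could look. Every weight, the latency and uptime figures, and the real_world_value helper are illustrative assumptions, not ModelBench's actual method.

```python
from dataclasses import dataclass

@dataclass
class Model:
    name: str
    humaneval: float      # benchmark score, %
    mmlu: float           # benchmark score, %
    input_price: float    # $ per 1M input tokens
    output_price: float   # $ per 1M output tokens
    p50_latency_s: float  # median latency in seconds (assumed measurement)
    uptime: float         # observed reliability, 0..1 (assumed measurement)

def real_world_value(m: Model, task_weights: tuple[float, float] = (0.5, 0.5)) -> float:
    """Illustrative only: quality from task-weighted benchmarks, discounted by
    latency and reliability, divided by a blended price. Not PRAQTOR's formula."""
    quality = task_weights[0] * m.humaneval + task_weights[1] * m.mmlu
    quality *= m.uptime / (1.0 + 0.1 * m.p50_latency_s)           # real-world discount
    blended_price = 0.75 * m.input_price + 0.25 * m.output_price  # input-heavy traffic mix
    return quality / blended_price

flash = Model("Gemini 1.5 Flash", 78.9, 78.9, 0.075, 0.30, 0.4, 0.999)
o1 = Model("o1", 92.4, 91.8, 15.00, 60.00, 9.0, 0.999)
for m in (flash, o1):
    print(f"{m.name}: {real_world_value(m):.1f}")  # the cheap model wins by orders of magnitude
```

The blended price is one simple way to compare providers whose pricing structures weight input and output tokens differently; shifting the blend toward your actual traffic mix changes the rankings.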

| Model | Provider | Features | HumanEval | MMLU | GSM8K | Price (in / out, $ per 1M tokens) | Value Score |
|---|---|---|---|---|---|---|---|
| Gemini 1.5 Flash (Best Value) | Google | 👁️ Vision, ⚡ Functions | 78.9% | 78.9% | 86.2% | $0.075 / $0.30 | 3367 |
| Llama 3.1 8B | Meta (Together) | ⚡ Functions | 72.6% | 73.0% | 84.5% | $0.18 / $0.18 | 3307 |
| Gemini 2.0 Flash | Google | 👁️ Vision, ⚡ Functions | 89.7% | 83.2% | 93.1% | $0.10 / $0.40 | 2749 |
| GPT-4o Mini | OpenAI | 👁️ Vision, ⚡ Functions | 87.0% | 82.0% | 93.2% | $0.15 / $0.60 | 1806 |
| Mixtral 8x7B | Mistral (Together) | ⚡ Functions | 74.0% | 70.6% | 74.4% | $0.60 / $0.60 | 947 |
| Llama 3.1 70B | Meta (Together) | ⚡ Functions | 80.5% | 86.0% | 94.1% | $0.88 / $0.88 | 765 |
| Claude 3.5 Haiku | Anthropic | 👁️ Vision, ⚡ Functions | 88.1% | 75.2% | 91.6% | $0.80 / $4.00 | 274 |
| Gemini 1.5 Pro | Google | 👁️ Vision, ⚡ Functions | 84.1% | 85.9% | 91.7% | $1.25 / $5.00 | 217 |
| Llama 3.1 405B | Meta (Together) | ⚡ Functions | 89.0% | 88.6% | 96.8% | $3.50 / $3.50 | 203 |
| Mistral Large | Mistral | ⚡ Functions | 84.0% | 84.0% | 91.2% | $2.00 / $6.00 | 167 |
| o1 Mini | OpenAI |  | 92.4% | 85.2% | 94.8% | $3.00 / $12.00 | 121 |
| Claude 3.5 Sonnet | Anthropic | 👁️ Vision, ⚡ Functions | 92.0% | 88.3% | 96.4% | $3.00 / $15.00 | 79 |
| GPT-4o | OpenAI | 👁️ Vision, ⚡ Functions | 90.2% | 88.7% | 95.3% | $5.00 / $15.00 | 71 |
| o1 (Top Perf) | OpenAI | 👁️ Vision | 92.4% | 91.8% | 96.4% | $15.00 / $60.00 | 25 |
| Claude 3 Opus | Anthropic | 👁️ Vision, ⚡ Functions | 84.9% | 86.8% | 95.0% | $15.00 / $75.00 | 15 |

PRAQTOR Insight

Gemini 1.5 Flash offers the best value: 78.9% HumanEval at just $0.075/1M input tokens. For maximum performance, o1 leads with 92.4% HumanEval.

💡 Tip: Use tiered routing. Route the roughly 80% of queries that are simple to Gemini 1.5 Flash, and send complex tasks to o1, for the best cost-performance balance.
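Here is a minimal sketch of that tiered-routing tip, assuming a toy keyword-and-length heuristic as the complexity classifier. The model IDs, keyword markers, and length threshold are illustrative; a production router might use a small classifier model instead.

```python
CHEAP_MODEL = "gemini-1.5-flash"  # best-value tier (per the table above)
STRONG_MODEL = "o1"               # top-performance tier

def classify_complexity(prompt: str) -> str:
    """Toy heuristic: long prompts or hard-task keywords go to the strong tier.
    A production router might use a small classifier model instead."""
    hard_markers = ("prove", "refactor", "debug", "multi-step", "plan")
    if len(prompt) > 2000 or any(m in prompt.lower() for m in hard_markers):
        return "complex"
    return "simple"

def route(prompt: str) -> str:
    """Send simple queries to the cheap tier, complex ones to the strong tier."""
    return STRONG_MODEL if classify_complexity(prompt) == "complex" else CHEAP_MODEL

# With ~80% of traffic classifying as simple, ~80% of requests run at
# $0.075/1M input tokens instead of $15/1M.
print(route("Summarize this support ticket in two sentences."))       # gemini-1.5-flash
print(route("Debug this multi-step failure in my agent's planner."))  # o1
```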

How We Use AI

Data Sources: Papers With Code, plus official documentation from OpenAI, Anthropic, Google, and Meta. Benchmark aggregation and Value Score calculation are automated. Raw benchmark scores are never modified; we only add context and recommendations.

Data sources: Papers With Code and official documentation from OpenAI, Anthropic, Google, Meta, and Mistral. Updated December 2024. Pricing in USD per 1M tokens.