DeployCost

LLM API pricing

Hosted model inference, priced per million tokens. The table below lists models in ascending order of input price.

Updated Jun 18, 2026
9 models
Model           Provider       Input ($ per 1M tokens)   Output ($ per 1M tokens)
gemini-flash    Google Cloud   $0.075                    $0.30
gpt-4o-mini     OpenAI         $0.15                     $0.60
mistral-small   Mistral AI     $0.20                     $0.60
deepseek-chat   DeepSeek       $0.27                     $1.10
claude-haiku    Anthropic      $0.80                     $4.00
gemini-pro      Google Cloud   $1.25                     $5.00
mistral-large   Mistral AI     $2.00                     $6.00
gpt-4o          OpenAI         $2.50                     $10.00
claude-sonnet   Anthropic      $3.00                     $15.00

Frequently asked questions

How is LLM API pricing calculated?

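Models are priced per million tokens, with separate rates for input (your prompt) and output (the response). Output usually costs 3–5× the input price, so the blended cost depends on your prompt-to-response ratio: the more output-heavy the workload, the closer the blended price sits to the output rate.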
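As a rough illustration of the arithmetic, the sketch below computes the cost of a single request and the blended price per million tokens. The two prices are copied from the table above; the function names and the token counts (2,000 in, 500 out) are assumptions chosen for the example, not figures from any provider.

```python
# Prices in dollars per million tokens, copied from the table above.
# Each entry is (input_price, output_price).
PRICES = {
    "gpt-4o-mini": (0.15, 0.60),
    "claude-sonnet": (3.00, 15.00),
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one request at the listed per-million-token prices."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


def blended_price(model: str, input_tokens: int, output_tokens: int) -> float:
    """Average dollars per million tokens for a given input/output mix."""
    total_tokens = input_tokens + output_tokens
    return request_cost(model, input_tokens, output_tokens) / total_tokens * 1_000_000


# Hypothetical workload: 2,000 prompt tokens and 500 response tokens.
print(f"${request_cost('gpt-4o-mini', 2_000, 500):.4f}")   # $0.0006
print(f"${blended_price('gpt-4o-mini', 2_000, 500):.2f}")  # $0.24
```

With this 4:1 mix the blended price ($0.24) lands between the input rate ($0.15) and the output rate ($0.60); a more output-heavy mix pushes it toward the output rate.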

What is a context window?

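The context window is the maximum number of tokens (input plus output) a model can consider at once. Larger windows (200K–2M tokens) let you pass more documents or conversation history, but every token you send is billed at the input rate, so filling a large window raises the cost per call.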
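To make the trade-off concrete, here is a minimal sketch that checks whether a request fits a window and estimates the input cost of filling one completely. The function names are hypothetical, and the 200,000-token window is only the low end of the range quoted above, not a figure stated here for any particular model; the $3 rate is the claude-sonnet input price from the table.

```python
def fits_in_window(input_tokens: int, max_output_tokens: int, window: int) -> bool:
    """True if the prompt plus the reserved output budget fits the context window."""
    return input_tokens + max_output_tokens <= window


def full_window_input_cost(window: int, input_price_per_million: float) -> float:
    """Dollar cost of sending a prompt that fills the entire window."""
    return window * input_price_per_million / 1_000_000


# Illustrative numbers only: a 200K window billed at $3 per million input tokens.
WINDOW = 200_000
print(fits_in_window(190_000, 4_000, WINDOW))             # True
print(fits_in_window(199_000, 4_000, WINDOW))             # False
print(f"${full_window_input_cost(WINDOW, 3.00):.2f}")     # $0.60
```

Token counts are assumed to be known up front; in practice they come from the provider's tokenizer, which this sketch does not model.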

Which LLM API is cheapest?

Compare the Input and Output columns above. Budget models such as gemini-flash, gpt-4o-mini, mistral-small and deepseek-chat are the cheapest per token; frontier models (gpt-4o, claude-sonnet) cost more but are more capable.
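The same comparison can be done programmatically. The sketch below ranks the nine models by either price column; the data is copied from the table above, while the `cheapest` helper and its parameters are assumptions made for this example.

```python
# (model, provider, input price, output price), dollars per million tokens,
# copied from the table above.
MODELS = [
    ("gemini-flash", "Google Cloud", 0.075, 0.30),
    ("gpt-4o-mini", "OpenAI", 0.15, 0.60),
    ("mistral-small", "Mistral AI", 0.20, 0.60),
    ("deepseek-chat", "DeepSeek", 0.27, 1.10),
    ("claude-haiku", "Anthropic", 0.80, 4.00),
    ("gemini-pro", "Google Cloud", 1.25, 5.00),
    ("mistral-large", "Mistral AI", 2.00, 6.00),
    ("gpt-4o", "OpenAI", 2.50, 10.00),
    ("claude-sonnet", "Anthropic", 3.00, 15.00),
]

# Position of each price column inside a row tuple.
COLUMN_INDEX = {"input": 2, "output": 3}


def cheapest(by: str = "input", n: int = 3) -> list[str]:
    """Names of the n cheapest models by the chosen price column."""
    index = COLUMN_INDEX[by]
    ranked = sorted(MODELS, key=lambda row: row[index])
    return [row[0] for row in ranked[:n]]


print(cheapest("input"))   # ['gemini-flash', 'gpt-4o-mini', 'mistral-small']
print(cheapest("output"))  # ['gemini-flash', 'gpt-4o-mini', 'mistral-small']
```

Python's sort is stable, so models tied on a price (gpt-4o-mini and mistral-small both list $0.60 for output) keep their table order.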