Question 1

How is LLM API pricing calculated?

Accepted Answer

Models are priced per million tokens, split into input (your prompt) and output (the response). Output is usually 3–5× the input price, so the blended cost depends on your prompt/response ratio.
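A minimal Python sketch of the per-million-token arithmetic. It uses the gpt-4o prices from the pricing table as sample inputs. The function names are illustrative and do not come from any provider SDK. The sketch also assumes a flat rate for each token type and ignores any provider-specific discounts or surcharges.

```python
def call_cost(input_tokens: int, output_tokens: int,
              input_price_per_m: float, output_price_per_m: float) -> float:
    """Cost of one call, given prices per 1M tokens (assumes flat rates)."""
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000


def blended_price_per_m(input_share: float,
                        input_price_per_m: float,
                        output_price_per_m: float) -> float:
    """Weighted average price per 1M tokens, where input_share is the
    fraction (0..1) of all billed tokens that are input tokens."""
    return (input_share * input_price_per_m
            + (1 - input_share) * output_price_per_m)


if __name__ == "__main__":
    # 2,000 prompt tokens and 500 response tokens at
    # $2.50 input / $10 output per 1M tokens (the gpt-4o row).
    print(f"${call_cost(2_000, 500, 2.5, 10):.4f}")      # $0.0100
    # 2,000 of the 2,500 tokens are input, so input_share = 0.8.
    print(f"${blended_price_per_m(0.8, 2.5, 10):.2f}")   # $4.00
```

At an 80/20 input/output mix, the blended rate is $4.00 per 1M tokens. That is well above the $2.50 input price, because the 20% of tokens that are output cost 4× as much.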

Question 2

What is a context window?

Accepted Answer

The context window is the maximum number of tokens (input + output) a model can process in a single request. Larger windows (200K–2M tokens) let you pass more documents or conversation history. However, every input token is billed, so filling a large window makes each call more expensive.
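A small sketch of the budget check this implies: the prompt plus the tokens you reserve for the response must fit inside the window. It assumes "128K" means 128,000 tokens and that input and output count against the same limit. The helper names are hypothetical.

```python
def fits_context(prompt_tokens: int, max_output_tokens: int,
                 context_window: int) -> bool:
    """True if the prompt plus the reserved response budget fits the window."""
    return prompt_tokens + max_output_tokens <= context_window


def max_prompt_tokens(context_window: int, max_output_tokens: int) -> int:
    """Largest prompt that still leaves room for max_output_tokens."""
    return max(context_window - max_output_tokens, 0)


if __name__ == "__main__":
    window = 128_000  # assumption: a "128K" model, read as 128,000 tokens
    print(fits_context(120_000, 4_000, window))   # True
    print(fits_context(126_000, 4_000, window))   # False
    print(max_prompt_tokens(window, 4_000))       # 124000
```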

Question 3

Which LLM API is cheapest?

Accepted Answer

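Compare the Input and Output columns in the pricing table. Budget models such as Gemini Flash, GPT-4o-mini, Mistral Small and DeepSeek are the cheapest per token. Gemini Flash is the lowest on both input ($0.075) and output ($0.30). Frontier models (GPT-4o, Claude Sonnet) cost more but are more capable.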

Model	Input ($/1M tokens)	Output ($/1M tokens)	Context (tokens)	Modalities	Capabilities
gemini-flash (Google Cloud)	$0.075	$0.30	1M	text, image, video, audio	Functions, Structured, Vision
gpt-4o-mini (OpenAI)	$0.15	$0.60	128K	text, image	Functions, Structured, Vision
mistral-small (Mistral AI)	$0.20	$0.60	128K	text	Functions
deepseek-chat (DeepSeek)	$0.27	$1.10	64K	text	Functions
claude-haiku (Anthropic)	$0.80	$4.00	200K	text, image	Functions, Structured, Vision
gemini-pro (Google Cloud)	$1.25	$5.00	2M	text, image, video, audio	Functions, Structured, Vision
mistral-large (Mistral AI)	$2.00	$6.00	128K	text	Functions, Structured
gpt-4o (OpenAI)	$2.50	$10.00	128K	text, image, audio	Functions, Structured, Vision, Web search
claude-sonnet (Anthropic)	$3.00	$15.00	200K	text, image	Functions, Structured, Vision
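To compare models for a specific workload rather than per token, a sketch like the following multiplies the table's prices by your expected token volumes and sorts the results. The 50M-input / 10M-output workload is a hypothetical example. The prices are a snapshot copied from the table, so re-check them before relying on the numbers.

```python
# Prices per 1M tokens as (input, output), copied from the pricing table above.
# They are a snapshot and may be out of date.
PRICES = {
    "gemini-flash": (0.075, 0.30),
    "gpt-4o-mini": (0.15, 0.60),
    "mistral-small": (0.20, 0.60),
    "deepseek-chat": (0.27, 1.10),
    "claude-haiku": (0.80, 4.00),
    "gemini-pro": (1.25, 5.00),
    "mistral-large": (2.00, 6.00),
    "gpt-4o": (2.50, 10.00),
    "claude-sonnet": (3.00, 15.00),
}


def workload_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost of a workload on one model, assuming flat per-token prices."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


def rank_by_cost(input_tokens: int, output_tokens: int) -> list[tuple[str, float]]:
    """All models sorted from cheapest to most expensive for this workload."""
    costs = [(m, workload_cost(m, input_tokens, output_tokens)) for m in PRICES]
    return sorted(costs, key=lambda pair: pair[1])


if __name__ == "__main__":
    # Hypothetical workload: 50M input tokens and 10M output tokens.
    for model, cost in rank_by_cost(50_000_000, 10_000_000):
        print(f"{model:15s} ${cost:,.2f}")
```

In this table, both price columns rise together, so the ranking follows the Input column for any mix of input and output. With price lists where the two columns diverge, the cheapest model can change as the output share grows.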
