Ten million tokens of Claude API usage costs anywhere from $18 to $150 at standard rates in the scenarios worked through below, depending on which model you route the traffic to and what your input-to-output ratio looks like. That spread is wide enough to change the economics of a product, so the model-selection decision is effectively a pricing decision.
Here are the current per-million-token rates as of June 2026, verified against Anthropic’s pricing page:
| Model | Input (per 1M tokens) | Output (per 1M tokens) | Context window |
|---|---|---|---|
| Claude Haiku 4.5 | $1.00 | $5.00 | 200K |
| Claude Sonnet 4.6 | $3.00 | $15.00 | 1M |
| Claude Opus 4.8 | $5.00 | $25.00 | 1M |
Output tokens cost exactly 5x input across all three tiers. That ratio is consistent and predictable, but it means output-heavy workloads (long-form generation, code writing, detailed analysis) hit the bill much harder than input-heavy ones (classification, extraction, summarization with short responses).
What does 10M tokens of Claude cost at each tier?
The math depends on how you split input and output. A typical API workload runs heavier on input than output, because prompts, system instructions, and context documents all consume input tokens before the model generates anything. Let’s take two scenarios:
Scenario A: 8M input / 2M output (classification, extraction, summarization)
| Model | Input cost | Output cost | Total |
|---|---|---|---|
| Haiku 4.5 | $8.00 | $10.00 | $18.00 |
| Sonnet 4.6 | $24.00 | $30.00 | $54.00 |
| Opus 4.8 | $40.00 | $50.00 | $90.00 |
Scenario B: 5M input / 5M output (conversation, code generation, long-form writing)
| Model | Input cost | Output cost | Total |
|---|---|---|---|
| Haiku 4.5 | $5.00 | $25.00 | $30.00 |
| Sonnet 4.6 | $15.00 | $75.00 | $90.00 |
| Opus 4.8 | $25.00 | $125.00 | $150.00 |
The gap between the scenarios is substantial. Moving from the 8M/2M split to the 5M/5M split raises the total by about 67% on every tier ($18 to $30 on Haiku, $54 to $90 on Sonnet, $90 to $150 on Opus). If you are running a chatbot or code-generation pipeline that produces long outputs, the output rate is the number that controls your bill, not the input rate.
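The scenario tables can be reproduced with a few lines of arithmetic. The sketch below is only an illustration: the rates are copied from the pricing table above, and the dictionary keys are shorthand labels chosen here, not official API model identifiers.

```python
# Per-million-token rates in USD (input, output), copied from the pricing table.
# The keys are illustrative labels, not API model IDs.
RATES = {
    "haiku-4.5": (1.00, 5.00),
    "sonnet-4.6": (3.00, 15.00),
    "opus-4.8": (5.00, 25.00),
}


def workload_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the standard-rate cost in USD for a given input/output split."""
    input_rate, output_rate = RATES[model]
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000


for model in RATES:
    scenario_a = workload_cost(model, 8_000_000, 2_000_000)  # input-heavy
    scenario_b = workload_cost(model, 5_000_000, 5_000_000)  # balanced
    ratio = scenario_b / scenario_a
    print(f"{model}: A=${scenario_a:.2f}  B=${scenario_b:.2f}  B/A={ratio:.2f}x")
```

Swapping in your own measured input and output token counts gives a quick first estimate before any discounts are applied.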
How does prompt caching reduce the effective cost of 10M tokens on Claude?
Prompt caching is the single largest cost lever available through the API. When you send the same system prompt, tool schema, or reference document across multiple requests, cached input tokens are billed at 10% of the standard input price. The first request writes the content to cache (at a 1.25x premium for 5-minute TTL, or 2x for 1-hour TTL), and every subsequent cache hit within the window drops to the discounted read rate.
If 70% of your input tokens are cache hits (a reasonable number for applications with a stable system prompt and few-shot examples), the effective input rate drops significantly.
On Claude Sonnet 4.6, for example, 70% of Scenario A’s 8M input tokens is 5.6M cached tokens at $0.30/M, leaving 2.4M fresh input at $3.00/M. With 2M output at $15.00/M, that comes to $1.68 + $7.20 + $30.00 = $38.88, compared to $54.00 without caching. That is a 28% reduction from caching alone, before the one-time cache-write premium.
On Claude Haiku 4.5, the same cache-hit pattern drops the bill from $18.00 to roughly $12.96.
The cache write premium means caching only pays off when content is reused. For one-shot requests with unique prompts each time, caching adds cost rather than saving it. The breakeven is fast, though: a single cache read covers the write premium on the 5-minute TTL tier.
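The breakeven claim can be checked with a small calculation. This is a minimal sketch that looks only at the reusable prefix, using the multipliers quoted above (1.25x or 2x to write, 0.10x to read). The function name and the 10,000-token prefix size are assumptions made for illustration.

```python
def prefix_cost(
    input_rate: float,                # standard input price, USD per 1M tokens
    prefix_tokens: int,               # size of the reusable prefix
    reads: int,                       # cache hits after the initial write
    write_multiplier: float = 1.25,   # 1.25 for 5-minute TTL, 2.0 for 1-hour TTL
    read_multiplier: float = 0.10,    # cache reads billed at 10% of input price
) -> tuple[float, float]:
    """Return (cost_with_cache, cost_without_cache) in USD for the prefix alone."""
    base = prefix_tokens * input_rate / 1_000_000
    with_cache = base * (write_multiplier + reads * read_multiplier)
    without_cache = base * (1 + reads)  # every request pays full input price
    return with_cache, without_cache


# Hypothetical 10,000-token system prompt at the Sonnet input rate of $3.00/M.
for ttl, multiplier in (("5-minute", 1.25), ("1-hour", 2.0)):
    for reads in range(4):
        cached, uncached = prefix_cost(3.00, 10_000, reads, write_multiplier=multiplier)
        verdict = "saves" if cached < uncached else "costs more"
        print(f"{ttl} TTL, {reads} reads: ${cached:.4f} vs ${uncached:.4f} ({verdict})")
```

Under these multipliers the 5-minute tier comes out ahead after one read, while the 1-hour tier needs two reads to recover its larger write premium.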
How does the Batch API cut Claude costs by 50%?
The Message Batches API processes requests asynchronously within a 24-hour window and charges exactly 50% of standard token prices on both input and output. There is no quality difference between batch and synchronous responses.
For a 10M-token workload that can tolerate the latency, the batch discount changes the numbers substantially:
| Model | Standard (8M in / 2M out) | Batch (8M in / 2M out) |
|---|---|---|
| Haiku 4.5 | $18.00 | $9.00 |
| Sonnet 4.6 | $54.00 | $27.00 |
| Opus 4.8 | $90.00 | $45.00 |
Batch processing and prompt caching can be combined. A high-volume pipeline with repeated context running through the batch endpoint can stack both discounts, which is where the frequently cited “up to 95% savings” figure comes from. In practice, you will not hit 95% unless both your cache-hit ratio and your batch-eligible share are very high, but 50-70% reductions against standard pricing are achievable for well-structured workloads.
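To see where a figure like 95% comes from, the two discounts can be blended into a single multiplier on the standard input rate. The sketch below assumes the discounts multiply, that cache hits are spread evenly across batch and synchronous traffic, and that cache-write premiums are negligible. All three are simplifications, and the function name is made up for this example.

```python
BATCH_MULTIPLIER = 0.50       # batch requests billed at 50% of standard
CACHE_READ_MULTIPLIER = 0.10  # cache hits billed at 10% of standard input


def effective_input_multiplier(cache_hit_ratio: float, batch_share: float) -> float:
    """Blend both discounts into one multiplier on the standard input rate.

    Assumes the discounts multiply and ignores cache-write premiums.
    """
    cache_factor = cache_hit_ratio * CACHE_READ_MULTIPLIER + (1 - cache_hit_ratio)
    batch_factor = batch_share * BATCH_MULTIPLIER + (1 - batch_share)
    return cache_factor * batch_factor


for hit_ratio, batch_share in ((1.0, 1.0), (0.7, 1.0), (0.7, 0.5), (0.0, 0.0)):
    multiplier = effective_input_multiplier(hit_ratio, batch_share)
    print(
        f"cache hits {hit_ratio:.0%}, batch share {batch_share:.0%}: "
        f"input billed at {multiplier:.1%} of standard"
    )
```

Note that this covers input tokens only. Output tokens never benefit from caching, so the batch discount is the only lever on that side of the bill.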
Which Claude model should you default to for production workloads?
Claude Opus 4.8 tops LMArena at approximately 1510 Elo and leads SWE-bench Verified at 88.6%, so on raw capability the flagship tier wins. The question is which model clears the quality bar for a given task at the lowest cost.
- Haiku 4.5 at $1/$5 handles classification, routing, extraction, structured data parsing, and straightforward summarization. If you are building a pipeline that needs to process thousands of requests per hour on simple tasks, the budget tier does the work at one-third the input and output cost of Sonnet, and one-fifth the cost of Opus. For high-volume AI text generation workloads, Haiku is where margins survive.
- Sonnet 4.6 at $3/$15 is the production default for most applications. The mid-tier model supports the full 1M context window at standard pricing with no surcharge, handles multi-step reasoning reliably, and scores higher than Opus 4.6 on the GDPval-AA knowledge-work benchmark. For everyday API use cases spanning chatbots, content generation, code assistance, and document analysis, Sonnet 4.6 hits the price-to-quality balance point.
- Opus 4.8 at $5/$25 is the tier you route to when quality on a specific request justifies the premium. Agentic coding tasks, complex multi-file code edits, hard reasoning problems, and long-horizon workflows where retry costs compound are the workloads that benefit from the flagship. Running everything on Opus by default is the most common source of unnecessary API spend.
How does the cascade pattern reduce Claude API costs by 5-15x?
The cascade pattern is the architectural decision that determines the real cost of a Claude-powered product. Instead of routing every request to one model, you default all incoming traffic to Haiku 4.5 and escalate only the requests that need higher capability to Sonnet or Opus.
Here is the math on 10M tokens across 10,000 requests. If 80% of requests are simple enough for Haiku and 20% require Sonnet:
- 8M tokens on Haiku (6.4M input at $1/M plus 1.6M output at $5/M, keeping the 80/20 input-to-output split): roughly $14.40
- 2M tokens on Sonnet (1.6M input at $3/M plus 0.4M output at $15/M): roughly $10.80
- Blended total: approximately $25.20
Compare that to running the entire 10M-token load on Sonnet alone at $54.00, or on Opus at $90.00. The cascade cut the bill by more than half against Sonnet, and by more than 70% against Opus, while keeping the harder requests on a more capable model.
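The blended figure can be recomputed for other traffic mixes. The sketch below assumes every request is the same size and keeps the 80/20 input-to-output split on each tier. It also leaves out the cost of the routing classifier itself, which a real estimate would need to add.

```python
# Per-million-token rates in USD (input, output) from the pricing table.
RATES = {"haiku": (1.00, 5.00), "sonnet": (3.00, 15.00), "opus": (5.00, 25.00)}


def blended_cost(
    total_tokens: int,
    traffic_share: dict[str, float],  # fraction of tokens routed to each tier
    input_fraction: float = 0.8,      # assumed input share of every request
) -> float:
    """Return the USD cost of a workload split across several model tiers."""
    total = 0.0
    for model, share in traffic_share.items():
        input_rate, output_rate = RATES[model]
        per_million = input_fraction * input_rate + (1 - input_fraction) * output_rate
        total += total_tokens * share * per_million / 1_000_000
    return total


cascade = blended_cost(10_000_000, {"haiku": 0.8, "sonnet": 0.2})
sonnet_only = blended_cost(10_000_000, {"sonnet": 1.0})
opus_only = blended_cost(10_000_000, {"opus": 1.0})
print(f"cascade=${cascade:.2f}  sonnet=${sonnet_only:.2f}  opus=${opus_only:.2f}")
```

Changing the shares, for example to a 60/30/10 split across the three tiers, shows how quickly the savings shrink as more traffic escalates.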
The implementation is straightforward. You build a lightweight classifier (which can itself be a Haiku call) that evaluates each incoming request and routes based on complexity. Simple factual queries, templated tasks, and extraction jobs go to Haiku. Multi-step reasoning, code generation, and ambiguous instructions escalate to Sonnet or Opus.
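As a rough illustration of the routing step, here is a keyword-and-length heuristic standing in for the classifier. Everything in it is hypothetical: the tier labels, the keyword lists, and the length threshold are placeholders. A production router would more likely use a small model call or a trained classifier, tuned against your own traffic.

```python
from dataclasses import dataclass

# Hypothetical tier labels; map these to real model IDs in your own config.
HAIKU, SONNET, OPUS = "haiku", "sonnet", "opus"

# Illustrative keyword lists only, not a recommended taxonomy.
OPUS_KEYWORDS = ("refactor", "multi-file", "migrate the codebase")
SONNET_KEYWORDS = ("write code", "step by step", "analyze", "compare")


@dataclass
class Route:
    model: str
    reason: str


def route_request(prompt: str, max_simple_chars: int = 2_000) -> Route:
    """Pick a model tier from cheap surface features of the prompt."""
    text = prompt.lower()
    if any(keyword in text for keyword in OPUS_KEYWORDS):
        return Route(OPUS, "matched a complex coding keyword")
    if any(keyword in text for keyword in SONNET_KEYWORDS):
        return Route(SONNET, "matched a multi-step reasoning keyword")
    if len(prompt) > max_simple_chars:
        return Route(SONNET, "prompt longer than the simple-task threshold")
    return Route(HAIKU, "default tier")


for prompt in (
    "Extract the invoice number from this email.",
    "Analyze the trade-offs between these two database schemas.",
    "Refactor the auth module to use the new session API.",
):
    route = route_request(prompt)
    print(f"{route.model:<7} {route.reason}: {prompt}")
```

Defaulting to the cheapest tier and escalating on explicit signals keeps misroutes cheap: a wrongly escalated request costs a little extra, while a wrongly downgraded one can be caught by a quality check and retried.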
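For a startup running 1,000 free-trial users at 10 API calls each on Haiku, the total is 10,000 calls. At roughly 1,000 tokens per call with the same 80/20 split, that is the $18.00 Haiku figure from Scenario A; shorter calls of a few hundred tokens bring it down to the low single digits of dollars. That is the lever that makes a free product tier economically viable.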
How does Anthropic’s pricing compare to OpenAI and Google for 10M tokens?
The competitive pricing landscape puts Claude’s rates in context. Here are the current flagship and mid-tier models from each major provider:
| Model | Input (per 1M) | Output (per 1M) | 10M tokens (8M in / 2M out) |
|---|---|---|---|
| Claude Haiku 4.5 | $1.00 | $5.00 | $18.00 |
| Claude Sonnet 4.6 | $3.00 | $15.00 | $54.00 |
| Claude Opus 4.8 | $5.00 | $25.00 | $90.00 |
| GPT-5.4 | $2.50 | $15.00 | $50.00 |
| GPT-5.5 | $5.00 | $30.00 | $100.00 |
| Gemini 3.1 Pro | $2.00 | $12.00 | $40.00 |
| Gemini 3.5 Flash | $1.50 | $9.00 | $30.00 |
Sonnet 4.6 at $3/$15 is the most expensive of the three mid-tier options on input, above GPT-5.4 at $2.50/$15 and Gemini 3.1 Pro at $2/$12, though Sonnet and GPT-5.4 share the same $15 output rate. Opus 4.8 matches GPT-5.5 on input at $5, but GPT-5.5 charges $30 on output compared to Opus’s $25, making the Anthropic flagship cheaper on output-heavy workloads.
Google’s Gemini 3.5 Flash at $1.50/$9.00 undercuts Sonnet and Opus on raw price, though not Haiku 4.5 at $1/$5, and includes a free tier through Google AI Studio for prototyping. If cost is the dominant constraint and you are not locked into Claude-specific features, Haiku 4.5 and Gemini 3.5 Flash are the two cheapest options in the table above, at $18.00 and $30.00 for the 8M/2M workload.
DeepSeek V4-Pro at $0.45/M input is the budget outlier, offering frontier-tier performance at a fraction of any Western provider’s rates. Whether that tradeoff works depends on your data-residency and compliance requirements.
What does Opus 4.8 Fast Mode cost, and when should you use it?
Opus 4.8 introduced a cheaper Fast Mode at $10 input / $50 output per million tokens, running at approximately 2.5x the speed of standard inference. That rate is 2x the standard Opus price, but it is 3x cheaper than the Fast Mode pricing on previous Opus models ($30/$150 on Opus 4.7).
For 10M tokens at 8M input / 2M output, Fast Mode costs $180 compared to $90 at standard rates. The tradeoff is strictly latency versus cost: Fast Mode returns responses faster, which matters for real-time applications, but doubles the bill. If your workload can tolerate standard latency, standard pricing is always the better choice.
What hidden costs affect the real Claude API bill beyond per-token rates?
Several billing mechanics beyond the headline rates can inflate the actual spend:
- Extended thinking tokens: Opus and Sonnet support extended thinking, where the model generates internal reasoning tokens before producing the visible response. These thinking tokens count as output tokens and are billed at the output rate. A complex reasoning request on Opus can generate 10,000-20,000 thinking tokens at $25/M, adding $0.25-$0.50 per request in reasoning costs alone. You can control this with effort settings (low, high, xhigh, max on Opus 4.8).
- Tool-use system prompt overhead: When you define tools in the API, Claude injects additional system tokens. On Sonnet 4.6, this adds roughly 497 tokens per request compared to 313 on older models. For high-volume applications making thousands of tool-use calls per day, those tokens add up.
- Data residency multiplier: Specifying US-only inference through the `inference_geo` parameter incurs a 1.1x multiplier on all token categories, including cache reads and writes. If you are running from outside the US and do not require US processing, skip this parameter.
- Opus 4.7 tokenizer difference: Opus 4.7 introduced a new tokenizer that can consume up to 35% more tokens for the same text compared to Opus 4.6, particularly on code, structured data, and non-English text. The per-token rate is identical, but your effective cost per task can increase. If you are migrating from Opus 4.6 to 4.7 or 4.8, benchmark your actual token consumption before projecting costs.
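The items above can be folded into a per-request estimate. This sketch treats thinking tokens as output, adds tool overhead to input, and applies the residency multiplier to the total. The function and parameter names are invented for illustration, and the default rates are the Opus 4.8 figures from the pricing table.

```python
def request_cost(
    input_tokens: int,
    output_tokens: int,
    thinking_tokens: int = 0,        # billed at the output rate
    tool_overhead_tokens: int = 0,   # extra system tokens injected for tool use
    input_rate: float = 5.00,        # USD per 1M tokens (Opus 4.8 by default)
    output_rate: float = 25.00,
    geo_multiplier: float = 1.0,     # 1.1 when US-only inference is requested
) -> float:
    """Estimate the USD cost of one request including the less visible charges."""
    billed_input = input_tokens + tool_overhead_tokens
    billed_output = output_tokens + thinking_tokens
    base = (billed_input * input_rate + billed_output * output_rate) / 1_000_000
    return base * geo_multiplier


# Reasoning cost alone for 10,000 and 20,000 thinking tokens on Opus.
for thinking in (10_000, 20_000):
    print(f"{thinking} thinking tokens: ${request_cost(0, 0, thinking_tokens=thinking):.2f}")

# A hypothetical Sonnet tool-use request: 2,000 prompt tokens, 500 visible output
# tokens, the 497-token tool overhead quoted above, and US-only inference.
cost = request_cost(
    2_000, 500,
    tool_overhead_tokens=497,
    input_rate=3.00, output_rate=15.00,
    geo_multiplier=1.1,
)
print(f"tool-use request: ${cost:.5f}")
```

The tokenizer difference is not modeled here, since it changes the token counts themselves. The safest way to account for it is to measure real usage on the new model and feed those counts into an estimate like this one.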
How much does a production application realistically spend on Claude per month?
Working backward from 10M tokens: a free ChatGPT-style chat product or an internal support chatbot handling 500 conversations per day, averaging 2,000 tokens per conversation (prompt + response), consumes about 1M tokens per day, or 30M tokens per month.
On Sonnet 4.6 with prompt caching and a 70% cache-hit ratio, that 30M-token monthly workload costs roughly $105-120 per month. Route the simple queries to Haiku in a cascade, and the blended cost drops to $60-80. Add batch processing for any non-real-time components, and you are looking at $40-60 per month for moderate production traffic.
For coding-agent pipelines, a light workload of 2M input tokens and 500K output tokens per month runs approximately $22.50 on Opus 4.8, $13.50 on Sonnet 4.6, or $4.50 on Haiku 4.5 at standard rates. Scale that by 10x to 20M input / 5M output for an active development team, and you are in the $45-$225/month range depending on model choice.
The largest bills come from agentic workflows where Claude calls tools, reasons through multi-step plans, and generates substantial output on every turn. A coding agent that averages 50,000 output tokens per task at the Opus rate burns $1.25 per task in output tokens alone. Run 100 tasks per day and the monthly output cost is $3,750, before accounting for input. This is where cascade routing and effort-level controls become mandatory cost governance.
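The agentic example is easy to rerun with your own task volume. The sketch below assumes a 30-day month and counts output tokens only, matching the figure in the text. The function name is illustrative.

```python
# Output rates in USD per 1M tokens, from the pricing table.
OUTPUT_RATES = {"haiku": 5.00, "sonnet": 15.00, "opus": 25.00}


def monthly_output_cost(
    output_tokens_per_task: int,
    tasks_per_day: int,
    output_rate: float,
    days: int = 30,  # assumed month length
) -> float:
    """Return the monthly USD cost of output tokens for a recurring agent task."""
    monthly_tokens = output_tokens_per_task * tasks_per_day * days
    return monthly_tokens * output_rate / 1_000_000


for model, rate in OUTPUT_RATES.items():
    per_task = 50_000 * rate / 1_000_000
    monthly = monthly_output_cost(50_000, 100, rate)
    print(f"{model}: ${per_task:.2f} per task, ${monthly:,.2f} per month")
```

Comparing the three tiers side by side makes the governance question concrete: each task that can be safely handled below the flagship tier removes a fixed amount from the monthly total.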
What alternatives exist for teams that need zero marginal token cost?
Self-hosted open-source models eliminate per-token billing entirely and replace it with fixed infrastructure cost. Running an open-weight vision or language model on GPU infrastructure means you pay for compute time regardless of how many tokens you process, which inverts the cost curve at high volumes.
The tradeoff is capability. No open-source model matches Opus 4.8’s 88.6% SWE-bench Verified score or its 1510 Elo on LMArena. For workloads where the quality gap does not matter, such as bulk extraction, simple classification, or template-based generation, self-hosting can be dramatically cheaper at scale. For workloads that require frontier reasoning, the API cost is the price of access to models that open-source alternatives cannot yet replicate.
The practical middle ground for most teams: use the Claude API cascade (Haiku by default, Sonnet/Opus on escalation) for production quality, apply prompt caching and batch processing wherever the latency budget allows, and evaluate self-hosted alternatives for the highest-volume commodity tasks. The 10M-token bill is not fixed. With the right routing, it is a variable you control.