Claude Sonnet 5 vs Opus 4.8: When to Pay More
Sonnet 5 hits ~91% of Opus 4.8's agentic-coding score at 40% of the price. A routing rule for agents: default to Sonnet 5, and escalate only when Opus still earns it.

Anthropic shipped Claude Sonnet 5 on June 30, the same day it launched Claude Science. On its own, a faster, cheaper Sonnet would be a routine bump. The interesting part is what it does to a decision you make every time you build an agent: which model do you actually run?
Sonnet 5 is now good enough at agentic work that the honest default for most jobs is no longer Opus. The question flips to: when is the bigger model still worth paying for?
The two numbers that decide it
Start with capability and price, side by side. On agentic coding, TechCrunch's launch coverage puts Sonnet 5 at 63.2%, against Opus 4.8 at 69.2% and the older Sonnet 4.6 at 58.1%. So Sonnet 5 lands at about 91% of Opus on that benchmark, and Anthropic says it slightly beats Opus on knowledge work.
Now price, straight from Anthropic's model page:
| Model | API id | Input / MTok | Output / MTok | Agentic coding |
|---|---|---|---|---|
| Sonnet 5 | claude-sonnet-5 | $3 ($2 intro) | $15 ($10 intro) | 63.2% |
| Opus 4.8 | claude-opus-4-8 | $5 | $25 | 69.2% |
The intro pricing ($2 in, $10 out) runs through August 31, 2026, then settles at $3 and $15. So today Opus costs 2.5 times as much as Sonnet 5 per token. After August it's about 1.67 times. One detail worth catching: Sonnet 5's standard price is the same $3/$15 as Sonnet 4.6. Existing Sonnet users get the capability jump for free.
Put those together and the framing TechCrunch reached for is right: agentic ability is now table stakes, so the real question is how cheaply you can run it.
What changed in Sonnet 5
The benchmark gap undersells the practical shift. Anthropic says Sonnet 5 finishes complex tasks "where previous Sonnet models would stop short," holds a plan across stages, and checks its own output without being asked. Daniel Shepard at Zapier put it plainly: a workflow that "used to stall halfway" now runs, and for day-to-day automation "it's a no-brainer."
That matters for routing because it moves the line. The old rule was "use Sonnet for cheap, simple stuff, reach for Opus the moment it gets agentic." Sonnet 5 clears the agentic bar for most real workloads, so the cheap option is no longer the weak option.
A routing rule you can use
Default to Sonnet 5. Escalate to Opus 4.8 only when the task hits one of three triggers.
The three cases where Opus earns its premium:
- Edge-of-capability work. When a task sits where that 6-point benchmark gap actually bites (novel problems, hairy refactors, architecture decisions), Opus 4.8's higher ceiling changes the outcome, not just the polish.
- Long-horizon autonomous runs. Anthropic positions Opus 4.8 as state-of-the-art at overnight, self-correcting agentic work. When a single early misstep compounds across hours, the more capable planner is cheaper than the cleanup.
- High-stakes one-shots. If redoing the output costs more than the token premium (a customer-facing artifact, a migration you run once), pay for the better first attempt.
Everything else, which is most agent traffic, runs on Sonnet 5.
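As a rough illustration, the rule above fits in a few lines of Python. This is a minimal sketch, not a production router: the `Task` class and its three boolean flags are hypothetical names invented here to mirror the three triggers, and how you actually detect each trigger is left to your own system. Only the two model ids come from the pricing table.

```python
from dataclasses import dataclass

# Model ids as listed in the pricing table.
SONNET = "claude-sonnet-5"
OPUS = "claude-opus-4-8"


@dataclass
class Task:
    """Hypothetical task descriptor; each flag mirrors one escalation trigger."""
    name: str
    edge_of_capability: bool = False    # novel problem, hairy refactor, architecture call
    long_horizon: bool = False          # long autonomous run where early errors compound
    high_stakes_one_shot: bool = False  # redoing the output costs more than the premium


def choose_model(task: Task) -> str:
    """Default to Sonnet 5; escalate to Opus 4.8 if any trigger fires."""
    if task.edge_of_capability or task.long_horizon or task.high_stakes_one_shot:
        return OPUS
    return SONNET


if __name__ == "__main__":
    print(choose_model(Task("summarize support tickets")))
    print(choose_model(Task("one-time database migration", high_stakes_one_shot=True)))
```

Keeping the decision in one small function means the escalation policy can be tuned later without touching the agent loop itself.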
Run the cost math on your own load
The per-token gap sounds small until you multiply it by volume. Say an agent task averages 1M input tokens and 200K output tokens, which is realistic once you count tool results and a few turns. The cost per task works out to:
Opus 4.8        1M×$5 + 0.2M×$25 = $10.00
Sonnet 5 (now)  1M×$2 + 0.2M×$10 = $4.00   # intro pricing
Sonnet 5 (Sep)  1M×$3 + 0.2M×$15 = $6.00   # standard pricing
Run that agent 10,000 times a month and the difference is $100k on Opus against $40k now, or $60k after August. For a workload that gets ~91% of the quality, that delta funds a lot of other things. This is the same pressure driving the broader model price war: once capability converges, cost per task is the battlefield.
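To run the same arithmetic on your own load, a small calculator is enough. The sketch below uses the per-MTok prices from the table; the dictionary keys and function names are made up for this example, and the token counts are the ones assumed in the worked example, so swap in your own averages.

```python
# USD per million tokens as (input, output), taken from the pricing table.
# The keys are labels invented for this sketch, not API model ids.
PRICES = {
    "opus-4.8": (5.0, 25.0),
    "sonnet-5-intro": (2.0, 10.0),
    "sonnet-5-standard": (3.0, 15.0),
}


def cost_per_task(price_key: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one task at the given average token counts."""
    in_price, out_price = PRICES[price_key]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


def monthly_cost(price_key: str, input_tokens: int, output_tokens: int,
                 runs_per_month: int) -> float:
    """Cost in USD of running that task runs_per_month times."""
    return cost_per_task(price_key, input_tokens, output_tokens) * runs_per_month


if __name__ == "__main__":
    # Same assumptions as the worked example: 1M input, 200K output, 10,000 runs.
    for key in PRICES:
        per_task = cost_per_task(key, 1_000_000, 200_000)
        per_month = monthly_cost(key, 1_000_000, 200_000, 10_000)
        print(f"{key:<18} ${per_task:>6.2f} per task   ${per_month:>10,.0f} per month")
```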
The catch (don't over-route)
Cheaper per token is not the same as cheaper per task. A weaker model that needs two extra retries, or burns more thinking tokens to reach the same answer, can cost more than the "expensive" one. Measure cost per completed task on your real workload, not the sticker price. Both models default to high effort, so a careless effort setting moves the bill more than the model choice does.
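One simple way to put a number on that is to divide the cost of an attempt by the success rate. The sketch below assumes every attempt costs the same and succeeds independently with a fixed probability, so the expected number of attempts is 1 / success rate. That is a simplification: it ignores extra thinking tokens and partial progress. The function names are hypothetical, and the success rate in the usage lines is a placeholder, not a measurement of either model.

```python
def cost_per_completed_task(cost_per_attempt: float, success_rate: float) -> float:
    """Expected spend to get one success, retrying until the task completes."""
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    return cost_per_attempt / success_rate


def breakeven_success_rate(cheap_cost: float, pricey_cost: float,
                           pricey_success_rate: float) -> float:
    """Success rate the cheaper model needs to tie on cost per completed task."""
    return pricey_success_rate * cheap_cost / pricey_cost


if __name__ == "__main__":
    # $4 and $10 per attempt come from the cost math in the previous section.
    # The 0.9 success rate for the pricier model is purely illustrative.
    needed = breakeven_success_rate(4.0, 10.0, pricey_success_rate=0.9)
    print(f"Cheaper model breaks even at a success rate of about {needed:.2f}")
    print(cost_per_completed_task(4.0, 0.5))  # half of attempts fail: spend doubles
```

Under this model the cheaper option only loses on cost if its success rate falls below the break-even value, which is why measuring completion rates on your real tasks matters more than the price list.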
Two more things to keep honest. The 63.2 vs 69.2 split is one vendor's benchmark, not your app, so test both on your actual tasks before you commit a routing rule. And the cheap intro pricing expires August 31, which quietly raises Sonnet 5's cost by 50% overnight. Put a reminder on the calendar to re-run the math then.
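If a calendar reminder feels fragile, the price switch can live in the cost code instead. This is a minimal sketch that assumes intro pricing applies through the end of August 31, 2026 inclusive and standard pricing from September 1; the function name is hypothetical, and the exact cutover time is worth confirming against Anthropic's pricing page.

```python
from datetime import date

# Last day of intro pricing as stated in this article (assumed inclusive).
INTRO_END = date(2026, 8, 31)


def sonnet5_prices(on: date) -> tuple[float, float]:
    """Return (input, output) USD per MTok for Sonnet 5 on the given date."""
    if on <= INTRO_END:
        return (2.0, 10.0)  # intro pricing
    return (3.0, 15.0)      # standard pricing


if __name__ == "__main__":
    before = sonnet5_prices(date(2026, 8, 31))
    after = sonnet5_prices(date(2026, 9, 1))
    increase = (after[0] - before[0]) / before[0]
    print(f"Input price goes from ${before[0]} to ${after[0]} ({increase:.0%} increase)")
```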
What I'd actually do
Make Sonnet 5 the default model in your agent loop today. Keep one clean escape hatch that swaps in Opus 4.8 for the hard 10% (a per-task flag or a tier check), so escalating is a one-line change rather than a rewrite. Then watch your cost-per-completed-task dashboard for a week and let the real numbers, not the benchmark, tune where the line sits.
The bigger pattern is the one to internalize. When every lab's mid-tier model can run agents, the model stops being the interesting decision. How cheaply and reliably you route work across them is.

Written by
Rhythm Bhiwani
Engineer and relentless builder, happiest reverse-engineering hard problems until they click.


