Claude Opus 4.7: Stronger Model, Token Trap to Watch
Opus 4.7 is stronger, and the new tokenizer consumes up to 35% more tokens at the same price. Where the trap sits and what it means for workflows.
On April 16, 2026, Anthropic released Claude Opus 4.7. The official communication reads as expected: better software engineering performance, improved instruction following, stronger vision. What sits in slightly smaller print is, to me, the more interesting story: a new tokenizer that maps the same content to 1.0 to 1.35× as many tokens as before. The price is unchanged at $5 per million input tokens and $25 per million output tokens. At first glance that sounds like a technical footnote. I think it's more than that.
What Opus 4.7 actually brings
Start with what improved, concretely and without the usual superlatives. According to Anthropic's official announcement, the gains across several areas are real and measurable.
Software engineering: Opus 4.7 does noticeably better than its predecessor on complex, long-running tasks, especially when multiple steps are needed and the model has to verify its own outputs along the way. That's not a benchmark trick; it's a strength you feel in agentic workflows where a model doesn't only generate but also verifies.
Instruction following: The model follows instructions more strictly than 4.6. For cleanly written prompts that's a real win. What it means for existing prompts tuned to the softer behavior of older models comes further down.
Vision: The model processes images up to 2,576 pixels on the long side, more than three times the predecessor. Relevant for screenshots with lots of text, technical diagrams, chemical structure formulas, or high-resolution documents where the old limit was a real constraint.
Long-context reasoning: In very long sessions, reasoning should stay more stable and memory use should improve. Whether that holds up in practice has to be tested, but it's an area where 4.6 had clear room for improvement.
Also new: the "xhigh" effort level for finer steering between reasoning depth and latency, Task Budgets in public beta for per-task token budgeting, and, for Claude Code users, a new /ultrareview function. According to Anthropic, Opus 4.7 also reaches state of the art on the finance agent benchmark, which is relevant for anyone using the model in analytically heavy contexts.
What gets less attention in the communication: prompts calibrated for older models may need adjustment. Stricter instruction following is an improvement when prompts are clean. When they aren't, it can produce unexpected behavior.
The token problem: same price, more consumption
Anyone on the $20 plan knows it: Opus burns through quota faster than Sonnet and hits limits sooner. Opus 4.7 doesn't fix that.
The new tokenizer maps the same content to more tokens. Send the same text to Opus 4.7 and Opus 4.6, and in the worst case Opus 4.7 uses up to 35% more input tokens, at the same price per token. On top of that, the "xhigh" effort level pushes output tokens further up when you need heavier reasoning. The price per token stayed the same; the token use per task did not.
If you run this in automation with many passes or load large contexts regularly, you'll see it on your bill, even if Pro subscribers cushion it through their plan. On the API side, where you pay directly per token, it shows up immediately.
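To put numbers on this, here is a minimal sketch of the arithmetic. The prices and the 1.35 worst-case factor are the figures quoted above. The function name, the example workload, and the assumption that the factor applies equally to input and output tokens are mine, purely for illustration.

```python
# Prices per million tokens, as quoted above (unchanged between 4.6 and 4.7).
INPUT_USD_PER_MTOK = 5.00
OUTPUT_USD_PER_MTOK = 25.00


def request_cost(input_tokens: int, output_tokens: int,
                 tokenizer_factor: float = 1.0) -> float:
    """Estimate the USD cost of one request.

    input_tokens and output_tokens are counts under the old tokenizer.
    tokenizer_factor scales them to the new tokenizer (1.0 to 1.35).
    Assumption: the same factor applies to input and output.
    """
    scaled_in = input_tokens * tokenizer_factor
    scaled_out = output_tokens * tokenizer_factor
    return (scaled_in * INPUT_USD_PER_MTOK
            + scaled_out * OUTPUT_USD_PER_MTOK) / 1_000_000


# Hypothetical workload: 200k input and 20k output tokens per run,
# 1,000 runs a month.
RUNS_PER_MONTH = 1_000
baseline = request_cost(200_000, 20_000)
worst_case = request_cost(200_000, 20_000, tokenizer_factor=1.35)

print(f"monthly baseline:   ${baseline * RUNS_PER_MONTH:,.0f}")
print(f"monthly worst case: ${worst_case * RUNS_PER_MONTH:,.0f}")
```

For this made-up workload the monthly bill moves from about $1,500 to about $2,025 in the worst case. The real factor depends on your content and sits somewhere between 1.0 and 1.35, so measuring it on your own prompts beats guessing.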
The token trap: a creeping dependency risk
What occupies me more than the concrete cost difference is what I call the token trap: a structural risk for anyone now building workflows on Opus 4.7.
The logic: a stronger model uses more tokens. People who use it get used to the better quality and build workflows around that quality. Over time, a dependency forms that's hard to back out of. If Anthropic later raises the price or replaces the model with an even stronger one that uses more again, you have no good exit option.
This isn't hypothetical. It has already happened with previous-generation models: anyone who built workflows on GPT-4 knows how hard it is to port them to cheaper models without quality loss. The difference with Opus 4.7 is that the tokenizer change deepens the dependency without making it obvious.
The counter is deliberate context and token management from day one. Not every request needs Opus. Not every task needs the full context. Not every task needs "xhigh". In practice, people load everything into context anyway, because the model still answers most questions. The post on context engineering covers concrete approaches for avoiding that.
For an introduction to what tokens are and how they work, the prompting basics post is a useful starting point.
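The three "not every" rules above can be written down as a small routing step that runs before any request goes out. This is a sketch under loud assumptions: the `Task` fields, the tier labels, the "default" effort label, and the character budget are all hypothetical and not real API identifiers. Only "xhigh" is a name taken from the release.

```python
from dataclasses import dataclass, field

# Placeholder tier labels, NOT real API model identifiers.
OPUS, SONNET = "opus", "sonnet"


@dataclass
class Task:
    multi_step: bool     # long-running work where the model verifies itself
    high_stakes: bool    # mistakes are expensive to catch later
    context_chunks: list[str] = field(default_factory=list)  # most relevant first


def route(task: Task, max_context_chars: int = 20_000) -> dict:
    """Pick the cheapest setup that plausibly fits the task."""
    if task.multi_step and task.high_stakes:
        model, effort = OPUS, "xhigh"
    elif task.multi_step:
        model, effort = OPUS, "default"    # hypothetical effort label
    else:
        model, effort = SONNET, "default"

    # Keep chunks in priority order until the character budget is used up.
    kept, used = [], 0
    for chunk in task.context_chunks:
        if used + len(chunk) > max_context_chars:
            break
        kept.append(chunk)
        used += len(chunk)

    return {"model": model, "effort": effort, "context": kept}


plan = route(Task(multi_step=False, high_stakes=False,
                  context_chunks=["a" * 15_000, "b" * 10_000]))
print(plan["model"], plan["effort"], len(plan["context"]))
```

The point is less the specific rules than having the decision in one place. If prices or models change later, you adjust one function instead of every workflow.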
What Reddit users report, and why it still matters
Shortly after the release, a Reddit thread surfaced titled "claude opus 4.7 is a serious regression, not an upgrade". Users reported worse performance on creative writing and on coding tasks that rely on style and nuance. That's hard to evaluate from the outside, but complaints like these arriving right after a release are nothing new.
The pattern repeats with model updates. The official communication emphasizes what gets better. What shifts or degrades rarely gets named directly. If a model follows instructions more strictly, that can be an improvement in one context and lead to unexpected behavior in another, depending on how the prompts are built.
I recommend running your own tests before putting Opus 4.7 into production. Anyone with fine-grained prompts that carry many instructions should test concretely whether the stricter instruction following makes results better or just different. The same applies to anyone using Claude Cowork or Claude Code heavily with their own skills and prompts.
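Such a test does not need much tooling. The sketch below runs the same prompts through an old and a new model and lists the prompts where the old output passed your check and the new one does not. Everything here is an assumption for illustration: the function names are made up, and the two "models" are local stand-ins so the example runs without network access. In real use you would replace them with calls to your API client.

```python
from typing import Callable

Model = Callable[[str], str]         # takes a prompt, returns the model's text
Check = Callable[[str, str], bool]   # takes (prompt, output), returns pass/fail


def compare(prompts: list[str], old: Model, new: Model, check: Check) -> list[dict]:
    """Run every prompt through both models and record the check results."""
    rows = []
    for prompt in prompts:
        old_out, new_out = old(prompt), new(prompt)
        rows.append({
            "prompt": prompt,
            "old_ok": check(prompt, old_out),
            "new_ok": check(prompt, new_out),
            "changed": old_out != new_out,
        })
    return rows


def regressions(rows: list[dict]) -> list[str]:
    """Prompts that passed on the old model but fail on the new one."""
    return [r["prompt"] for r in rows if r["old_ok"] and not r["new_ok"]]


# Stand-ins for real model calls (hypothetical behavior).
def fake_old(prompt: str) -> str:
    return prompt.upper()


def fake_new(prompt: str) -> str:
    return prompt.upper() if len(prompt) < 10 else prompt


rows = compare(["short", "a much longer prompt"], fake_old, fake_new,
               check=lambda prompt, out: out.isupper())
print(regressions(rows))
```

The `changed` flag is kept separately from the pass/fail results because "different" and "worse" are not the same thing, which is exactly the distinction the stricter instruction following forces you to make.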
What this means in practice
From the official documentation and the experience with the predecessor, a few clear recommendations:
- Task Budgets, currently in public beta, are worth a close look. A tool that caps per-task token use is the right counter to the token trap, provided it's well implemented.
- Don't carry existing prompts over unchanged. Test them against the changed instruction following, especially if you work with detailed system prompts.
- API users who pay directly per token should factor the higher token count into their cost planning before moving Opus 4.7 into production.
- For vision-heavy workflows, the improved image understanding is a real upside that can justify the extra consumption.
For the difference between Sonnet and Opus, my comparison of the two models has the details. It also covers when Sonnet is enough. With the new tokenizer, that calculation will more often favor Sonnet.
Anthropic ships a stronger model in Opus 4.7. Anyone who builds in context and token management from the start can use the gains without later walking into a cost trap.
FAQ
- What's new in Claude Opus 4.7?
- Stronger software engineering on long multi-step tasks, stricter instruction following, vision up to 2,576 pixels on the long side (over three times the predecessor), a new "xhigh" effort level, Task Budgets in public beta, and an /ultrareview function for Claude Code.
- Does Opus 4.7 cost more than 4.6?
- The price per token is unchanged ($5 per million input, $25 per million output), but a new tokenizer maps the same content to up to 35% more tokens. So the same task can cost more, which shows up immediately on the API where you pay per token.
- What is the token trap with Opus 4.7?
- A stronger model uses more tokens, you get used to the quality and build workflows around it, and a dependency forms that's hard to exit if prices rise or the model changes. The counter is deliberate context and token management from the start: not every task needs Opus, full context, or "xhigh".
