Anthropic's Sonnet 4.6 matches flagship AI performance at one-fifth the cost, accelerating enterprise adoption

Anthropic on Tuesday released Claude Sonnet 4.6, a model that amounts to a seismic repricing event for the AI industry. It delivers near-flagship intelligence at mid-tier cost, and it lands squarely in the middle of an unprecedented corporate rush to deploy AI agents and automated coding tools.

Texas AG sues TP-Link over purported connection to China

Inside the Homeland Security Forum Where ICE Agents Talk Shit About Other Agents

The model is a full upgrade across coding, computer use, long-context reasoning, agent planning, knowledge work, and design. It features a 1M token context window in beta. It is now the default model in claude.ai and Claude Cowork, and pricing holds steady at $3/$15 per million tokens — the same as its predecessor, Sonnet 4.5.

That pricing detail is the headline that matters most. Anthropic's flagship Opus models cost $15/$75 per million tokens — five times the Sonnet price. Yet performance that would have previously required reaching for an Opus-class model — including on real-world, economically valuable office tasks — is now available with Sonnet 4.6. For the thousands of enterprises now deploying AI agents that make millions of API calls per day, that math changes everything.

Why the cost of running AI agents at scale just dropped dramatically

To understand the significance of this release, you need to understand the moment it arrives in. The past year has been dominated by the twin phenomena of "vibe coding" and agentic AI. Claude Code — Anthropic's developer-facing terminal tool — has become a cultural force in Silicon Valley, with engineers building entire applications through natural-language conversation. The New York Times profiled its meteoric rise in January. The Verge recently declared that Claude Code is having a genuine "moment." OpenAI, meanwhile, has been waging its own offensive with Codex desktop applications and faster inference chips.

The result is an industry where AI models are no longer evaluated in isolation. They are evaluated as the engines inside autonomous agents — systems that run for hours, make thousands of tool calls, write and execute code, navigate browsers, and interact with enterprise software. Every dollar spent per million tokens gets multiplied across those thousands of calls. At scale, the difference between $15 and $3 per million input tokens is not incremental. It is transformational.

The benchmark table Anthropic released paints a striking picture. On SWE-bench Verified, the industry-standard test for real-world software coding, Sonnet 4.6 scored 79.6% — nearly matching Opus 4.6's 80.8%. On agentic computer use (OSWorld-Verified), Sonnet 4.6 scored 72.5%, essentially tied with Opus 4.6's 72.7%. On office tasks (GDPval-AA Elo), Sonnet 4.6 actually scored 1633, surpassing Opus 4.6's 1606. On agentic financial analysis, Sonnet 4.6 hit 63.3%, beating every model in the comparison, including Opus 4.6 at 60.1%.

These are not marginal differences. In many of the categories enterprises care about most, Sonnet 4.6 matches or beats models that cost five times as much to run. An enterprise running an AI agent that processes 10 million tokens per day was previously forced to choose between inferior results at lower cost or superior results at rapidly scaling expense. Sonnet 4.6 largely eliminates that trade-off.

In Claude Code, early testing found that users preferred Sonnet 4.6 over Sonnet 4.5 roughly 70% of the time. Users even preferred Sonnet 4.6 to Opus 4.5, Anthropic's frontier model from November, 59% of the time. They rated Sonnet 4.6 as significantly less prone to over-engineering and "laziness," and meaningfully better at instruction following. They reported fewer false claims of success, fewer hallucinations, and more consistent follow-through on multi-step tasks.

How Claude's computer use abilities went from 'experimental' to near-human in 16 months

One of the most dramatic storylines in the release is Anthropic's progress on computer use — the ability of an AI to operate a computer the way a human does, clicking a mouse, typing on a keyboard, and navigating software that lacks modern APIs.

When Anthropic first introduced this capability in October 2024, the company acknowledged it was "still experimental — at times cumbersome and error-prone." The numbers since then tell a remarkable story: on OSWorld, Claude Sonnet 3.5 scored 14.9% in October 2024. Sonnet 3.7 reached 28.0% in February 2025. Sonnet 4 hit 42.2% by June. Sonnet 4.5 climbed to 61.4% in October. Now Sonnet 4.6 has reached 72.5% — nearly a fivefold improvement in 16 months.

This matters because computer use is the capability that unlocks the broadest set of enterprise applications for AI agents. Almost every organization has legacy software — insurance portals, government databases, ERP systems, hospital scheduling tools — that was built before APIs existed. A model that can simply look at a screen and interact with it opens all of these to automation without building bespoke connectors.

Jamie Cuffe, CEO of Pace, said Sonnet 4.6 hit 94% on their complex insurance computer use benchmark, the highest of any Claude model tested. "It reasons through failures and self-corrects in ways we haven't seen before," Cuffe said in a statement sent to VentureBeat. Will Harvey, co-founder of Convey, called it "a clear improvement over anything else we've tested in our evals."

The safety dimension of computer use also got attention. Anthropic noted that computer use poses prompt injection risks — malicious actors hiding instructions on websites to hijack the model — and said its evaluations show Sonnet 4.6 is a major improvement over Sonnet 4.5 in resisting such attacks. For enterprises deploying agents that browse the web and interact with external systems, that hardening is not optional.

Enterprise customers say the model closes the gap between Sonnet and Opus pricing tiers

The customer reaction has been unusually specific about cost-performance dynamics. Multiple early testers explicitly described Sonnet 4.6 as eliminating the need to reach for the more expensive Opus tier.

Caitlin Colgrove, CTO of Hex Technologies, said the company is moving the majority of its traffic to Sonnet 4.6, noting that with adaptive thinking and high effort, "we see Opus-level performance on all but our hardest analytical tasks with a more efficient and flexible profile. At Sonnet pricing, it's an easy call for our workloads."

Ben Kus, CTO of Box, said the model outperformed Sonnet 4.5 in heavy reasoning Q&A by 15 percentage points across real enterprise documents. Michele Catasta, President of Replit, called the performance-to-cost ratio "extraordinary." Ryan Wiggins of Mercury Banking put it more bluntly: "Claude Sonnet 4.6 is faster, cheaper, and more likely to nail things on the first try. That combination was a surprising combination of improvements, and we didn't expect to see it at this price point."

The coding improvements resonate particularly given Claude Code's dominance in the developer tools market. David Loker, VP of AI at CodeRabbit, said the model "punches way above its weight class for the vast majority of real-world PRs." Leo Tchourakov of Factory AI said the team is "transitioning our Sonnet traffic over to this model." GitHub's VP of Product, Joe Binder, confirmed the model is "already excelling at complex code fixes, especially when searching across large codebases is essential."

Brendan Falk, Founder and CEO of Hercules, went further: "Claude Sonnet 4.6 is the best model we have seen to date. It has Opus 4.6 level accuracy, instruction following, and UI, all for a meaningfully lower cost."

A simulated business competition reveals how AI agents plan over months, not minutes

Buried in the technical details is a capability that hints at where autonomous AI agents are heading. Sonnet 4.6's 1M token context window can hold entire codebases, lengthy contracts, or dozens of research papers in a single request. Anthropic says the model reasons effectively across all that context — a claim the company demonstrated through an unusual evaluation.

The Vending-Bench Arena tests how well a model can run a simulated business over time, with different AI models competing against each other for the biggest profits. Without human prompting, Sonnet 4.6 developed a novel strategy: it invested heavily in capacity for the first ten simulated months, spending significantly more than its competitors, and then pivoted sharply to focus on profitability in the final stretch. The model ended its 365-day simulation at approximately $5,700 in balance, compared to Sonnet 4.5's roughly $2,100.

This kind of multi-month strategic planning, executed autonomously, represents a qualitatively different capability than answering questions or generating code snippets. It is the type of long-horizon reasoning that makes AI agents viable for real business operations — and it helps explain why Anthropic is positioning Sonnet 4.6 not just as a chatbot upgrade, but as the engine for a new generation of autonomous systems.

Anthropic's Sonnet 4.6 arrives as the company expands into enterprise markets and defense

This release does not arrive in a vacuum. Anthropic is in the middle of the most consequential stretch in its history, and the competitive landscape is intensifying on every front.

On the same day as this launch, TechCrunch reported that Indian IT giant Infosys announced a partnership with Anthropic to build enterprise-grade AI agents, integrating Claude models into Infosys's Topaz AI platform for banking, telecoms, and manufacturing. Anthropic CEO Dario Amodei told TechCrunch there is "a big gap between an AI model that works in a demo and one that works in a regulated industry," and that Infosys helps bridge it. TechCrunch also reported that Anthropic opened its first India office in Bengaluru, and that India now accounts for about 6% of global Claude usage, second only to the U.S. The company, which CNBC reported is valued at $183 billion, has been expanding its enterprise footprint rapidly.

Meanwhile, Anthropic president Daniela Amodei told ABC News last week that AI would make humanities majors "more important than ever," arguing that critical thinking skills would become more valuable as large language models master technical work. It is the kind of statement a company makes when it believes its technology is about to reshape entire categories of white-collar employment.

The competitive picture for Sonnet 4.6 is also notable. The model outperforms Google's Gemini 3 Pro and OpenAI's GPT-5.2 on multiple benchmarks. GPT-5.2 trails on agentic computer use (38.2% vs. 72.5%), agentic search (77.9% vs. 74.7% for Sonnet 4.6's non-Pro score), and agentic financial analysis (59.0% vs. 63.3%). Gemini 3 Pro shows competitive performance on visual reasoning and multilingual benchmarks, but falls behind on the agentic categories where enterprise investment is surging.

The broader takeaway may not be about any single model. It is about what happens when Opus-class intelligence becomes available for a few dollars per million tokens rather than a few tens of dollars. Companies that were cautiously piloting AI agents with small deployments now face a fundamentally different cost calculus. The agents that were too expensive to run continuously in January are suddenly affordable in February.

Claude Sonnet 4.6 is available now on all Claude plans, Claude Cowork, Claude Code, the API, and all major cloud platforms. Anthropic has also upgraded its free tier to Sonnet 4.6 by default. Developers can access it immediately using claude-sonnet-4-6 via the Claude API.

Source_link