• About Us
  • Disclaimer
  • Contact Us
  • Privacy Policy
Thursday, January 29, 2026
mGrowTech
No Result
View All Result
  • Technology And Software
    • Account Based Marketing
    • Channel Marketing
    • Marketing Automation
      • Al, Analytics and Automation
      • Ad Management
  • Digital Marketing
    • Social Media Management
    • Google Marketing
  • Direct Marketing
    • Brand Management
    • Marketing Attribution and Consulting
  • Mobile Marketing
  • Event Management
  • PR Solutions
  • Technology And Software
    • Account Based Marketing
    • Channel Marketing
    • Marketing Automation
      • Al, Analytics and Automation
      • Ad Management
  • Digital Marketing
    • Social Media Management
    • Google Marketing
  • Direct Marketing
    • Brand Management
    • Marketing Attribution and Consulting
  • Mobile Marketing
  • Event Management
  • PR Solutions
No Result
View All Result
mGrowTech
No Result
View All Result
Home Al, Analytics and Automation

Alibaba Introduces Qwen3-Max-Thinking, a Test Time Scaled Reasoning Model with Native Tool Use Powering Agentic Workloads

Josh by Josh
January 29, 2026
in Al, Analytics and Automation
0
Alibaba Introduces Qwen3-Max-Thinking, a Test Time Scaled Reasoning Model with Native Tool Use Powering Agentic Workloads
0
SHARES
1
VIEWS
Share on FacebookShare on Twitter


Qwen3-Max-Thinking is Alibaba’s new flagship reasoning model. It does not only scale parameters, it also changes how inference is done, with explicit control over thinking depth and built in tools for search, memory, and code execution.

https://qwen.ai/blog?id=qwen3-max-thinking

Model scale, data, and deployment

Qwen3-Max-Thinking is a trillion-parameter MoE flagship LLM pretrained on 36T tokens and built on the Qwen3 family as the top tier reasoning model. The model targets long horizon reasoning and code, not only casual chat. It runs with a context window of 260k tokens, which supports repository scale code, long technical reports, and multi document analysis within a single prompt.

READ ALSO

The Machine Learning Practitioner’s Guide to Model Deployment with FastAPI

SoftBank Plans Another Giant Bet on OpenAI

Qwen3-Max-Thinking is a closed model served through Qwen-Chat and Alibaba Cloud Model Studio with an OpenAI compatible HTTP API. The same endpoint can be called in a Claude style tool schema, so existing Anthropic or Claude Code flows can swap in Qwen3-Max-Thinking with minimal changes. There are no public weights, so usage is API based, which matches its positionin

Smart Test Time Scaling and experience cumulative reasoning

Most large language models improve reasoning by simple test time scaling, for example best of N sampling with several parallel chains of thought. That approach increases quality but cost grows almost linearly with the number of samples. Qwen3-Max-Thinking introduces an experience cumulative, multi round test time scaling strategy.

Instead of only sampling more in parallel, the model iterates within a single conversation, reusing intermediate reasoning traces as structured experience. After each round, it extracts useful partial conclusions, then focuses subsequent computation on unresolved parts of the question. This process is controlled by an explicit thinking budget that developers can adjust via API parameters such as enable_thinking and additional configuration fields.

The reported effect is that accuracy rises without a proportional increase in token count. For example, Qwen’s own ablations show GPQA Diamond increasing from around 90 level accuracy to about 92.8, and LiveCodeBench v6 rising from about 88.0 to 91.4 under the experience cumulative strategy at similar token budgets. This is important because it means higher reasoning quality can be driven by more efficient scheduling of compute, not only by more samples.

Native agent stack with Adaptive Tool Use

Qwen3-Max-Thinking integrates three tools as first class capabilities: Search, Memory, and a Code Interpreter. Search connects to web retrieval so the model can fetch fresh pages, extract content, and ground its answers. Memory stores user or session specific state, which supports personalized reasoning over longer workflows. The Code Interpreter executes Python, which allows numeric verification, data transforms, and program synthesis with runtime checks.

The model uses Adaptive Tool Use to decide when to invoke these tools during a conversation. Tool calls are interleaved with internal thinking segments, rather than being orchestrated by an external agent. This design reduces the need for separate routers or planners and tends to reduce hallucinations, because the model can explicitly fetch missing information or verify calculations instead of guessing.

Tool ability is also benchmarked. On Tau² Bench, which measures function calling and tool orchestration, Qwen3-Max-Thinking reports a score of 82.1, comparable with other frontier models in this category.

Benchmark profile across knowledge, reasoning, and search

On 19 public benchmarks, Qwen3-Max-Thinking is positioned at or near the same level as GPT 5.2 Thinking, Claude Opus 4.5, and Gemini 3 Pro. For knowledge tasks, reported scores include 85.7 on MMLU-Pro, 92.8 on MMLU-Redux, and 93.7 on C-Eval, where Qwen leads the group on Chinese language evaluation.

For hard reasoning, it records 87.4 on GPQA, 98.0 on HMMT Feb 25, 94.7 on HMMT Nov 25, and 83.9 on IMOAnswerBench, which puts it in the top tier of current math and science models. On coding and software engineering it reaches 85.9 on LiveCodeBench v6 and 75.3 on SWE Verified.

In the base HLE configuration Qwen3-Max-Thinking scores 30.2, below Gemini 3 Pro at 37.5 and GPT 5.2 Thinking at 35.5. In a tool enabled HLE setup, the official comparison table that includes web search integration shows Qwen3-Max-Thinking at 49.8, ahead of GPT 5.2 Thinking at 45.5 and Gemini 3 Pro at 45.8. With its most aggressive experience cumulative test time scaling configuration on HLE with tools, Qwen3-Max-Thinking reaches 58.3 while GPT 5.2 Thinking remains at 45.5, although that higher number is for a heavier inference mode than the standard comparison table.

Key Takeaways

  • Qwen3-Max-Thinking is a closed, API only flagship reasoning model from Alibaba, built on a more than 1 trillion parameter backbone trained on about 36 trillion tokens with a 262144 token context window.
  • The model introduces experience cumulative test time scaling, where it reuses intermediate reasoning across multiple rounds, improving benchmarks such as GPQA Diamond and LiveCodeBench v6 at similar token budgets.
  • Qwen3-Max-Thinking integrates Search, Memory, and a Code Interpreter as native tools and uses Adaptive Tool Use so the model itself decides when to browse, recall state, or execute Python during a conversation.
  • On public benchmarks it reports competitive scores with GPT 5.2 Thinking, Claude Opus 4.5, and Gemini 3 Pro, including strong results on MMLU Pro, GPQA, HMMT, IMOAnswerBench, LiveCodeBench v6, SWE Bench Verified, and Tau² Bench..

Check out the API and Technical details. Also, feel free to follow us on Twitter and don’t forget to join our 100k+ ML SubReddit and Subscribe to our Newsletter. Wait! are you on telegram? now you can join us on telegram as well.


Michal Sutter is a data science professional with a Master of Science in Data Science from the University of Padova. With a solid foundation in statistical analysis, machine learning, and data engineering, Michal excels at transforming complex datasets into actionable insights.



Source_link

Related Posts

The Machine Learning Practitioner’s Guide to Model Deployment with FastAPI
Al, Analytics and Automation

The Machine Learning Practitioner’s Guide to Model Deployment with FastAPI

January 29, 2026
SoftBank Plans Another Giant Bet on OpenAI
Al, Analytics and Automation

SoftBank Plans Another Giant Bet on OpenAI

January 28, 2026
Al, Analytics and Automation

Tencent Hunyuan Releases HPC-Ops: A High Performance LLM Inference Operator Library

January 28, 2026
Everything You Need to Know About How Python Manages Memory
Al, Analytics and Automation

Everything You Need to Know About How Python Manages Memory

January 28, 2026
EU vs X: Grok’s Explicit-Image Mess Has Officially Crossed the Line
Al, Analytics and Automation

EU vs X: Grok’s Explicit-Image Mess Has Officially Crossed the Line

January 27, 2026
DSGym Offers a Reusable Container Based Substrate for Building and Benchmarking Data Science Agents
Al, Analytics and Automation

DSGym Offers a Reusable Container Based Substrate for Building and Benchmarking Data Science Agents

January 27, 2026
Next Post
Tesla is killing off its Model S and X cars to make robots

Tesla is killing off its Model S and X cars to make robots

Leave a Reply Cancel reply

Your email address will not be published. Required fields are marked *

POPULAR NEWS

Trump ends trade talks with Canada over a digital services tax

Trump ends trade talks with Canada over a digital services tax

June 28, 2025
Communication Effectiveness Skills For Business Leaders

Communication Effectiveness Skills For Business Leaders

June 10, 2025
15 Trending Songs on TikTok in 2025 (+ How to Use Them)

15 Trending Songs on TikTok in 2025 (+ How to Use Them)

June 18, 2025
App Development Cost in Singapore: Pricing Breakdown & Insights

App Development Cost in Singapore: Pricing Breakdown & Insights

June 22, 2025
Google announced the next step in its nuclear energy plans 

Google announced the next step in its nuclear energy plans 

August 20, 2025

EDITOR'S PICK

The Biggest Acquisitions In History Since 2015: Insights and Outcomes

The Biggest Acquisitions In History Since 2015: Insights and Outcomes

August 13, 2025
Bangkok Bank maintains Strong Growth with Baht 46,007 Million Profit in 2025

Bangkok Bank maintains Strong Growth with Baht 46,007 Million Profit in 2025

January 21, 2026
Tailgate Trolling and Beast City

Tailgate Trolling and Beast City

January 6, 2026
Google’s AI Mode can now help you visualize your travel plans

Google’s AI Mode can now help you visualize your travel plans

November 17, 2025

About

We bring you the best Premium WordPress Themes that perfect for news, magazine, personal blog, etc. Check our landing page for details.

Follow us

Categories

  • Account Based Marketing
  • Ad Management
  • Al, Analytics and Automation
  • Brand Management
  • Channel Marketing
  • Digital Marketing
  • Direct Marketing
  • Event Management
  • Google Marketing
  • Marketing Attribution and Consulting
  • Marketing Automation
  • Mobile Marketing
  • PR Solutions
  • Social Media Management
  • Technology And Software
  • Uncategorized

Recent Posts

  • LiteRT: The Universal Framework for On-Device AI
  • Advantage+ Creative Video Generation Beta and What Else I’m Seeing
  • Tesla is killing off its Model S and X cars to make robots
  • Alibaba Introduces Qwen3-Max-Thinking, a Test Time Scaled Reasoning Model with Native Tool Use Powering Agentic Workloads
  • About Us
  • Disclaimer
  • Contact Us
  • Privacy Policy
No Result
View All Result
  • Technology And Software
    • Account Based Marketing
    • Channel Marketing
    • Marketing Automation
      • Al, Analytics and Automation
      • Ad Management
  • Digital Marketing
    • Social Media Management
    • Google Marketing
  • Direct Marketing
    • Brand Management
    • Marketing Attribution and Consulting
  • Mobile Marketing
  • Event Management
  • PR Solutions

Are you sure want to unlock this post?
Unlock left : 0
Are you sure want to cancel subscription?