• About Us
  • Disclaimer
  • Contact Us
  • Privacy Policy
Monday, March 9, 2026
mGrowTech
No Result
View All Result
  • Technology And Software
    • Account Based Marketing
    • Channel Marketing
    • Marketing Automation
      • Al, Analytics and Automation
      • Ad Management
  • Digital Marketing
    • Social Media Management
    • Google Marketing
  • Direct Marketing
    • Brand Management
    • Marketing Attribution and Consulting
  • Mobile Marketing
  • Event Management
  • PR Solutions
  • Technology And Software
    • Account Based Marketing
    • Channel Marketing
    • Marketing Automation
      • Al, Analytics and Automation
      • Ad Management
  • Digital Marketing
    • Social Media Management
    • Google Marketing
  • Direct Marketing
    • Brand Management
    • Marketing Attribution and Consulting
  • Mobile Marketing
  • Event Management
  • PR Solutions
No Result
View All Result
mGrowTech
No Result
View All Result
Home Al, Analytics and Automation

Baidu Releases ERNIE-4.5-21B-A3B-Thinking: A Compact MoE Model for Deep Reasoning

Josh by Josh
September 10, 2025
in Al, Analytics and Automation
0


Baidu AI Research team has just released ERNIE-4.5-21B-A3B-Thinking, a new reasoning-focused large language model designed around efficiency, long-context reasoning, and tool integration. Being part of the ERNIE-4.5 family, this model is a Mixture-of-Experts (MoE) architecture with 21B total parameters but only 3B active parameters per token, making it computationally efficient while maintaining competitive reasoning capability. Released under the Apache-2.0 license, it is accessible for both research and commercial deployment via Hugging Face.

What is the architectural design of ERNIE-4.5-21B-A3B-Thinking?

ERNIE-4.5-21B-A3B-Thinking is built on a Mixture-of-Experts backbone. Instead of activating all 21B parameters, the router selects a subset of experts, resulting in 3B active parameters per token. This structure reduces computation without compromising the specialization of different experts. The research team applies router orthogonalization loss and token-balanced loss to encourage diverse expert activation and stable training.

READ ALSO

Pricing Breakdown and Core Feature Overview

Improving AI models’ ability to explain their predictions | MIT News

This design provides a middle ground between small dense models and ultra-large systems. The research team’s assumptions include a theory that ~3B active parameters per token may represent a practical sweet spot for reasoning performance versus deployment efficiency.

How does the model handle long-context reasoning?

A defining capability of ERNIE-4.5-21B-A3B-Thinking is its 128K context length. This allows the model to process very long documents, perform extended multi-step reasoning, and integrate structured data sources such as academic papers or multi-file codebases.

The research team achieves this through progressive scaling of Rotary Position Embeddings (RoPE)—gradually increasing the frequency base from 10K up to 500K during training. Additional optimizations, including FlashMask attention and memory-efficient scheduling, make these long-context operations computationally feasible.

What training strategy supports its reasoning?

The model follows the multi-stage recipe defined across the ERNIE-4.5 family:

  1. Stage I – Text-only pretraining builds the core language backbone, starting with 8K context and expanding to 128K.
  2. Stage II – Vision training is skipped for this text-only variant.
  3. Stage III – Joint multimodal training is not used here, as A3B-Thinking is purely textual.

Post-training focuses on reasoning tasks. The research team employs Supervised Fine-Tuning (SFT) across mathematics, logic, coding, and science, followed by Progressive Reinforcement Learning (PRL). Reinforcement stages begin with logic, then extend to mathematics and programming, and finally to broader reasoning tasks. This is enhanced by Unified Preference Optimization (UPO), which integrates preference learning with PPO to stabilize alignment and reduce reward hacking.

What role does tool usage play in this model?

ERNIE-4.5-21B-A3B-Thinking supports structured tool and function calling, making it useful for scenarios where external computation or retrieval is required. Developers can integrate it with vLLM, Transformers 4.54+, and FastDeploy. This tool-use capability is particularly suited for program synthesis, symbolic reasoning, and multi-agent workflows.

Built-in function calling allows the model to reason over long contexts while dynamically invoking external APIs, a key requirement for applied reasoning in enterprise systems.

How does ERNIE-4.5-21B-A3B-Thinking perform on reasoning benchmarks?

It show strong performance improvements across logical reasoning, mathematics, scientific QA, and programming tasks. In evaluations, the model demonstrates:

  • Enhanced accuracy in multi-step reasoning datasets, where long chains of thought are required.
  • Competitiveness with larger dense models on STEM reasoning tasks.
  • Stable text generation and academic synthesis performance, benefiting from extended context training.

These results suggest that the MoE structure amplifies reasoning specialization, making it efficient without requiring trillion-scale dense parameters.

https://huggingface.co/baidu/ERNIE-4.5-21B-A3B-Thinking

How does it compare to other reasoning-focused LLMs?

This release gets into the landscape that includes OpenAI’s o3, Anthropic’s Claude 4, DeepSeek-R1, and Qwen-3. Many of these competitors rely on dense architectures or larger active parameter counts. Baidu research team’s choice of a compact MoE with 3B active parameters offers a different balance:

  • Scalability: Sparse activation reduces compute overhead while scaling expert capacity.
  • Long-context readiness: 128K context is directly trained, not retrofitted.
  • Commercial openness: Apache-2.0 license lowers adoption friction for enterprises.

Summary

ERNIE-4.5-21B-A3B-Thinking explains how deep reasoning can be achieved without massive dense parameter counts. By combining efficient MoE routing, 128K context training, and tool integration, Baidu’s research team offers a model that balances research-grade reasoning with deployment feasibility.


Check out the Model on Hugging Face and PAPER. Feel free to check out our GitHub Page for Tutorials, Codes and Notebooks. Also, feel free to follow us on Twitter and don’t forget to join our 100k+ ML SubReddit and Subscribe to our Newsletter.


Asif Razzaq is the CEO of Marktechpost Media Inc.. As a visionary entrepreneur and engineer, Asif is committed to harnessing the potential of Artificial Intelligence for social good. His most recent endeavor is the launch of an Artificial Intelligence Media Platform, Marktechpost, which stands out for its in-depth coverage of machine learning and deep learning news that is both technically sound and easily understandable by a wide audience. The platform boasts of over 2 million monthly views, illustrating its popularity among audiences.



Source_link

Related Posts

Pricing Breakdown and Core Feature Overview
Al, Analytics and Automation

Pricing Breakdown and Core Feature Overview

March 9, 2026
Improving AI models’ ability to explain their predictions | MIT News
Al, Analytics and Automation

Improving AI models’ ability to explain their predictions | MIT News

March 9, 2026
Beyond Accuracy: Quantifying the Production Fragility Caused by Excessive, Redundant, and Low-Signal Features in Regression
Al, Analytics and Automation

Beyond Accuracy: Quantifying the Production Fragility Caused by Excessive, Redundant, and Low-Signal Features in Regression

March 9, 2026
Build Semantic Search with LLM Embeddings
Al, Analytics and Automation

Build Semantic Search with LLM Embeddings

March 8, 2026
PovChat Chatbot App Access, Costs, and Feature Insights
Al, Analytics and Automation

PovChat Chatbot App Access, Costs, and Feature Insights

March 8, 2026
Building Next-Gen Agentic AI: A Complete Framework for Cognitive Blueprint Driven Runtime Agents with Memory Tools and Validation
Al, Analytics and Automation

Building Next-Gen Agentic AI: A Complete Framework for Cognitive Blueprint Driven Runtime Agents with Memory Tools and Validation

March 8, 2026
Next Post
Meta tests letting anyone rate Community Notes

Meta tests letting anyone rate Community Notes

POPULAR NEWS

Trump ends trade talks with Canada over a digital services tax

Trump ends trade talks with Canada over a digital services tax

June 28, 2025
Communication Effectiveness Skills For Business Leaders

Communication Effectiveness Skills For Business Leaders

June 10, 2025
15 Trending Songs on TikTok in 2025 (+ How to Use Them)

15 Trending Songs on TikTok in 2025 (+ How to Use Them)

June 18, 2025
App Development Cost in Singapore: Pricing Breakdown & Insights

App Development Cost in Singapore: Pricing Breakdown & Insights

June 22, 2025
Google announced the next step in its nuclear energy plans 

Google announced the next step in its nuclear energy plans 

August 20, 2025

EDITOR'S PICK

How to improve trust in executive comms with the ‘rule of three’

September 11, 2025
Tools + Use Cases for Marketers

Tools + Use Cases for Marketers

October 17, 2025
What Are the Best Independent Magazines for Branding? A Conversation With Steven Watson

What Are the Best Independent Magazines for Branding? A Conversation With Steven Watson

August 4, 2025
Cloud quantum computing: A trillion-dollar opportunity with dangerous hidden risks

Cloud quantum computing: A trillion-dollar opportunity with dangerous hidden risks

June 21, 2025

About

We bring you the best Premium WordPress Themes that perfect for news, magazine, personal blog, etc. Check our landing page for details.

Follow us

Categories

  • Account Based Marketing
  • Ad Management
  • Al, Analytics and Automation
  • Brand Management
  • Channel Marketing
  • Digital Marketing
  • Direct Marketing
  • Event Management
  • Google Marketing
  • Marketing Attribution and Consulting
  • Marketing Automation
  • Mobile Marketing
  • PR Solutions
  • Social Media Management
  • Technology And Software
  • Uncategorized

Recent Posts

  • Drive with Star Trek on Waze
  • The Complete Guide for 2026
  • My 5 Favorite AI Optimization Strategies to Get AI Citations
  • How to Defeat the Noxian Invaders Attacking Terbisia in Demacia Rising in League of Legends
  • About Us
  • Disclaimer
  • Contact Us
  • Privacy Policy
No Result
View All Result
  • Technology And Software
    • Account Based Marketing
    • Channel Marketing
    • Marketing Automation
      • Al, Analytics and Automation
      • Ad Management
  • Digital Marketing
    • Social Media Management
    • Google Marketing
  • Direct Marketing
    • Brand Management
    • Marketing Attribution and Consulting
  • Mobile Marketing
  • Event Management
  • PR Solutions