mGrowTech
A smarter way for large language models to think about hard problems | MIT News

By Josh
December 4, 2025



To make large language models (LLMs) more accurate when answering harder questions, researchers can let the model spend more time thinking about potential solutions.

But common approaches that give LLMs this capability set a fixed computational budget for every problem, regardless of how complex it is. This means the LLM might waste computational resources on simpler questions or be unable to tackle intricate problems that require more reasoning.

To address this, MIT researchers developed a smarter way to allocate computational effort as the LLM solves a problem. Their method enables the model to dynamically adjust its computational budget based on the difficulty of the question and the likelihood that each partial solution will lead to the correct answer.

The researchers found that their new approach enabled LLMs to use as little as one-half the computation of existing methods while achieving comparable accuracy on a range of questions of varying difficulty. In addition, their method allows smaller, less resource-intensive LLMs to perform as well as or even better than larger models on complex problems.

By improving the reliability and efficiency of LLMs, especially when they tackle complex reasoning tasks, this technique could reduce the energy consumption of generative AI systems and enable the use of LLMs in more high-stakes and time-sensitive applications.

“The computational cost of inference has quickly become a major bottleneck for frontier model providers, and they are actively trying to find ways to improve computational efficiency per user queries. For instance, the recent GPT-5.1 release highlights the efficacy of the ‘adaptive reasoning’ approach our paper proposes. By endowing the models with the ability to know what they don’t know, we can enable them to spend more compute on the hardest problems and most promising solution paths, and use far fewer tokens on easy ones. That makes reasoning both more reliable and far more efficient,” says Navid Azizan, the Alfred H. and Jean M. Hayes Career Development Assistant Professor in the Department of Mechanical Engineering and the Institute for Data, Systems, and Society (IDSS), a principal investigator of the Laboratory for Information and Decision Systems (LIDS), and the senior author of a paper on this technique.

Azizan is joined on the paper by lead author Young-Jin Park, a LIDS/MechE graduate student; Kristjan Greenewald, a research scientist in the MIT-IBM Watson AI Lab; Kaveh Alim, an IDSS graduate student; and Hao Wang, a research scientist at the MIT-IBM Watson AI Lab and the Red Hat AI Innovation Team. The research is being presented this week at the Conference on Neural Information Processing Systems.

Computation for contemplation

A recent approach called inference-time scaling lets a large language model take more time to reason about difficult problems.

Using inference-time scaling, the LLM might generate multiple solution attempts at once or explore different reasoning paths, then choose the best ones to pursue from those candidates.

A separate model, known as a process reward model (PRM), scores each potential solution or reasoning path. The LLM uses these scores to identify the most promising candidates.
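As a rough illustration of fixed-budget inference-time scaling, the sketch below samples a fixed number of candidates and keeps the one with the highest score. It is a toy under stated assumptions, not the researchers' implementation: `best_of_n`, `toy_generate`, and `toy_score` are hypothetical names, the "LLM" is a random number guesser, and the "PRM" is a hand-written scoring function.

```python
import random
from typing import Callable, Tuple


def best_of_n(
    question: str,
    generate: Callable[[str, random.Random], str],
    score: Callable[[str, str], float],
    n: int,
    seed: int = 0,
) -> Tuple[str, float]:
    """Sample n candidates under a fixed budget and return the top-scored one."""
    rng = random.Random(seed)
    candidates = [generate(question, rng) for _ in range(n)]
    scored = [(score(question, c), c) for c in candidates]
    best_score, best = max(scored, key=lambda pair: pair[0])
    return best, best_score


# Hypothetical stand-in for an LLM: guesses an integer between 0 and 20.
def toy_generate(question: str, rng: random.Random) -> str:
    return str(rng.randint(0, 20))


# Hypothetical stand-in for a PRM: rewards guesses close to the true answer, 12.
def toy_score(question: str, candidate: str) -> float:
    return 1.0 / (1.0 + abs(int(candidate) - 12))


answer, prm_score = best_of_n("What is 7 + 5?", toy_generate, toy_score, n=8)
print(answer, round(prm_score, 3))
```

Note that `n` is the same for every question here, which is exactly the limitation the article describes: an easy question and a hard one consume the same number of samples.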

Typical inference-time scaling approaches assign a fixed amount of computation for the LLM to break the problem down and reason about the steps.

Instead, the researchers’ method, known as instance-adaptive scaling, dynamically adjusts the number of potential solutions or reasoning steps based on how likely they are to succeed, as the model wrestles with the problem.

“This is how humans solve problems. We come up with some partial solutions and then decide, should I go further with any of these, or stop and revise, or even go back to my previous step and continue solving the problem from there?” Wang explains.

To do this, the framework uses the PRM to estimate the difficulty of the question, helping the LLM assess how much computational budget to utilize for generating and reasoning about potential solutions.

At every step in the model’s reasoning process, the PRM looks at the question and partial answers and evaluates how promising each one is for getting to the right solution. If the LLM is more confident, it can reduce the number of potential solutions or reasoning trajectories to pursue, saving computational resources.
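One simple way to turn a success estimate into a budget is sketched below. It assumes each attempt succeeds independently with probability p, so the chance that at least one of N attempts succeeds is 1 - (1 - p)^N, and it picks the smallest N that reaches a target. This rule, the function name `samples_needed`, and the default values are illustrative assumptions; the article does not specify the exact rule the researchers use.

```python
import math


def samples_needed(p_success: float, target: float = 0.95, max_samples: int = 64) -> int:
    """Smallest N with 1 - (1 - p)^N >= target, clipped to [1, max_samples].

    Assumes independent attempts; this is an illustrative rule, not the
    published method.
    """
    if not 0.0 <= p_success <= 1.0:
        raise ValueError("p_success must lie in [0, 1]")
    if not 0.0 < target < 1.0:
        raise ValueError("target must lie strictly between 0 and 1")
    if p_success == 0.0:
        return max_samples  # no estimated chance of success: spend the full budget
    if p_success == 1.0:
        return 1  # certain success: one attempt is enough
    n = math.ceil(math.log(1.0 - target) / math.log(1.0 - p_success))
    return max(1, min(n, max_samples))


for p in (0.9, 0.5, 0.1, 0.01):
    print(p, samples_needed(p))
```

With these defaults, a confident estimate of 0.9 needs 2 samples, 0.5 needs 5, 0.1 needs 29, and 0.01 hits the cap of 64, so easy questions get a small budget and hard ones a large one.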

But the researchers found that existing PRMs often overestimate the model’s probability of success.

Overcoming overconfidence

“If we were to just trust current PRMs, which often overestimate the chance of success, our system would reduce the computational budget too aggressively. So we first had to find a way to better calibrate PRMs to make inference-time scaling more efficient and reliable,” Park says.

The researchers introduced a calibration method that enables PRMs to generate a range of probability scores rather than a single value. In this way, the PRM creates more reliable uncertainty estimates that better reflect the true probability of success.

With a well-calibrated PRM, their instance-adaptive scaling framework can use the probability scores to effectively reduce computation while maintaining the accuracy of the model’s outputs.
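To see why a range of scores can matter, the sketch below compares a budget computed from a single overconfident estimate with one computed from a conservative low quantile of a set of estimates. Everything here is an assumption made for illustration: the numbers are invented, the choice of the 10th percentile is arbitrary, and `budget_from_probability` and `lower_quantile` are hypothetical helpers rather than the researchers' calibration method.

```python
import math
from typing import Sequence


def budget_from_probability(p: float, target: float = 0.95, cap: int = 64) -> int:
    """Smallest N with 1 - (1 - p)^N >= target, assuming independent attempts."""
    if p <= 0.0:
        return cap
    if p >= 1.0:
        return 1
    n = math.ceil(math.log(1.0 - target) / math.log(1.0 - p))
    return max(1, min(n, cap))


def lower_quantile(values: Sequence[float], q: float) -> float:
    """Nearest-rank quantile: the value at rank ceil(q * len(values))."""
    if not values:
        raise ValueError("values must be non-empty")
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered)))
    return ordered[rank - 1]


# Invented numbers: a single uncalibrated score versus a range of scores.
raw_point_estimate = 0.9
score_range = [0.55, 0.60, 0.70, 0.80, 0.90]

conservative_p = lower_quantile(score_range, q=0.10)
print(budget_from_probability(raw_point_estimate))  # budget from the point estimate
print(budget_from_probability(conservative_p))      # budget from the low quantile
```

In this toy case the point estimate yields a budget of 2 samples, while the low quantile of 0.55 yields 4. If the true success rate is closer to the lower end, the smaller budget would have been cut too aggressively, which is the failure mode Park describes.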

When the researchers compared their method to standard inference-time scaling approaches on a series of mathematical reasoning tasks, it used less computation to solve each problem while achieving similar accuracy.

“The beauty of our approach is that this adaptation happens on the fly, as the problem is being solved, rather than happening all at once at the beginning of the process,” says Greenewald.

In the future, the researchers are interested in applying this technique to other applications, such as code generation and AI agents. They also plan to explore additional uses for their PRM calibration method, such as reinforcement learning and fine-tuning.

“Human employees learn on the job — some CEOs even started as interns — but today’s agents remain largely static pieces of probabilistic software. Work like this paper is an important step toward changing that: helping agents understand what they don’t know and building mechanisms for continual self-improvement. These capabilities are essential if we want agents that can operate safely, adapt to new situations, and deliver consistent results at scale,” says Akash Srivastava, director and chief architect of Core AI at IBM Software, who was not involved with this work.

This work was funded, in part, by the MIT-IBM Watson AI Lab, the MIT-Amazon Science Hub, the MIT-Google Program for Computing Innovation, and MathWorks. 


