mGrowTech

New method could increase LLM training efficiency | MIT News

By Josh
February 26, 2026
in AI, Analytics and Automation


Reasoning large language models (LLMs) are designed to solve complex problems by breaking them down into a series of smaller steps. These powerful models are particularly good at challenging tasks like advanced programming and multistep planning.

But developing reasoning models demands an enormous amount of computation and energy due to inefficiencies in the training process. While a few of the high-power processors continuously work through complicated queries, others in the group sit idle.

Researchers from MIT and elsewhere found a way to use this computational downtime to efficiently accelerate reasoning-model training.

Their new method automatically trains a smaller, faster model to predict the outputs of the larger reasoning LLM, which the larger model verifies. This reduces the amount of work the reasoning model must do, accelerating the training process.

The key to this system is its ability to train and deploy the smaller model adaptively, so it kicks in only when some processors are idle. By leveraging computational resources that would otherwise have been wasted, it accelerates training without incurring additional overhead.

When tested on multiple reasoning LLMs, the method doubled the training speed while preserving accuracy. This could reduce the cost and increase the energy efficiency of developing advanced LLMs for applications such as forecasting financial trends or detecting risks in power grids.

“People want models that can handle more complex tasks. But if that is the goal of model development, then we need to prioritize efficiency. We found a lossless solution to this problem and then developed a full-stack system that can deliver quite dramatic speedups in practice,” says Qinghao Hu, an MIT postdoc and co-lead author of a paper on this technique.

He is joined on the paper by co-lead author Shang Yang, an electrical engineering and computer science (EECS) graduate student; Junxian Guo, an EECS graduate student; senior author Song Han, an associate professor in EECS, member of the Research Laboratory of Electronics and a distinguished scientist of NVIDIA; as well as others at NVIDIA, ETH Zurich, the MIT-IBM Watson AI Lab, and the University of Massachusetts at Amherst. The research will be presented at the ACM International Conference on Architectural Support for Programming Languages and Operating Systems.

Training bottleneck

Developers want reasoning LLMs to identify and correct mistakes in their critical thinking process. This capability allows them to ace complicated queries that would trip up a standard LLM.

To teach them this skill, developers train reasoning LLMs using a technique called reinforcement learning (RL). The model generates multiple potential answers to a query, receives a reward for the best candidate, and is updated based on the top answer. These steps repeat thousands of times as the model learns.

But the researchers found that the process of generating multiple answers, called rollout, can consume as much as 85 percent of the execution time needed for RL training.
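
A back-of-the-envelope bound shows why the rollout share matters. This is only a sketch under Amdahl's law, which is a framing assumed here and not taken from the paper: suppose rollout takes a fraction p = 0.85 of each training step and is the only part sped up, by a hypothetical factor s.

```latex
\[
S_{\text{overall}} = \frac{1}{(1 - p) + p/s}, \qquad p = 0.85
\]
\[
s = 2 \;\Rightarrow\; S_{\text{overall}} = \frac{1}{0.15 + 0.425} \approx 1.74,
\qquad
s \to \infty \;\Rightarrow\; S_{\text{overall}} \to \frac{1}{0.15} \approx 6.67
\]
```

Under these assumptions, even a twofold rollout speedup shortens the whole step by roughly 1.7 times, and the remaining 15 percent caps the gain at under sevenfold.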

“Updating the model — which is the actual ‘training’ part — consumes very little time by comparison,” Hu says.

This bottleneck occurs in standard RL algorithms because all processors in the training group must finish their responses before they can move on to the next step. Because some processors might be working on very long responses, others that generated shorter responses wait for them to finish.
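
The effect of this barrier can be illustrated with a toy calculation. The sketch below assumes each worker generates one response at one token per time unit; the function name and the response lengths are made up for illustration and are not measurements from the paper.

```python
def idle_fraction(response_lengths):
    """Fraction of total worker-time spent waiting at the synchronization barrier.

    Assumes one response per worker and one token per time unit, so the
    step lasts as long as the longest response.
    """
    longest = max(response_lengths)
    total_slots = longest * len(response_lengths)  # worker-time available in the step
    busy_slots = sum(response_lengths)             # worker-time spent generating
    return (total_slots - busy_slots) / total_slots


# Hypothetical batch: three short responses and one long-tail response.
lengths = [200, 300, 250, 4000]
print(f"{idle_fraction(lengths):.2%}")  # prints 70.31%
```

In this made-up batch, a single long response leaves the group idle for about 70 percent of the available worker-time.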

“Our goal was to turn this idle time into speedup without any wasted costs,” Hu adds.

They sought to use an existing technique, called speculative decoding, to speed things up. Speculative decoding involves training a smaller model called a drafter to rapidly guess the future outputs of the larger model.

The larger model verifies the drafter’s guesses, and the responses it accepts are used for training.

Because the larger model can verify all the drafter’s guesses at once, rather than generating each output sequentially, it accelerates the process.
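
The draft-and-verify loop can be sketched in a few lines. This is a minimal greedy-decoding illustration, not the researchers' implementation: both "models" are toy deterministic functions invented for the example, and verification is written as a Python loop although a real system would check all drafted positions in one batched forward pass.

```python
from typing import Callable, List, Tuple

Model = Callable[[List[int]], int]  # maps a token prefix to the next token (greedy)


def target_model(prefix: List[int]) -> int:
    # Toy stand-in for the large model: a fixed deterministic rule.
    return (sum(prefix) * 31 + len(prefix)) % 50


def draft_model(prefix: List[int]) -> int:
    # Toy drafter: agrees with the target except when the prefix
    # length is a multiple of 4, where it guesses wrong on purpose.
    guess = target_model(prefix)
    return guess if len(prefix) % 4 else (guess + 1) % 50


def greedy_decode(target: Model, prompt: List[int], n_new: int) -> List[int]:
    # Baseline: one target call per generated token.
    out = list(prompt)
    for _ in range(n_new):
        out.append(target(out))
    return out


def speculative_decode(target: Model, draft: Model, prompt: List[int],
                       n_new: int, k: int) -> Tuple[List[int], int]:
    """Return the generated sequence and the number of verification passes."""
    out = list(prompt)
    goal = len(prompt) + n_new
    passes = 0
    while len(out) < goal:
        # 1) The drafter proposes k tokens one after another (cheap).
        guesses: List[int] = []
        for _ in range(k):
            guesses.append(draft(out + guesses))
        # 2) The target checks every proposed position; counted as one pass.
        passes += 1
        accepted: List[int] = []
        for g in guesses:
            expected = target(out + accepted)
            if g != expected:
                accepted.append(expected)  # first wrong guess is replaced, rest dropped
                break
            accepted.append(g)
        else:
            accepted.append(target(out + accepted))  # all accepted: one bonus token
        out.extend(accepted)
    return out[:goal], passes


spec, passes = speculative_decode(target_model, draft_model, [1, 2, 3], 20, 4)
assert spec == greedy_decode(target_model, [1, 2, 3], 20)  # same output as the target alone
print(passes)  # fewer verification passes than the 20 sequential target calls
```

Because every emitted token is either confirmed or supplied by the target, the output matches what the target would have produced alone, which is the sense in which the approach is lossless in this greedy setting.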

An adaptive solution

But in speculative decoding, the drafter model is typically trained only once and remains static. This makes the technique infeasible for reinforcement learning, since the reasoning model is updated thousands of times during training.

A static drafter would become stale and useless after a few steps.

To overcome this problem, the researchers created a flexible system known as “Taming the Long Tail,” or TLT.

The first part of TLT is an adaptive drafter trainer, which uses free time on idle processors to train the drafter model on the fly, keeping it well-aligned with the target model without using extra computational resources.

The second component, an adaptive rollout engine, manages speculative decoding to automatically select the optimal strategy for each new batch of inputs. This mechanism changes the speculative decoding configuration based on the training workload features, such as the number of inputs processed by the draft model and the number of inputs accepted by the target model during verification.
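
One way to picture such a policy is a rule that picks the draft length from the observed acceptance rate. The sketch below is a hypothetical heuristic, not TLT's actual selection logic: the function names, the relative draft cost, and the assumption that each drafted token is accepted independently with the same probability are all illustrative.

```python
def expected_tokens(accept_rate: float, k: int) -> float:
    """Expected tokens produced per verification pass with draft length k.

    Assumes each drafted token is accepted independently with
    probability accept_rate (a simplifying assumption).
    """
    if accept_rate >= 1.0:
        return k + 1.0
    return (1.0 - accept_rate ** (k + 1)) / (1.0 - accept_rate)


def pick_draft_length(drafted: int, accepted: int,
                      draft_cost: float = 0.1, max_k: int = 8) -> int:
    """Choose k in 0..max_k that maximizes tokens per unit of compute.

    drafted / accepted are running counts of tokens proposed by the drafter
    and accepted by the target. draft_cost is the hypothetical cost of one
    draft step relative to one target pass. k = 0 disables speculation.
    """
    accept_rate = accepted / drafted if drafted else 0.0

    def throughput(k: int) -> float:
        return expected_tokens(accept_rate, k) / (1.0 + k * draft_cost)

    return max(range(max_k + 1), key=throughput)


print(pick_draft_length(drafted=100, accepted=0))    # drafter never right: 0
print(pick_draft_length(drafted=100, accepted=100))  # drafter always right: 8
```

Under these assumptions, a drafter that is rarely accepted gets switched off, while a well-aligned drafter is allowed to propose longer runs of tokens.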

In addition, the researchers designed the draft model to be lightweight so it can be trained quickly. TLT reuses some components of the reasoning model training process to train the drafter, leading to extra gains in acceleration.

“As soon as some processors finish their short queries and become idle, we immediately switch them to do draft model training using the same data they are using for the rollout process. The key mechanism is our adaptive speculative decoding — these gains wouldn’t be possible without it,” Hu says.

They tested TLT across multiple reasoning LLMs that were trained using real-world datasets. The system accelerated training by 70 to 210 percent while preserving the accuracy of each model.

As a bonus, the small drafter model is a free byproduct that could readily be used for efficient deployment.

In the future, the researchers want to integrate TLT into more types of training and inference frameworks and find new reinforcement learning applications that could be accelerated using this approach.

“As reasoning continues to become the major workload driving the demand for inference, Qinghao’s TLT is great work to cope with the computation bottleneck of training these reasoning models. I think this method will be very helpful in the context of efficient AI computing,” Han says.

This work is funded by the MIT-IBM Watson AI Lab, the MIT AI Hardware Program, the MIT Amazon Science Hub, Hyundai Motor Company, and the National Science Foundation.


