Monday, March 9, 2026
mGrowTech

Meta AI Introduces DeepConf: First AI Method to Achieve 99.9% on AIME 2025 with Open-Source Models Using GPT-OSS-120B

By Josh
August 27, 2025


Large language models (LLMs) have reshaped AI reasoning, with parallel thinking and self-consistency methods often cited as pivotal advances. However, these techniques face a fundamental trade-off: sampling multiple reasoning paths boosts accuracy, but at a steep computational cost. A team of researchers from Meta AI and UCSD has introduced Deep Think with Confidence (DeepConf), a new approach that nearly eliminates this trade-off. DeepConf delivers state-of-the-art reasoning performance with large efficiency gains. For example, it achieves 99.9% accuracy on the AIME 2025 math competition benchmark using the open-source GPT-OSS-120B, while generating up to 85% fewer tokens than conventional parallel thinking approaches.

Why DeepConf?

Parallel thinking (self-consistency with majority voting) is the de facto standard for boosting LLM reasoning: generate multiple candidate solutions, then pick the most common answer. While effective, this method has diminishing returns—accuracy plateaus or even declines as more paths are sampled, because low-quality reasoning traces can dilute the vote. Moreover, generating hundreds or thousands of traces per query is costly, both in time and compute.

DeepConf tackles these challenges by exploiting the LLM’s own confidence signals. Rather than treating all reasoning traces equally, it dynamically filters out low-confidence paths—either during generation (online) or afterward (offline)—using only the most reliable trajectories to inform the final answer. This strategy is model-agnostic, requires no training or hyperparameter tuning, and can be plugged into any existing model or serving framework with minimal code changes.

https://arxiv.org/pdf/2508.15260

How DeepConf Works: Confidence as a Guide

DeepConf introduces several advancements in how confidence is measured and used:

  • Token Confidence: For each generated token, compute the negative average log-probability of the top-k candidates. This gives a local measure of certainty.
  • Group Confidence: Average token confidence over a sliding window (e.g., 2048 tokens), providing a smoothed, intermediate signal of reasoning quality.
  • Tail Confidence: Focus on the final segment of the reasoning trace, where the answer often resides, to catch late breakdowns.
  • Lowest Group Confidence: Identify the least confident segment in the trace, which often signals reasoning collapse.
  • Bottom Percentile Confidence: Highlight the worst segments, which are most predictive of errors.
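
The confidence measures listed above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names, the choice to score only full windows, and the `fraction` parameter are assumptions made for this example. The input is assumed to be the top-k log-probabilities that an inference engine returns for each generated token. Higher values indicate a more peaked (more confident) distribution.

```python
from typing import List, Sequence


def token_confidence(top_logprobs: Sequence[float]) -> float:
    """Negative mean log-probability of the top-k candidates at one position."""
    return -sum(top_logprobs) / len(top_logprobs)


def group_confidences(token_confs: Sequence[float], window: int) -> List[float]:
    """Mean token confidence over each full sliding window.

    Assumption for this sketch: a trace shorter than the window is
    scored as a single group covering the whole trace.
    """
    n = len(token_confs)
    if n <= window:
        return [sum(token_confs) / n]
    running = sum(token_confs[:window])
    groups = [running / window]
    for i in range(window, n):
        # Slide the window one token to the right.
        running += token_confs[i] - token_confs[i - window]
        groups.append(running / window)
    return groups


def tail_confidence(token_confs: Sequence[float], tail: int) -> float:
    """Mean token confidence over the last `tail` tokens of the trace."""
    last = token_confs[-tail:]
    return sum(last) / len(last)


def lowest_group_confidence(groups: Sequence[float]) -> float:
    """Confidence of the least confident window in the trace."""
    return min(groups)


def bottom_percentile_confidence(groups: Sequence[float], fraction: float) -> float:
    """Mean confidence over the lowest `fraction` of windows (at least one)."""
    k = max(1, int(len(groups) * fraction))
    worst = sorted(groups)[:k]
    return sum(worst) / len(worst)
```

Each function reduces a trace to a single score, so any of them can serve as the per-trace confidence used for weighting or filtering.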

These metrics are then used to weight votes (high-confidence traces count more) or to filter traces (only the top η% most confident traces are kept). In online mode, DeepConf stops generating a trace as soon as its confidence drops below a dynamically calibrated threshold, dramatically reducing wasted computation.
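
As a rough sketch of the filtering and weighting step, the function below keeps the most confident fraction of traces and then sums confidence per answer. The name `confidence_weighted_vote`, the `keep_fraction` parameter (standing in for η), and the use of raw confidence as the vote weight are assumptions for illustration. The paper's exact weighting may differ.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def confidence_weighted_vote(
    traces: List[Tuple[str, float]], keep_fraction: float = 1.0
) -> str:
    """Pick an answer from (answer, trace_confidence) pairs.

    Keeps the top `keep_fraction` of traces by confidence (at least one),
    then returns the answer with the largest summed confidence.
    """
    if not traces:
        raise ValueError("at least one trace is required")
    ranked = sorted(traces, key=lambda t: t[1], reverse=True)
    keep = max(1, int(len(ranked) * keep_fraction))
    totals: Dict[str, float] = defaultdict(float)
    for answer, confidence in ranked[:keep]:
        totals[answer] += confidence
    return max(totals, key=lambda a: totals[a])
```

With two confident traces answering "42" and three weak traces answering "17", plain majority voting would return "17", while this weighted vote returns "42" because the confident traces carry more weight.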

Key Results: Performance & Efficiency

DeepConf was evaluated across multiple reasoning benchmarks (AIME 2024/2025, HMMT 2025, BRUMO25, GPQA-Diamond) and models (DeepSeek-8B, Qwen3-8B/32B, GPT-OSS-20B/120B). The results are striking:

Model         Dataset    Pass@1 Acc  Cons@512 Acc  DeepConf@512 Acc  Token Change
GPT-OSS-120B  AIME 2025  91.8%       97.0%         99.9%             -84.7%
DeepSeek-8B   AIME 2024  83.0%       86.7%         93.3%             -77.9%
Qwen3-32B     AIME 2024  80.6%       85.3%         90.8%             -56.0%

Performance boost: Across models and datasets, DeepConf improves accuracy by up to ~10 percentage points over standard majority voting, often saturating the benchmark’s upper limit.

Ultra-efficient: By early-stopping low-confidence traces, DeepConf reduces the total number of generated tokens by 43–85%, with no loss (and often a gain) in final accuracy.

Plug & play: DeepConf works out of the box with any model—no fine-tuning, no hyperparameter search, and no changes to the underlying architecture. You can drop it into your existing serving stack (e.g., vLLM) with ~50 lines of code.

Easy to deploy: The method is implemented as a lightweight extension to existing inference engines, requiring only access to token-level logprobs and a few lines of logic for confidence calculation and early stopping.

Simple Integration: Minimal Code, Maximum Impact

DeepConf’s implementation is quite simple. For vLLM, the changes are minimal:

  • Extend the logprobs processor to track sliding-window confidence.
  • Add an early-stop check before emitting each output.
  • Pass confidence thresholds via the API, with no model retraining.

This allows any OpenAI-compatible endpoint to support DeepConf with a single extra setting, making it trivial to adopt in production environments.
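
The early-stop logic described above can be illustrated independently of any serving framework. The sketch below is plain Python and does not use vLLM's API. The class name, the function names, and the warmup-based threshold rule are all hypothetical, chosen only to show the idea of tracking sliding-window confidence and checking it before each token is emitted.

```python
from collections import deque
from typing import Iterable, Sequence


class SlidingConfidenceStopper:
    """Tracks mean token confidence over the last `window` tokens."""

    def __init__(self, window: int, threshold: float) -> None:
        self.window = window
        self.threshold = threshold
        self.buffer: deque = deque(maxlen=window)
        self.total = 0.0

    def should_stop(self, token_conf: float) -> bool:
        """Return True once a full window falls below the threshold."""
        if len(self.buffer) == self.window:
            # The oldest value is about to be evicted by the append below.
            self.total -= self.buffer[0]
        self.buffer.append(token_conf)
        self.total += token_conf
        if len(self.buffer) < self.window:
            return False
        return self.total / self.window < self.threshold


def tokens_emitted(token_confs: Iterable[float], window: int, threshold: float) -> int:
    """Count tokens emitted before the early-stop check fires."""
    stopper = SlidingConfidenceStopper(window, threshold)
    emitted = 0
    for conf in token_confs:
        if stopper.should_stop(conf):  # check before emitting the token
            break
        emitted += 1
    return emitted


def warmup_threshold(lowest_group_confs: Sequence[float], keep_fraction: float) -> float:
    """Assumed calibration rule: choose the threshold so that roughly the top
    `keep_fraction` of warmup traces would have survived."""
    ranked = sorted(lowest_group_confs, reverse=True)
    index = max(1, int(len(ranked) * keep_fraction)) - 1
    return ranked[index]
```

In a real deployment the per-token confidences would come from the engine's logprobs output, and the threshold would be passed in with the request rather than hard-coded.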

Conclusion

Meta AI’s DeepConf represents a leap forward in LLM reasoning, delivering both peak accuracy and unprecedented efficiency. By dynamically leveraging the model’s internal confidence, DeepConf achieves what was previously out of reach for open-source models: near-perfect results on elite reasoning tasks, with a fraction of the computational cost.


FAQs

FAQ 1: How does DeepConf improve accuracy and efficiency compared to majority voting?

DeepConf’s confidence-aware filtering and voting prioritize traces with higher model certainty, boosting accuracy by up to 10 percentage points across reasoning benchmarks compared to majority voting alone. At the same time, its early termination of low-confidence traces cuts token usage by up to 85%, offering both accuracy and efficiency gains in practical deployments.

FAQ 2: Can DeepConf be used with any language model or serving framework?

Yes. DeepConf is fully model-agnostic and can be integrated into any serving stack—including open-source and commercial models—without modification or retraining. Deployment requires only minimal changes (~50 lines of code for vLLM), leveraging token logprobs to compute confidence and handle early stopping.

FAQ 3: Does DeepConf require retraining, special data, or complex tuning?

No. DeepConf operates entirely at inference-time, requiring no additional model training, fine-tuning, or hyperparameter searches. It uses only built-in logprob outputs and works immediately with standard API settings for leading frameworks; it’s scalable, robust, and deployable on real workloads without interruption.




Asif Razzaq is the CEO of Marktechpost Media Inc. As a visionary entrepreneur and engineer, Asif is committed to harnessing the potential of Artificial Intelligence for social good. His most recent endeavor is the launch of an Artificial Intelligence Media Platform, Marktechpost, which stands out for its in-depth coverage of machine learning and deep learning news that is both technically sound and easily understandable by a wide audience. The platform has over 2 million monthly views, illustrating its popularity among audiences.


