mGrowTech
NVIDIA Just Released Audio Flamingo 3: An Open-Source Model Advancing Audio General Intelligence

by Josh
July 16, 2025
in AI, Analytics and Automation

Heard about Artificial General Intelligence (AGI)? Meet its auditory counterpart: Audio General Intelligence. With Audio Flamingo 3 (AF3), NVIDIA takes a major step in how machines understand and reason about sound. Earlier models could transcribe speech or classify audio clips, but they could not interpret audio the way people do: in context, across speech, ambient sound, and music, and over long durations. AF3 is built to close that gap.

Audio Flamingo 3 is a fully open-source large audio-language model (LALM) designed not only to hear but also to understand and reason about what it hears. Built on a five-stage training curriculum and powered by the AF-Whisper encoder, AF3 supports long audio inputs (up to 10 minutes), multi-turn conversations over multiple audio clips, on-demand thinking, and even voice-to-voice interaction. This raises the bar for how AI systems interact with sound and brings the field a step closer to AGI.

The Core Innovations Behind Audio Flamingo 3

  1. AF-Whisper: A Unified Audio Encoder. AF3 uses AF-Whisper, a new encoder adapted from Whisper-v3. It processes speech, ambient sounds, and music with a single architecture, addressing a major limitation of earlier LALMs, which used separate encoders for different audio types and produced inconsistent representations. AF-Whisper is trained on audio-caption datasets and synthesized metadata, and produces dense 1280-dimensional embeddings aligned with text representations.
  2. Chain-of-Thought for Audio: On-Demand Reasoning. Unlike static question-answering systems, AF3 can ‘think’ when asked to. Trained on the AF-Think dataset (250k examples), the model performs chain-of-thought reasoning when prompted, laying out its intermediate steps before giving an answer, a key step toward transparent audio AI.
  3. Multi-Turn, Multi-Audio Conversations. Trained on the AF-Chat dataset (75k dialogues), AF3 can hold contextual conversations that involve multiple audio inputs across turns. This mirrors real-world interactions, where people refer back to sounds they heard earlier. AF3 also supports voice-to-voice conversation through a streaming text-to-speech module.
  4. Long Audio Reasoning. AF3 is the first fully open model that can reason over audio inputs of up to 10 minutes. Trained with LongAudio-XL (1.25M examples), it supports tasks such as meeting summarization, podcast understanding, sarcasm detection, and temporal grounding.
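
Reasoning over 10 minutes of audio implies long input sequences. The sketch below assumes AF-Whisper keeps Whisper's input conventions (16 kHz audio, fixed 30-second windows, 1,500 encoder positions per window); that is an assumption made here for illustration, not a detail stated in this article, and the helper names are illustrative. Under that assumption, a 10-minute clip splits into 20 windows and roughly 30,000 encoder positions before any pooling or projection into the language model.

```python
# Assumption: AF-Whisper follows Whisper's input conventions. These constants
# are Whisper's defaults, not values confirmed for AF3.
SAMPLE_RATE = 16_000        # Whisper expects 16 kHz mono audio
WINDOW_SECONDS = 30         # Whisper's fixed input window length
FRAMES_PER_WINDOW = 1_500   # Whisper encoder output positions per 30 s window


def split_into_windows(num_samples: int,
                       sample_rate: int = SAMPLE_RATE,
                       window_seconds: int = WINDOW_SECONDS) -> list[tuple[int, int]]:
    """Return (start, end) sample indices of consecutive non-overlapping windows.

    The last window may be shorter than window_seconds; a real pipeline
    would zero-pad it to full length before computing log-mel features.
    """
    if num_samples <= 0:
        return []
    window = sample_rate * window_seconds
    return [(start, min(start + window, num_samples))
            for start in range(0, num_samples, window)]


def encoder_frames(num_samples: int) -> int:
    """Encoder positions if every (padded) window yields FRAMES_PER_WINDOW."""
    return len(split_into_windows(num_samples)) * FRAMES_PER_WINDOW


ten_minutes = 10 * 60 * SAMPLE_RATE
print(len(split_into_windows(ten_minutes)))  # 20
print(encoder_frames(ten_minutes))           # 30000
```

The count only shows why long-audio reasoning is demanding; if the model pools or downsamples encoder features, the sequence the language model actually sees would be shorter.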

State-of-the-Art Benchmarks and Real-World Capability

According to NVIDIA, AF3 outperforms both open and closed models on over 20 benchmarks, including:

  • MMAU (avg): 73.14% (+2.14% over Qwen2.5-Omni)
  • LongAudioBench: 68.6 (scored with GPT-4o as the judge), beating Gemini 2.5 Pro
  • LibriSpeech (ASR): 1.57% WER, outperforming Phi-4-mm
  • ClothoAQA: 91.1% (vs. 89.2% for Qwen2.5-Omni)

These are more than marginal gains; they reset expectations for audio-language systems. AF3 is also evaluated on voice chat and speech generation, where it reports a generation latency of 5.94 s (vs. 14.62 s for Qwen2.5-Omni) and better similarity scores.
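
The LibriSpeech result above is a word error rate (WER): the number of word substitutions, deletions, and insertions needed to turn the model's transcript into the reference, divided by the number of reference words. A WER of 1.57% therefore means roughly 1.57 word errors per 100 reference words. Below is a minimal sketch of the standard computation via word-level Levenshtein distance; it is not the evaluation script used for AF3, and it skips the text normalization (casing, punctuation) that real benchmarks apply first.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the first i-1 reference words
    # and the first j hypothesis words (row i-1 of the DP table).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion of a reference word
                          curr[j - 1] + 1,     # insertion of a hypothesis word
                          prev[j - 1] + cost)  # substitution, or match if cost == 0
        prev = curr
    return prev[len(hyp)] / len(ref)


# One substituted word out of six reference words: WER = 1/6, about 0.167
print(word_error_rate("the cat sat on the mat", "the cat sat on a mat"))
```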

The Data Pipeline: Datasets That Teach Audio Reasoning

NVIDIA did not just scale up compute; it rethought the training data:

  • AudioSkills-XL: 8M examples combining ambient, music, and speech reasoning.
  • LongAudio-XL: 1.25M examples of long-form speech from audiobooks, podcasts, and meetings.
  • AF-Think: 250k examples that encourage short, chain-of-thought-style reasoning.
  • AF-Chat: 75k dialogues designed for multi-turn, multi-audio conversations.
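
To make "multi-turn, multi-audio" concrete, here is a hypothetical record layout for one AF-Chat-style dialogue. The class names, field names, and file paths are invented for illustration and are not the published dataset's schema. What the sketch shows is that audio attached in an earlier turn stays in context, so a later question can refer back to it.

```python
from dataclasses import dataclass, field


@dataclass
class Turn:
    role: str                                            # "user" or "assistant"
    text: str
    audio_ids: list[str] = field(default_factory=list)   # clips attached in this turn


@dataclass
class Dialogue:
    audio_files: dict[str, str]                          # hypothetical: clip id -> file path
    turns: list[Turn] = field(default_factory=list)

    def clips_in_context(self, turn_index: int) -> list[str]:
        """Clip ids visible to the model at turn_index, in first-seen order.

        Clips from earlier turns remain in context, so a later question
        can refer back to audio attached several turns before.
        """
        seen: list[str] = []
        for turn in self.turns[: turn_index + 1]:
            for clip in turn.audio_ids:
                if clip not in self.audio_files:
                    raise KeyError(f"unknown clip id: {clip}")
                if clip not in seen:
                    seen.append(clip)
        return seen


chat = Dialogue(
    audio_files={"a1": "street.wav", "a2": "guitar.wav"},
    turns=[
        Turn("user", "What is happening in this recording?", ["a1"]),
        Turn("assistant", "Traffic and people talking on a busy street."),
        Turn("user", "Could this guitar piece be playing in that same place?", ["a2"]),
    ],
)
print(chat.clips_in_context(2))  # ['a1', 'a2']
```

Answering the third turn requires both clips, which is exactly the kind of cross-turn reference a single-clip, single-question dataset cannot teach.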

Each dataset is fully open-sourced, along with training code and recipes, enabling reproducibility and future research.

Open Source

AF3 is not just a model drop. NVIDIA released:

  • Model weights
  • Training recipes
  • Inference code
  • Four open datasets

This level of transparency makes AF3 one of the most accessible state-of-the-art audio-language models available. It opens new research directions in auditory reasoning, low-latency audio agents, music comprehension, and multimodal interaction.

Conclusion: Toward General Audio Intelligence

Audio Flamingo 3 demonstrates that deep audio understanding is not just possible but reproducible and open. By combining scale, novel training strategies, and diverse data, NVIDIA delivers a model that listens, understands, and reasons in ways previous LALMs could not.


Check out the Paper, Code and Model on Hugging Face. All credit for this research goes to the researchers of this project.

Asif Razzaq is the CEO of Marktechpost Media Inc. As a visionary entrepreneur and engineer, Asif is committed to harnessing the potential of Artificial Intelligence for social good. His most recent endeavor is the launch of an Artificial Intelligence Media Platform, Marktechpost, which stands out for its in-depth coverage of machine learning and deep learning news that is both technically sound and easily understandable by a wide audience. The platform boasts over 2 million monthly views, illustrating its popularity among audiences.