• About Us
  • Disclaimer
  • Contact Us
  • Privacy Policy
Thursday, November 13, 2025
mGrowTech
No Result
View All Result
  • Technology And Software
    • Account Based Marketing
    • Channel Marketing
    • Marketing Automation
      • Al, Analytics and Automation
      • Ad Management
  • Digital Marketing
    • Social Media Management
    • Google Marketing
  • Direct Marketing
    • Brand Management
    • Marketing Attribution and Consulting
  • Mobile Marketing
  • Event Management
  • PR Solutions
  • Technology And Software
    • Account Based Marketing
    • Channel Marketing
    • Marketing Automation
      • Al, Analytics and Automation
      • Ad Management
  • Digital Marketing
    • Social Media Management
    • Google Marketing
  • Direct Marketing
    • Brand Management
    • Marketing Attribution and Consulting
  • Mobile Marketing
  • Event Management
  • PR Solutions
No Result
View All Result
mGrowTech
No Result
View All Result
Home Al, Analytics and Automation

NVIDIA Just Released Audio Flamingo 3: An Open-Source Model Advancing Audio General Intelligence

Josh by Josh
July 16, 2025
in Al, Analytics and Automation
0
NVIDIA Just Released Audio Flamingo 3: An Open-Source Model Advancing Audio General Intelligence
0
SHARES
0
VIEWS
Share on FacebookShare on Twitter


Heard about Artificial General Intelligence (AGI)? Meet its auditory counterpart—Audio General Intelligence. With Audio Flamingo 3 (AF3), NVIDIA introduces a major leap in how machines understand and reason about sound. While past models could transcribe speech or classify audio clips, they lacked the ability to interpret audio in a context-rich, human-like way—across speech, ambient sound, and music, and over extended durations. AF3 changes that.

With Audio Flamingo 3, NVIDIA introduces a fully open-source large audio-language model (LALM) that not only hears but also understands and reasons. Built on a five-stage curriculum and powered by the AF-Whisper encoder, AF3 supports long audio inputs (up to 10 minutes), multi-turn multi-audio chat, on-demand thinking, and even voice-to-voice interactions. This sets a new bar for how AI systems interact with sound, bringing us a step closer to AGI.

READ ALSO

Building ReAct Agents with LangGraph: A Beginner’s Guide

Top 8 3D Point Cloud Annotation Companies in 2026

The Core Innovations Behind Audio Flamingo 3

  1. AF-Whisper: A Unified Audio Encoder AF3 uses AF-Whisper, a novel encoder adapted from Whisper-v3. It processes speech, ambient sounds, and music using the same architecture—solving a major limitation of earlier LALMs which used separate encoders, leading to inconsistencies. AF-Whisper leverages audio-caption datasets, synthesized metadata, and a dense 1280-dimension embedding space to align with text representations.
  2. Chain-of-Thought for Audio: On-Demand Reasoning Unlike static QA systems, AF3 is equipped with ‘thinking’ capabilities. Using the AF-Think dataset (250k examples), the model can perform chain-of-thought reasoning when prompted, enabling it to explain its inference steps before arriving at an answer—a key step toward transparent audio AI.
  3. Multi-Turn, Multi-Audio Conversations Through the AF-Chat dataset (75k dialogues), AF3 can hold contextual conversations involving multiple audio inputs across turns. This mimics real-world interactions, where humans refer back to previous audio cues. It also introduces voice-to-voice conversations using a streaming text-to-speech module.
  4. Long Audio Reasoning AF3 is the first fully open model capable of reasoning over audio inputs up to 10 minutes. Trained with LongAudio-XL (1.25M examples), the model supports tasks like meeting summarization, podcast understanding, sarcasm detection, and temporal grounding.

State-of-the-Art Benchmarks and Real-World Capability

AF3 surpasses both open and closed models on over 20 benchmarks, including:

  • MMAU (avg): 73.14% (+2.14% over Qwen2.5-O)
  • LongAudioBench: 68.6 (GPT-4o evaluation), beating Gemini 2.5 Pro
  • LibriSpeech (ASR): 1.57% WER, outperforming Phi-4-mm
  • ClothoAQA: 91.1% (vs. 89.2% from Qwen2.5-O)

These improvements aren’t just marginal; they redefine what’s expected from audio-language systems. AF3 also introduces benchmarking in voice chat and speech generation, achieving 5.94s generation latency (vs. 14.62s for Qwen2.5) and better similarity scores.

The Data Pipeline: Datasets That Teach Audio Reasoning

NVIDIA didn’t just scale compute—they rethought the data:

  • AudioSkills-XL: 8M examples combining ambient, music, and speech reasoning.
  • LongAudio-XL: Covers long-form speech from audiobooks, podcasts, meetings.
  • AF-Think: Promotes short CoT-style inference.
  • AF-Chat: Designed for multi-turn, multi-audio conversations.

Each dataset is fully open-sourced, along with training code and recipes, enabling reproducibility and future research.

Open Source

AF3 is not just a model drop. NVIDIA released:

  • Model weights
  • Training recipes
  • Inference code
  • Four open datasets

This transparency makes AF3 the most accessible state-of-the-art audio-language model. It opens new research directions in auditory reasoning, low-latency audio agents, music comprehension, and multi-modal interaction.

Conclusion: Toward General Audio Intelligence

Audio Flamingo 3 demonstrates that deep audio understanding is not just possible but reproducible and open. By combining scale, novel training strategies, and diverse data, NVIDIA delivers a model that listens, understands, and reasons in ways previous LALMs could not.


Check out the Paper, Codes and Model on Hugging Face. All credit for this research goes to the researchers of this project.

Ready to connect with 1 Million+ AI Devs/Engineers/Researchers? See how NVIDIA, LG AI Research, and top AI companies leverage MarkTechPost to reach their target audience [Learn More]


Asif Razzaq is the CEO of Marktechpost Media Inc.. As a visionary entrepreneur and engineer, Asif is committed to harnessing the potential of Artificial Intelligence for social good. His most recent endeavor is the launch of an Artificial Intelligence Media Platform, Marktechpost, which stands out for its in-depth coverage of machine learning and deep learning news that is both technically sound and easily understandable by a wide audience. The platform boasts of over 2 million monthly views, illustrating its popularity among audiences.



Source_link

Related Posts

Building ReAct Agents with LangGraph: A Beginner’s Guide
Al, Analytics and Automation

Building ReAct Agents with LangGraph: A Beginner’s Guide

November 13, 2025
Top 8 3D Point Cloud Annotation Companies in 2026
Al, Analytics and Automation

Top 8 3D Point Cloud Annotation Companies in 2026

November 13, 2025
Talk to Your TV — Bitmovin’s Agentic AI Hub Quietly Redefines How We Watch
Al, Analytics and Automation

Talk to Your TV — Bitmovin’s Agentic AI Hub Quietly Redefines How We Watch

November 13, 2025
How to Build a Fully Functional Custom GPT-style Conversational AI Locally Using Hugging Face Transformers
Al, Analytics and Automation

How to Build a Fully Functional Custom GPT-style Conversational AI Locally Using Hugging Face Transformers

November 13, 2025
Datasets for Training a Language Model
Al, Analytics and Automation

Datasets for Training a Language Model

November 13, 2025
PR Newswire via Morningstar PR Newswire Introduces AI-Led Platform Redefining the Future of Public Relations
Al, Analytics and Automation

PR Newswire via Morningstar PR Newswire Introduces AI-Led Platform Redefining the Future of Public Relations

November 12, 2025
Next Post
Google study shows LLMs abandon correct answers under pressure, threatening multi-turn AI systems

Google study shows LLMs abandon correct answers under pressure, threatening multi-turn AI systems

POPULAR NEWS

Communication Effectiveness Skills For Business Leaders

Communication Effectiveness Skills For Business Leaders

June 10, 2025
Trump ends trade talks with Canada over a digital services tax

Trump ends trade talks with Canada over a digital services tax

June 28, 2025
15 Trending Songs on TikTok in 2025 (+ How to Use Them)

15 Trending Songs on TikTok in 2025 (+ How to Use Them)

June 18, 2025
App Development Cost in Singapore: Pricing Breakdown & Insights

App Development Cost in Singapore: Pricing Breakdown & Insights

June 22, 2025
7 Best EOR Platforms for Software Companies in 2025

7 Best EOR Platforms for Software Companies in 2025

June 21, 2025

EDITOR'S PICK

PRDecoded convenes PR pros at a pivotal time for the industry

PRDecoded convenes PR pros at a pivotal time for the industry

October 10, 2025
The critical role of Data in your Event & AI strategy | Marketing Cube

The critical role of Data in your Event & AI strategy | Marketing Cube

August 19, 2025
“Reflection Point” sculpture in New York uses Google’s AI products

“Reflection Point” sculpture in New York uses Google’s AI products

June 8, 2025
How G2 Remains Committed to Authentic Reviews in an Age of AI

How G2 Remains Committed to Authentic Reviews in an Age of AI

June 17, 2025

About

We bring you the best Premium WordPress Themes that perfect for news, magazine, personal blog, etc. Check our landing page for details.

Follow us

Categories

  • Account Based Marketing
  • Ad Management
  • Al, Analytics and Automation
  • Brand Management
  • Channel Marketing
  • Digital Marketing
  • Direct Marketing
  • Event Management
  • Google Marketing
  • Marketing Attribution and Consulting
  • Marketing Automation
  • Mobile Marketing
  • PR Solutions
  • Social Media Management
  • Technology And Software
  • Uncategorized

Recent Posts

  • Full list of winners: The inaugural Zenith Awards
  • Who is Johnson Wen? The Ariana Grande Stage Invader
  • Offload Patterns for East–West Traffic
  • Building ReAct Agents with LangGraph: A Beginner’s Guide
  • About Us
  • Disclaimer
  • Contact Us
  • Privacy Policy
No Result
View All Result
  • Technology And Software
    • Account Based Marketing
    • Channel Marketing
    • Marketing Automation
      • Al, Analytics and Automation
      • Ad Management
  • Digital Marketing
    • Social Media Management
    • Google Marketing
  • Direct Marketing
    • Brand Management
    • Marketing Attribution and Consulting
  • Mobile Marketing
  • Event Management
  • PR Solutions

Are you sure want to unlock this post?
Unlock left : 0
Are you sure want to cancel subscription?