mGrowTech

Meta AI Releases SAM Audio: A State-of-the-Art Unified Model that Uses Intuitive and Multimodal Prompts for Audio Separation

by Josh
December 17, 2025
in AI, Analytics and Automation


Meta has released SAM Audio, a prompt-driven audio separation model that targets a common editing bottleneck: isolating one sound from a real-world mix without building a custom model per sound class. It ships in 3 main sizes, sam-audio-small, sam-audio-base, and sam-audio-large. The models are available to download and to try in the Segment Anything Playground.

Architecture

SAM Audio uses a separate encoder for each conditioning signal: an audio encoder for the mixture, a text encoder for the natural-language description, a span encoder for time anchors, and a visual encoder that consumes a visual prompt derived from video plus an object mask. The encoded streams are concatenated into time-aligned features and processed by a diffusion transformer, which applies self-attention over the time-aligned representation and cross-attention to the text features. A DACVAE decoder then reconstructs waveforms and emits 2 outputs, target audio and residual audio.
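To make the data flow concrete, here is a toy sketch in plain Python of two operations named above: channel-wise concatenation of time-aligned streams, and cross-attention from those frames to text features. Everything in it is an illustrative assumption: the function names `concat_time_aligned` and `cross_attend`, the tiny feature sizes, the lack of learned projections, and the choice to fuse only audio, span, and visual features before attending to text. It is not taken from Meta's implementation, and it leaves out the self-attention, diffusion, and DACVAE stages.

```python
import math

def concat_time_aligned(*streams):
    """Concatenate per-frame feature vectors from several streams along the channel axis."""
    n_frames = len(streams[0])
    if any(len(s) != n_frames for s in streams):
        raise ValueError("all streams must cover the same number of frames")
    return [[x for s in streams for x in s[t]] for t in range(n_frames)]

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attend(frames, text_tokens):
    """Single-head scaled dot-product attention; text tokens act as keys and values."""
    dim = len(text_tokens[0])
    scale = 1.0 / math.sqrt(dim)
    attended = []
    for query in frames:
        if len(query) != dim:
            raise ValueError("frame and text feature sizes must match in this toy version")
        scores = [scale * sum(q * k for q, k in zip(query, key)) for key in text_tokens]
        weights = softmax(scores)
        attended.append(
            [sum(w * tok[d] for w, tok in zip(weights, text_tokens)) for d in range(dim)]
        )
    return attended

# Hypothetical toy inputs: 3 time frames, made-up feature values.
audio_feats = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]   # 2 channels per frame
span_feats = [[0.0], [1.0], [1.0]]                   # 1 channel: is the frame inside a span?
visual_feats = [[0.5], [0.5], [0.5]]                 # 1 channel per frame
text_feats = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]  # 2 text tokens, 4 channels

fused = concat_time_aligned(audio_feats, span_feats, visual_feats)
attended = cross_attend(fused, text_feats)
print(len(fused), "frames x", len(fused[0]), "channels")
```

The point of the sketch is only the shape bookkeeping: the time axis is shared by the concatenated streams, while the text tokens have their own length and are reached through attention rather than concatenation.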

https://ai.meta.com/blog/sam-audio/

What SAM Audio does, and what ‘segment’ means here

SAM Audio takes an input recording that contains multiple overlapping sources, for example speech plus traffic plus music, and separates out a target source based on a prompt. In the public inference API, the model produces 2 outputs, result.target and result.residual. The research team describes target as the isolated sound, and residual as everything else.

This target-plus-residual interface maps directly to editor operations. If you want to remove a dog bark from a podcast track, you treat the bark as the target and keep only the residual. If you want to extract a guitar part from a concert clip, you keep the target waveform instead. Meta uses these kinds of examples to explain what the model is meant to enable.
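The mapping from the two outputs to edit operations can be sketched as below. This is a minimal illustration, not the SAM Audio API: `SeparationResult`, `remove_target`, `extract_target`, and `remix` are hypothetical names, the waveforms are plain lists of samples, and `remix` additionally assumes that target plus residual approximately reconstructs the original mix, which the article does not state.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class SeparationResult:
    """Hypothetical stand-in for a result carrying the two waveforms described above."""
    target: List[float]    # the isolated sound
    residual: List[float]  # everything else

def remove_target(result: SeparationResult) -> List[float]:
    """'Remove the dog bark': keep only the residual."""
    return list(result.residual)

def extract_target(result: SeparationResult) -> List[float]:
    """'Extract the guitar': keep only the target."""
    return list(result.target)

def remix(result: SeparationResult, target_gain: float) -> List[float]:
    """Assumption: mix is roughly target + residual, so scaling the target attenuates it."""
    if len(result.target) != len(result.residual):
        raise ValueError("target and residual must have the same length")
    return [r + target_gain * t for t, r in zip(result.target, result.residual)]

# Made-up sample values, chosen only to show the arithmetic.
result = SeparationResult(target=[0.5, -0.5, 0.25], residual=[0.25, 0.5, 0.125])
cleaned = remove_target(result)   # bark removed
stem = extract_target(result)     # isolated sound only
ducked = remix(result, 0.5)       # target kept at half level
```

With `target_gain=0.0`, `remix` reduces to `remove_target`, so removal is just the extreme case of turning the target down.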

The 3 prompt types Meta is shipping

Meta positions SAM Audio as a single unified model that supports 3 prompt types, and it says these prompts can be used alone or combined.

  1. Text prompting: You describe the sound in natural language, for example “dog barking” or “singing voice”, and the model separates that sound from the mixture. Meta lists text prompts as one of the core interaction modes, and the open source repo includes an end-to-end example using SAMAudioProcessor and model.separate.
  2. Visual prompting: You click the person or object in a video and ask the model to isolate the audio associated with that visual object. The Meta team describes visual prompting as selecting the sounding object in the video. In the released code path, visual prompting is implemented by passing video frames plus masks into the processor via masked_videos.
  3. Span prompting: The Meta team calls span prompting an industry first. You mark the time segments where the target sound occurs, and the model uses those spans to guide separation. This matters for ambiguous cases, for example when the same instrument appears in multiple passages, or when a sound is present only briefly and you want to prevent the model from over-separating.
https://ai.meta.com/blog/sam-audio/
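One way to picture a span prompt is as a per-frame on/off mask over the recording. The sketch below converts time spans in seconds into such a mask. It is an illustration only: the helper name `spans_to_frame_mask`, the frame rate, and the rounding rules are assumptions, and the article does not describe how SAM Audio's span encoder actually represents spans.

```python
import math
from typing import List, Tuple

def spans_to_frame_mask(
    spans: List[Tuple[float, float]],
    duration_s: float,
    frame_rate_hz: float,
) -> List[int]:
    """Mark every frame that overlaps at least one (start, end) span, in seconds."""
    n_frames = int(round(duration_s * frame_rate_hz))
    mask = [0] * n_frames
    for start, end in spans:
        if end <= start:
            raise ValueError(f"span end must be after start: {(start, end)}")
        first = max(0, math.floor(start * frame_rate_hz))
        last = min(n_frames, math.ceil(end * frame_rate_hz))
        for i in range(first, last):
            mask[i] = 1
    return mask

# Hypothetical prompt: the target sound occurs at 0.5-1.0 s and 2.0-2.5 s
# in a 3 s clip, sampled at an assumed 4 frames per second.
mask = spans_to_frame_mask([(0.5, 1.0), (2.0, 2.5)], duration_s=3.0, frame_rate_hz=4.0)
print(mask)  # [0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0]
```

A mask like this makes the ambiguity argument concrete: two passages of the same instrument look identical to a text prompt, but only the frames marked 1 are flagged as containing the sound you want.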

Results

The Meta team positions SAM Audio as achieving cutting-edge performance across diverse real-world scenarios, and frames it as a unified alternative to single-purpose audio tools. The team publishes a subjective evaluation table across the categories General, SFX, Speech, Speaker, Music, Instr(wild), and Instr(pro). General scores are 3.62 for sam-audio-small, 3.28 for sam-audio-base, and 3.50 for sam-audio-large, so the General score does not rise monotonically with model size. The Instr(pro) score reaches 4.49 for sam-audio-large.

Key Takeaways

  1. SAM Audio is a unified audio separation model. It segments sound from complex mixtures using text prompts, visual prompts, and time span prompts.
  2. The core API produces two waveforms per request, target for the isolated sound and residual for everything else, which maps cleanly to common edit operations like remove noise, extract stem, or keep ambience.
  3. Meta released multiple checkpoints and variants, including sam-audio-small, sam-audio-base, and sam-audio-large, plus tv variants that the repo says perform better for visual prompting. The repo also publishes a subjective evaluation table by category.
  4. The release includes tooling beyond inference. Meta provides a sam-audio-judge model that scores separation results against a text description on overall quality, recall, precision, and faithfulness.
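As a rough illustration of how four such scores might be used downstream, the sketch below picks the best of several candidate separations. It does not call sam-audio-judge: the `JudgeScores` container, the `pick_best` rule, the score scales, and all numeric values are hypothetical placeholders, with only the four score names taken from the text above.

```python
from dataclasses import dataclass
from typing import List

@dataclass(frozen=True)
class JudgeScores:
    """Hypothetical container for the four scores named above (scales assumed)."""
    overall: float
    recall: float
    precision: float
    faithfulness: float

def pick_best(candidates: List[JudgeScores], min_faithfulness: float = 0.0) -> int:
    """Return the index of the highest overall score among sufficiently faithful candidates."""
    eligible = [i for i, s in enumerate(candidates) if s.faithfulness >= min_faithfulness]
    if not eligible:
        raise ValueError("no candidate meets the faithfulness threshold")
    return max(eligible, key=lambda i: candidates[i].overall)

# Placeholder scores for three candidate separations of the same prompt.
candidates = [
    JudgeScores(overall=3.1, recall=0.9, precision=0.6, faithfulness=0.8),
    JudgeScores(overall=3.8, recall=0.7, precision=0.9, faithfulness=0.4),
    JudgeScores(overall=3.5, recall=0.8, precision=0.8, faithfulness=0.9),
]
best_any = pick_best(candidates)
best_faithful = pick_best(candidates, min_faithfulness=0.5)
```

Keeping the four scores separate, rather than collapsing them into one number, lets a caller choose its own trade-off, for example rejecting a high-scoring candidate that is not faithful to the description.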



Asif Razzaq is the CEO of Marktechpost Media Inc. As a visionary entrepreneur and engineer, Asif is committed to harnessing the potential of Artificial Intelligence for social good. His most recent endeavor is the launch of an Artificial Intelligence Media Platform, Marktechpost, which stands out for its in-depth coverage of machine learning and deep learning news that is both technically sound and easily understandable by a wide audience. The platform has over 2 million monthly views.


