
Meta AI Releases Omnilingual ASR: A Suite of Open-Source Multilingual Speech Recognition Models for 1600+ Languages

by Josh
November 11, 2025
in AI, Analytics and Automation

How do you build a single speech recognition system that can understand thousands of languages, including many that never had working ASR (automatic speech recognition) models before? Meta AI has released Omnilingual ASR, an open-source speech recognition suite that scales to more than 1,600 languages and can be extended to unseen languages with only a few speech-text examples, without retraining the model.

Data and language coverage

The supervised training data comes from a combined corpus called AllASR, which contains 120,710 hours of labeled speech paired with transcripts across 1,690 languages. The corpus merges several sources: open-source datasets, internal and licensed corpora, partner-created data, and a commissioned collection called the Omnilingual ASR Corpus.

The Omnilingual ASR Corpus contributes 3,350 hours of speech across 348 languages, collected through field work with local organizations and speakers in regions such as Africa and South Asia. Prompts are open-ended, so speakers produce natural monologues in their own language instead of reading fixed sentences, which yields more realistic acoustic and lexical variation.

https://ai.meta.com/research/publications/omnilingual-asr-open-source-multilingual-speech-recognition-for-1600-languages/

For self-supervised pre-training, the wav2vec 2.0 encoders are trained on a large unlabeled speech corpus: 3.84M hours of speech with language identification across 1,239 languages, plus another 460K hours without language identification, for a total of roughly 4.3M hours of unlabeled audio. This is still significantly smaller than the 12M hours used by Google's USM, which makes the reported results more interesting from a data-efficiency perspective.


Model family

Omnilingual ASR exposes three model families that all share the same wav2vec 2.0 speech encoder backbone:

  1. SSL encoders (OmniASR W2V)
    Self-supervised wav2vec 2.0 encoders at four sizes:
    • omniASR_W2V_300M with 317,390,592 parameters
    • omniASR_W2V_1B with 965,514,752 parameters
    • omniASR_W2V_3B with 3,064,124,672 parameters
    • omniASR_W2V_7B with 6,488,487,168 parameters
    These models are trained with the standard wav2vec 2.0 contrastive objective. After training, the quantizer is discarded and the encoder is used as a speech representation backbone.
  2. CTC (connectionist temporal classification) ASR models
    CTC models add a simple linear layer on top of the encoder and train end-to-end with a character-level CTC loss. The released CTC models range from 325,494,996 to 6,504,786,132 parameters and reach real-time factors as low as 0.001 for the 300M model on an A100, measured on 30-second audio with batch size 1.
  3. LLM ASR models
    LLM ASR stacks a Transformer decoder on top of the wav2vec 2.0 encoder. The decoder is a language-model-style Transformer that operates on character-level tokens plus special tokens such as <BOS> and <EOS>. Training uses standard next-token prediction on sequences of the form gs(x), gt(<BOS>), gt(y), gt(<EOS>), where gs is the speech encoder and gt is the text embedding matrix. The LLM ASR family ranges from about 1.63B parameters for omniASR_LLM_300M to 7,801,041,536 parameters for omniASR_LLM_7B. A separate omniASR_LLM_7B_ZS checkpoint with 7,810,900,608 parameters is used for zero-shot ASR.
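The LLM ASR input construction described above can be sketched as follows. This is a minimal illustration with made-up dimensions and random stand-in embeddings, not the actual model code: `g_t` here is a toy character embedding lookup, and the real encoder produces one vector per audio frame rather than five.

```python
import numpy as np

rng = np.random.default_rng(0)

d_model = 16          # hypothetical shared embedding width
n_speech_frames = 5   # frames produced by the speech encoder g_s (stand-in)
vocab = {"<BOS>": 0, "<EOS>": 1, "h": 2, "i": 3}

# g_s(x): speech encoder output, one vector per frame (random stand-in).
speech_emb = rng.normal(size=(n_speech_frames, d_model))

# g_t: text embedding matrix over character-level tokens.
text_emb_matrix = rng.normal(size=(len(vocab), d_model))

def g_t(token: str) -> np.ndarray:
    return text_emb_matrix[vocab[token]]

# Decoder input sequence: g_s(x), g_t(<BOS>), g_t(y), g_t(<EOS>)
target = ["h", "i"]
decoder_input = np.vstack(
    [speech_emb, g_t("<BOS>")] + [g_t(c) for c in target] + [g_t("<EOS>")]
)

print(decoder_input.shape)  # (5 + 1 + 2 + 1, 16) -> (9, 16)
```

Next-token prediction loss is then applied only over the text portion of this sequence.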

All LLM ASR models support optional language conditioning. Languages are represented as {language_code}_{script}, such as eng_Latn for English in Latin script or cmn_Hans for Mandarin Chinese in Simplified script. A learned embedding for the language-script identifier is injected into the decoder input. During training, the language ID token is sometimes dropped, so the model can also operate without explicit language tags at inference.
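The optional conditioning with random tag dropping can be sketched like this. The table, drop probability, and function name are illustrative assumptions, not values from the release:

```python
import numpy as np

rng = np.random.default_rng(0)
d_model = 16  # hypothetical embedding width

# Hypothetical learned embedding table keyed by {language_code}_{script}.
lang_table = {
    "eng_Latn": rng.normal(size=d_model),
    "cmn_Hans": rng.normal(size=d_model),
}

def language_prefix(lang_id, p_drop=0.3, training=True):
    """Return the optional language embedding to prepend to the decoder input.

    During training the tag is dropped with probability p_drop, so the model
    also learns to transcribe without an explicit language tag."""
    if lang_id is None or (training and rng.random() < p_drop):
        return np.zeros((0, d_model))  # no conditioning token
    return lang_table[lang_id][None, :]  # one conditioning row

print(language_prefix("eng_Latn", training=False).shape)  # (1, 16)
print(language_prefix(None).shape)                        # (0, 16)
```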

Zero shot ASR with context examples and SONAR

The supervised models cover more than 1,600 languages, but many languages still have no transcribed ASR data at all. To handle these cases, Omnilingual ASR extends the LLM ASR model with a zero-shot mode trained with context examples.

During training for the zero-shot variant, the decoder consumes N + 1 speech-text pairs from the same language. The first N pairs act as context and the final pair is the target. All pairs are embedded with the speech encoder and text embedding matrix, then concatenated into a single decoder input sequence. The loss is still next-token prediction on the target transcription. This teaches the decoder to infer the mapping from speech to text in a given language from a small prompt of in-language examples.

At inference, the omniASR_LLM_7B_ZS model can receive a few speech-text examples from any language, including languages not seen in training, and then transcribe new utterances in that language without updating any weights. This is in-context learning for ASR.
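The prompt assembly for the zero-shot mode can be sketched as below. Frame counts, text lengths, and the stand-in embedding functions are all hypothetical; the point is the layout of N context pairs followed by the target audio:

```python
import numpy as np

rng = np.random.default_rng(0)
d_model = 8  # hypothetical embedding width

def embed_speech(n_frames: int) -> np.ndarray:
    # Stand-in for the wav2vec 2.0 encoder g_s.
    return rng.normal(size=(n_frames, d_model))

def embed_text(n_chars: int) -> np.ndarray:
    # Stand-in for character embeddings g_t, plus <BOS>/<EOS>.
    return rng.normal(size=(n_chars + 2, d_model))

# N = 3 in-language context pairs, then the target utterance to transcribe.
context_pairs = [(embed_speech(4), embed_text(6)) for _ in range(3)]
target_speech = embed_speech(5)

parts = []
for speech, text in context_pairs:
    parts += [speech, text]       # each pair: audio frames then transcript
parts.append(target_speech)       # target audio; decoder generates its text
prompt = np.vstack(parts)

print(prompt.shape)  # 3 * (4 + 8) + 5 = 41 rows of width 8
```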

The system includes an example retrieval mechanism based on SONAR, a multilingual, multimodal encoder that projects audio and text into a shared embedding space. The target audio is embedded once, then a nearest-neighbor search over a database of speech-text pairs selects the most relevant examples to include in the context window. This SONAR-based selection improves zero-shot performance compared with random example selection or simple text similarity.
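The retrieval step amounts to a cosine-similarity nearest-neighbor search. A minimal sketch, using random vectors in place of real SONAR embeddings and an assumed embedding width:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 32  # hypothetical SONAR embedding width

def normalize(v: np.ndarray) -> np.ndarray:
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

# Precomputed audio embeddings for a database of speech-text pairs.
db_embeddings = normalize(rng.normal(size=(1000, d)))
# The target utterance is embedded once.
target_embedding = normalize(rng.normal(size=d))

# Cosine similarity reduces to a dot product of unit vectors;
# the top-k neighbors become the in-context examples.
k = 4
scores = db_embeddings @ target_embedding
topk = np.argsort(-scores)[:k]
print(topk.shape)  # indices of the k most relevant context examples
```

A production system would use an approximate nearest-neighbor index instead of a brute-force dot product, but the selection criterion is the same.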


Quality and benchmarks

The omniASR_LLM_7B model achieves a character error rate (CER) below 10 percent for 78 percent of the more than 1,600 supported languages.

The research team reports that on multilingual benchmarks such as FLEURS-102, the 7B LLM ASR model outperforms the 7B CTC model and also surpasses Google USM variants in average character error rate, despite using about 4.3M unlabeled hours instead of 12M and a simpler pre-training pipeline. This suggests that scaling the wav2vec 2.0 encoder and adding an LLM-style decoder is an effective path to high-coverage multilingual ASR.
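For reference, character error rate is Levenshtein edit distance between reference and hypothesis characters, divided by the reference length. A self-contained implementation (not the evaluation code used in the paper):

```python
def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: character-level edit distance / reference length."""
    r, h = list(reference), list(hypothesis)
    # One-row dynamic-programming Levenshtein distance.
    dp = list(range(len(h) + 1))
    for i in range(1, len(r) + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, len(h) + 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,            # deletion
                        dp[j - 1] + 1,        # insertion
                        prev + (r[i - 1] != h[j - 1]))  # substitution
            prev = cur
    return dp[len(h)] / len(r)

print(cer("omnilingual", "omnilingual"))  # 0.0
print(round(cer("hello world", "helo world"), 3))  # one deletion over 11 chars
```

A CER below 0.10 means fewer than one character in ten is inserted, deleted, or substituted relative to the reference transcript.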

Key Takeaways

  1. Omnilingual ASR provides open-source ASR coverage for more than 1,600 languages and can generalize to more than 5,400 languages through zero-shot in-context learning.
  2. The models are built on large-scale wav2vec 2.0 encoders pre-trained on about 4.3M hours of unlabeled audio spanning 1,239 identified languages, plus additional speech without language identification.
  3. The suite includes wav2vec 2.0 encoders, CTC ASR, LLM ASR, and a dedicated zero-shot LLM ASR model, with encoder sizes from 300M to 7B parameters and LLM ASR models up to about 7.8B parameters.
  4. The 7B LLM ASR model achieves a character error rate below 10 percent on 78 percent of the more than 1,600 supported languages, competitive with or better than prior multilingual systems in low-resource settings.

Omnilingual ASR is a significant systems-level contribution because it treats multilingual ASR as an extensible framework rather than a fixed language list. It combines a 7B wav2vec 2.0 encoder, CTC and LLM ASR decoders, and a zero-shot LLM ASR model that can adapt to new languages from a few in-context examples, achieves a character error rate below 10 percent on 78 percent of more than 1,600 supported languages, and releases everything under Apache 2.0 and CC BY 4.0. Overall, this launch establishes Omnilingual ASR as the most extensible open-source speech recognition system currently available.


Check out the Paper, Repo, and Technical details.


Michal Sutter is a data science professional with a Master of Science in Data Science from the University of Padova. With a solid foundation in statistical analysis, machine learning, and data engineering, Michal excels at transforming complex datasets into actionable insights.
