Wednesday, June 17, 2026
mGrowTech

Google AI Releases WAXAL: A Multilingual African Speech Dataset for Training Automatic Speech Recognition and Text-to-Speech Models

by Josh
March 17, 2026
in AI, Analytics and Automation


Speech technology still has a data distribution problem. Automatic Speech Recognition (ASR) and Text-to-Speech (TTS) systems have improved rapidly for high-resource languages, but many African languages remain poorly represented in open corpora. A team of researchers from Google and collaborators introduces WAXAL, an open multilingual speech dataset covering 24 African languages, with an ASR component built from transcribed natural speech and a TTS component built from studio-quality single-speaker recordings.

WAXAL is structured as two separate resources because ASR and TTS have different data requirements. The ASR side is designed around diverse speakers, natural environments, and spontaneous language production. The TTS side is designed around controlled recording conditions, phonetically balanced scripts, and cleaner single-speaker audio suited for synthesis. That separation is technically important: a dataset that is useful for robust recognition in noisy real-world settings is usually not the same dataset that produces strong single-speaker TTS models.


https://arxiv.org/pdf/2602.02734

How the ASR data was collected

The ASR portion of WAXAL was collected using image-prompted speech. Speakers were shown images and asked to describe what they saw in their native language, which is a more natural setup than simple prompted reading. Recordings were captured in speakers’ natural environments, each with a minimum duration of 15 seconds. The collection process also tracked metadata such as speaker age, gender, language, and recording environment. Only a subset of the full collected audio was transcribed: the research team states that the current ASR release includes transcriptions for about 10% of the total recorded audio. Those transcriptions were produced by paid local linguistic experts, using local scripts where available and English-alphabet transliteration otherwise.
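The collection rules in this section can be expressed as a small filter. What follows is a minimal sketch, not the WAXAL tooling: the `AsrClip` record, its field names, and the toy clips are hypothetical, chosen only to mirror the metadata listed above (age, gender, language, environment). The 15-second floor comes from the text.

```python
from dataclasses import dataclass
from typing import List, Optional

MIN_DURATION_S = 15.0  # minimum recording length stated in the article


@dataclass
class AsrClip:
    """Hypothetical record; field names are assumptions, not the WAXAL schema."""
    clip_id: str
    language: str
    duration_s: float
    speaker_age: int
    speaker_gender: str
    environment: str
    transcript: Optional[str] = None  # None for audio that was not transcribed


def usable_for_supervised_asr(clip: AsrClip) -> bool:
    """Keep clips that meet the duration floor and carry a transcript."""
    return clip.duration_s >= MIN_DURATION_S and bool(clip.transcript)


def transcribed_fraction(clips: List[AsrClip]) -> float:
    """Share of total audio duration that has a transcript."""
    total = sum(c.duration_s for c in clips)
    if total == 0:
        return 0.0
    labeled = sum(c.duration_s for c in clips if c.transcript)
    return labeled / total


# Toy data for illustration only; these are not real WAXAL records.
clips = [
    AsrClip("a", "lang-x", 20.0, 31, "f", "home", transcript="toy transcript"),
    AsrClip("b", "lang-x", 30.0, 45, "m", "market"),
    AsrClip("c", "lang-x", 150.0, 27, "f", "street"),
]
supervised = [c.clip_id for c in clips if usable_for_supervised_asr(c)]
print(supervised, transcribed_fraction(clips))
```

Measuring the transcribed share by duration rather than by clip count matches how the article phrases it ("about 10% of the total recorded audio"). The untranscribed remainder is still audio, so a pipeline would likely keep it in a separate pool rather than discard it.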

This matters for anyone building multilingual ASR systems. Image-prompted speech tends to capture more natural lexical and syntactic variation than tightly scripted reading, but it also makes transcription harder and increases variation across speakers, domains, and acoustic conditions. WAXAL leans into that tradeoff rather than avoiding it. The result is not a perfectly clean benchmark dataset; it is closer to field-collected multilingual ASR data with real variability baked in.

How the TTS data was collected

The TTS side of WAXAL was built very differently: it was designed for high-quality, single-speaker synthetic voices. For each target language, the research team created a phonetically balanced script of approximately 108,500 words. They contracted 72 community participants as voice actors, evenly split between male and female, and recorded them in professional studio-like environments to reduce background noise and preserve audio fidelity. The target was approximately 16 hours of clean edited audio per voice actor.
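The article does not say how the phonetically balanced scripts were assembled. One common way to approximate phonetic coverage is greedy selection: repeatedly pick the candidate sentence that adds the most not-yet-covered sound units until a word budget is reached. The sketch below illustrates that idea only. It uses character bigrams as a crude stand-in for phonetic units, and the function names `units` and `select_balanced` are hypothetical; a real pipeline would need a language-specific grapheme-to-phoneme step.

```python
from typing import List, Set


def units(sentence: str) -> Set[str]:
    """Character bigrams within words, a rough proxy for phonetic units (assumption)."""
    text = "".join(ch for ch in sentence.lower() if ch.isalpha() or ch == " ")
    return {w[i:i + 2] for w in text.split() for i in range(len(w) - 1)}


def select_balanced(candidates: List[str], word_budget: int) -> List[str]:
    """Greedily pick sentences that add the most uncovered units within a word budget."""
    covered: Set[str] = set()
    chosen: List[str] = []
    remaining = list(candidates)
    words_used = 0
    while remaining:
        # Sentence contributing the largest number of units not yet covered.
        best = max(remaining, key=lambda s: len(units(s) - covered))
        if not units(best) - covered:
            break  # nothing left adds new coverage
        remaining.remove(best)
        n_words = len(best.split())
        if words_used + n_words > word_budget:
            continue  # too long for the remaining budget; try the next best
        chosen.append(best)
        covered |= units(best)
        words_used += n_words
    return chosen


# Toy candidates; a real script would target roughly 108,500 words per language.
script = select_balanced(["aba", "abab", "cd"], word_budget=10)
print(script)
```

Greedy coverage maximizes the variety of units but does not match their natural frequencies, so it is only one ingredient of a balanced script.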

This is the right design choice for synthesis. TTS models care much more about consistency in pronunciation, recording conditions, microphone quality, and speaker identity than ASR systems do. WAXAL therefore avoids the common mistake of treating ‘speech data’ as a single category, when in practice ASR and TTS pipelines want very different supervision signals.

Key Takeaways

  • WAXAL is an open multilingual speech corpus built for low-resource African language ASR and TTS.
  • The ASR data uses image-prompted, natural speech collected in real-world environments.
  • The TTS data uses studio-quality, single-speaker recordings with phonetically balanced scripts.



Michal Sutter is a data science professional with a Master of Science in Data Science from the University of Padova. With a solid foundation in statistical analysis, machine learning, and data engineering, Michal excels at transforming complex datasets into actionable insights.


