• About Us
  • Disclaimer
  • Contact Us
  • Privacy Policy
Monday, June 8, 2026
mGrowTech
No Result
View All Result
  • Technology And Software
    • Account Based Marketing
    • Channel Marketing
    • Marketing Automation
      • Al, Analytics and Automation
      • Ad Management
  • Digital Marketing
    • Social Media Management
    • Google Marketing
  • Direct Marketing
    • Brand Management
    • Marketing Attribution and Consulting
  • Mobile Marketing
  • Event Management
  • PR Solutions
  • Technology And Software
    • Account Based Marketing
    • Channel Marketing
    • Marketing Automation
      • Al, Analytics and Automation
      • Ad Management
  • Digital Marketing
    • Social Media Management
    • Google Marketing
  • Direct Marketing
    • Brand Management
    • Marketing Attribution and Consulting
  • Mobile Marketing
  • Event Management
  • PR Solutions
No Result
View All Result
mGrowTech
No Result
View All Result
Home Al, Analytics and Automation

MIT scientists build the world’s largest collection of Olympiad-level math problems, and open it to everyone | MIT News

Josh by Josh
April 24, 2026
in Al, Analytics and Automation
0
MIT scientists build the world’s largest collection of Olympiad-level math problems, and open it to everyone | MIT News



Every year, the countries competing in the International Mathematical Olympiad (IMO) arrive with a booklet of their best, most original problems. Those booklets get shared among delegations, then quietly disappear. No one had ever collected them systematically, cleaned them, and made them available, not for AI researchers testing the limits of mathematical reasoning, and not for the students around the world training for these competitions largely on their own.

Researchers at MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL), King Abdullah University of Science and Technology (KAUST), and the company HUMAIN have now done exactly that.

MathNet is the largest high-quality dataset of proof-based math problems ever created. Comprising more than 30,000 expert-authored problems and solutions spanning 47 countries, 17 languages, and 143 competitions, it is five times larger than the next-biggest dataset of its kind. The work will be presented at the International Conference on Learning Representations (ICLR) in Brazil later this month.

What makes MathNet different is not only its size, but its breadth. Previous Olympiad-level datasets draw almost exclusively from competitions in the United States and China. MathNet spans dozens of countries across six continents, covers 17 languages, includes both text- and image-based problems and solutions, and spans four decades of competition mathematics. The goal is to capture the full range of mathematical perspectives and problem-solving traditions that exist across the global math community, not just the most visible ones.

“Every country brings a booklet of its most novel and most creative problems,” says Shaden Alshammari, an MIT PhD student and lead author on the paper. “They share the booklets with each other, but no one had made the effort to collect them, clean them, and upload them online.”

Building MathNet required tracking down 1,595 PDF volumes totaling more than 25,000 pages, spanning digital documents and decades-old scans in more than a dozen languages. A significant portion of that archive came from an unlikely source: Navid Safaei, a longtime IMO community figure and co-author who had been collecting and scanning those booklets by hand since 2006. His personal archive formed much of the backbone of the dataset.

The sourcing matters as much as the scale. Where most existing math datasets pull problems from community forums like Art of Problem Solving (AoPS), MathNet draws exclusively from official national competition booklets. The solutions in those booklets are expert-written and peer-reviewed, and they often run to multiple pages, with authors walking through several approaches to the same problem. That depth gives AI models a far richer signal for learning mathematical reasoning than the shorter, informal solutions typical of community-sourced datasets. It also means the dataset is genuinely useful for students: Anyone preparing for the IMO or a national competition now has access to a centralized, searchable collection of high-quality problems and worked solutions from traditions around the world.

“I remember so many students for whom it was an individual effort. No one in their country was training them for this kind of competition,” says Alshammari, who competed in the IMO as a student herself. “We hope this gives them a centralized place with high-quality problems and solutions to learn from.”

The team has deep roots in the IMO community. Sultan Albarakati, a co-author, currently serves on the IMO board, and the researchers are working to share the dataset with the IMO foundation directly. To validate the dataset, they assembled a grading group of more than 30 human evaluators from countries including Armenia, Russia, Ukraine, Vietnam, and Poland, who coordinated together to verify thousands of solutions.

“The MathNet database has the potential to be an excellent resource for both students and leaders seeking new problems to work on or looking for the solution to a difficult question,” says Tanish Patil, deputy leader of Switzerland’s IMO. “Whilst other archives of Olympiad problems do exist (notably, the Contest Collections forums on AoPS), these resources lack standardized formatting system, verified solutions, and important problem metadata that topics and theory require. It will also be interesting to see how this dataset is used to improve the performance of reasoning models, and if we will soon be able to reliably answer an important issue when creating novel Olympiad questions: determining if a problem is truly original.”

MathNet also functions as a rigorous benchmark for AI performance, and the results reveal a more complicated picture than recent headlines about AI math prowess might suggest. Frontier models have made extraordinary progress: Some have reportedly achieved gold-medal performance at the IMO, and on standard benchmarks they now solve problems that would stump most humans. But MathNet shows that progress is uneven. Even GPT-5, the top-performing model tested, averaged around 69.3 percent on MathNet’s main benchmark of 6,400 problems, failing nearly one-in-three Olympiad-level problems. And when problems include figures, performance drops significantly across the board, exposing visual reasoning as a consistent weak point for even the most capable models.

Several open-source models scored 0 percent on Mongolian-language problems, highlighting another dimension where current AI systems fall short despite their overall strength.

“GPT models are equally good in English and other languages,” Alshammari says. “But many of the open-source models fail completely at less-common languages, such as Mongolian.”

The diversity of MathNet is also designed to address a deeper limitation in how AI models learn mathematics. When training data skews toward English and Chinese problems, models absorb a narrow slice of mathematical culture. A Romanian combinatorics problem or a Brazilian number theory problem may approach the same underlying concept from a completely different angle. Exposure to that range, the researchers argue, makes both humans and AI systems better mathematical thinkers.

Beyond problem-solving, MathNet introduces a retrieval benchmark that asks whether models can recognize when two problems share the same underlying mathematical structure, a capability that matters both for AI development and for the math community itself. Near-duplicate problems have appeared in real IMO exams over the years because finding mathematical equivalences across different notations, languages, and formats is genuinely hard, even for expert human committees. Testing eight state-of-the-art embedding models, the researchers found that even the strongest identified the correct match only about 5 percent of the time on the first try, with models frequently ranking structurally unrelated problems as more similar than equivalent ones.

The dataset also includes a retrieval-augmented generation benchmark, testing whether giving a model a structurally related problem before asking it to solve a new one improves performance. It does, but only when the retrieved problem is genuinely relevant. DeepSeek-V3.2-Speciale gained up to 12 percentage points with well-matched retrieval, while irrelevant retrieval degraded performance in roughly 22 percent of cases.

Alshammari wrote the paper with Safaei, HUMAIN AI engineer Abrar Zainal, KAUST Academy Director Sultan Albarakati, and MIT CSAIL colleagues: master’s student Kevin Wen SB ’25; Microsoft Principal Engineering Manager Mark Hamilton SM ’22, PhD ‘25; and professors William Freeman and Antonio Torralba. Their work was funded, in part, by the Schwarzman College of Computing Fellowship and the National Science Foundation.

MathNet is publicly available at mathnet.csail.mit.edu.



Source_link

READ ALSO

ClawHub Security Signals: A Coding Guide to End-to-End Security Signal Analysis and Verdict Classification on the AI Skills Dataset

Microsoft AI Introduces MAI-Transcribe-1.5: 2.4% WER on Artificial Analysis, Best-in-Class FLEURS Accuracy, and Up to 5x Faster Long-Audio Transcription

Related Posts

ClawHub Security Signals: A Coding Guide to End-to-End Security Signal Analysis and Verdict Classification on the AI Skills Dataset
Al, Analytics and Automation

ClawHub Security Signals: A Coding Guide to End-to-End Security Signal Analysis and Verdict Classification on the AI Skills Dataset

June 8, 2026
Microsoft AI Introduces MAI-Transcribe-1.5: 2.4% WER on Artificial Analysis, Best-in-Class FLEURS Accuracy, and Up to 5x Faster Long-Audio Transcription
Al, Analytics and Automation

Microsoft AI Introduces MAI-Transcribe-1.5: 2.4% WER on Artificial Analysis, Best-in-Class FLEURS Accuracy, and Up to 5x Faster Long-Audio Transcription

June 8, 2026
Building Reflective Prompt Optimization with GEPA: Multi-Component Prompts, Structured Feedback, and Held-Out Validation
Al, Analytics and Automation

Building Reflective Prompt Optimization with GEPA: Multi-Component Prompts, Structured Feedback, and Held-Out Validation

June 7, 2026
Best 21 Low-Code and No-Code AI Tools in 2026
Al, Analytics and Automation

Best 21 Low-Code and No-Code AI Tools in 2026

June 7, 2026
Tod Machover receives George Peabody Medal for contributions to music and technology | MIT News
Al, Analytics and Automation

Tod Machover receives George Peabody Medal for contributions to music and technology | MIT News

June 6, 2026
Moonshot AI Releases Kimi Code CLI: A Terminal AI Coding Agent Built in TypeScript for Next-Gen Agents
Al, Analytics and Automation

Moonshot AI Releases Kimi Code CLI: A Terminal AI Coding Agent Built in TypeScript for Next-Gen Agents

June 6, 2026
Next Post
Top 25 SEM Tools: Content, SEO, and More!

Top 25 SEM Tools: Content, SEO, and More!

POPULAR NEWS

Trump ends trade talks with Canada over a digital services tax

Trump ends trade talks with Canada over a digital services tax

June 28, 2025
15 Trending Songs on TikTok in 2025 (+ How to Use Them)

15 Trending Songs on TikTok in 2025 (+ How to Use Them)

June 18, 2025
Communication Effectiveness Skills For Business Leaders

Communication Effectiveness Skills For Business Leaders

June 10, 2025
App Development Cost in Singapore: Pricing Breakdown & Insights

App Development Cost in Singapore: Pricing Breakdown & Insights

June 22, 2025
Comparing the Top 7 Large Language Models LLMs/Systems for Coding in 2025

Comparing the Top 7 Large Language Models LLMs/Systems for Coding in 2025

November 4, 2025

EDITOR'S PICK

This Quest 3S Bundle Is $50 Off and Includes a Game and Gift Card

This Quest 3S Bundle Is $50 Off and Includes a Game and Gift Card

November 19, 2025
How Agentic AI in SaaS is Transforming Business Operations

How Agentic AI in SaaS is Transforming Business Operations

November 23, 2025

PR Daily’s 2026 Social Media & Digital Awards finalists announced

May 6, 2026
What The Prussian Army Can Teach Modern Marketers

What The Prussian Army Can Teach Modern Marketers

October 3, 2025

About

We bring you the best Premium WordPress Themes that perfect for news, magazine, personal blog, etc. Check our landing page for details.

Follow us

Categories

  • Account Based Marketing
  • Ad Management
  • Al, Analytics and Automation
  • Brand Management
  • Channel Marketing
  • Digital Marketing
  • Direct Marketing
  • Event Management
  • Google Marketing
  • Marketing Attribution and Consulting
  • Marketing Automation
  • Mobile Marketing
  • PR Solutions
  • Social Media Management
  • Technology And Software
  • Uncategorized

Recent Posts

  • Sharon Srivastava: Leading With Composure Through Presence
  • We don’t know how the Ebola outbreak started. That’s a problem.
  • ClawHub Security Signals: A Coding Guide to End-to-End Security Signal Analysis and Verdict Classification on the AI Skills Dataset
  • Employee Ownership Is Not A Culture Strategy
  • About Us
  • Disclaimer
  • Contact Us
  • Privacy Policy
No Result
View All Result
  • Technology And Software
    • Account Based Marketing
    • Channel Marketing
    • Marketing Automation
      • Al, Analytics and Automation
      • Ad Management
  • Digital Marketing
    • Social Media Management
    • Google Marketing
  • Direct Marketing
    • Brand Management
    • Marketing Attribution and Consulting
  • Mobile Marketing
  • Event Management
  • PR Solutions