
Meta AI Releases NeuralBench: A Unified Open-Source Framework to Benchmark NeuroAI Models Across 36 EEG Tasks and 94 Datasets

By Josh
May 7, 2026
In AI, Analytics and Automation


Evaluating AI models trained on brain signals has long been a messy, inconsistent affair. Different research groups use different preprocessing pipelines, train models on different datasets, and report results on a narrow set of tasks, making it nearly impossible to know which model actually works best, or for what. A new framework from the Meta AI team is designed to fix that.

Meta researchers have released NeuralBench, a unified, open-source framework for benchmarking AI models of brain activity. Its first release, NeuralBench-EEG v1.0, is the largest open benchmark of its kind: 36 downstream tasks, 94 datasets, 9,478 subjects, 13,603 hours of electroencephalography (EEG) data, and 14 deep learning architectures evaluated under a single standardized interface.


The Problem NeuralBench Solves

The broader field of NeuroAI, where deep learning meets neuroscience, has exploded in recent years. Self-supervised learning techniques originally developed for language, speech, and images are now being adapted to build brain foundation models: large models pretrained on unlabeled brain recordings and fine-tuned for downstream tasks ranging from clinical seizure detection to decoding what a person is seeing or hearing.

But the evaluation landscape has been badly fragmented. Existing benchmarks like MOABB cover up to 148 brain-computer interfacing (BCI) datasets but limit evaluation to just 5 downstream tasks. Other efforts — EEG-Bench, EEG-FM-Bench, AdaBrain-Bench — are each constrained in their own ways. For modalities like magnetoencephalography (MEG) and functional magnetic resonance imaging (fMRI), there is no systematic benchmark at all.

The result: claims about foundation models being “generalizable” or “foundational” often rest on cherry-picked tasks with no common reference point.

What is NeuralBench?

NeuralBench is built on three core Python packages that form a modular pipeline.

NeuralFetch handles dataset acquisition, pulling curated data from public repositories including OpenNeuro, DANDI, and NEMAR. NeuralSet prepares data as PyTorch-ready dataloaders, wrapping existing neuroscience tools like MNE-Python and nilearn for preprocessing, and HuggingFace for extracting stimulus embeddings (for tasks involving images, speech, or text). NeuralTrain provides modular training code built on PyTorch-Lightning, Pydantic, and the exca execution and caching library.

Once installed via pip install neuralbench, the framework is controlled via a command-line interface (CLI). Running a task is as simple as three commands: download the data, prepare the cache, and execute. Every task is configured through a lightweight YAML file that specifies the data source, train/validation/test splits, preprocessing steps, target processing, training hyperparameters, and evaluation metrics.
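As an illustration, a task configuration along these lines might look as follows. The field names here are hypothetical, chosen to mirror the components the article describes, not taken from NeuralBench’s actual schema:

```yaml
# Hypothetical task config -- illustrative field names, not NeuralBench's real schema
task: audiovisual_stimulus
data:
  source: openneuro            # public repository fetched via NeuralFetch
  split: cross_subject         # train/validation/test split strategy
preprocessing:
  bandpass_hz: [0.5, 40.0]
  resample_hz: 128
target:
  type: multiclass
training:
  optimizer: adamw
  learning_rate: 1.0e-4
  weight_decay: 0.05
  max_epochs: 50
evaluation:
  metric: balanced_accuracy
```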


What NeuralBench-EEG v1.0 Covers

The first release focuses on EEG and spans eight task categories: cognitive decoding (image, sentence, speech, typing, video, and word decoding), brain-computer interfacing (BCI), evoked responses, clinical tasks, internal state, sleep, phenotyping, and miscellaneous.

Three classes of models are compared:

  • Task-specific architectures (~1.5K–4.2M parameters, trained from scratch): ShallowFBCSPNet, Deep4Net, EEGNet, BDTCN, ATCNet, EEGConformer, SimpleConvTimeAgg, and CTNet.
  • EEG foundation models (~3.2M–157.1M parameters, pretrained and fine-tuned): BENDR, LaBraM, BIOT, CBraMod, LUNA, and REVE.
  • Handcrafted feature baselines: sklearn-style pipelines using symmetric positive definite (SPD) matrix representations fed into logistic or Ridge regression.
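To make the handcrafted baseline concrete, here is a minimal sketch of the general idea, using NumPy, SciPy, and scikit-learn rather than NeuralBench’s actual pipeline. Mapping each epoch’s covariance matrix through the matrix logarithm (a log-Euclidean tangent-space map) is one common way to vectorize SPD matrices before a linear classifier; the toy data and helper names are illustrative:

```python
import numpy as np
from scipy.linalg import logm
from sklearn.linear_model import LogisticRegression

def spd_features(epochs, eps=1e-6):
    """Map each EEG epoch (channels x times) to a vectorized SPD covariance."""
    feats = []
    for x in epochs:
        cov = np.cov(x) + eps * np.eye(x.shape[0])  # regularize to stay SPD
        log_cov = logm(cov)                         # log-Euclidean tangent-space map
        iu = np.triu_indices(cov.shape[0])
        feats.append(log_cov[iu].real)              # upper triangle as feature vector
    return np.asarray(feats)

# Toy data: 40 epochs, 8 channels, 256 samples; class 1 has higher variance
rng = np.random.default_rng(0)
X = rng.standard_normal((40, 8, 256))
y = np.repeat([0, 1], 20)
X[y == 1] *= 1.5

clf = LogisticRegression(max_iter=1000).fit(spd_features(X), y)
```

The appeal of this style of baseline is that it is fast, has few hyperparameters, and is surprisingly hard to beat on small EEG datasets.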

All foundation models are fine-tuned end-to-end using a shared training recipe: AdamW optimizer, learning rate of 10⁻⁴, weight decay of 0.05, cosine annealing with 10% warmup, and up to 50 epochs with early stopping (patience = 10). The sole exception is BENDR, for which the learning rate is lowered to 10⁻⁵ and gradient clipping is applied at 0.5 to obtain stable learning curves. This deliberate standardization removes model-specific optimization tricks, such as layer-wise learning rate decay, two-stage probing, or LoRA, so that architecture and pretraining methodology are what actually get evaluated.
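The warmup-plus-cosine schedule can be sketched as a plain function of training progress. This is a generic implementation of the stated recipe (linear warmup over the first 10% of steps, then cosine decay to zero), not NeuralBench’s actual code:

```python
import math

def lr_at_step(step, total_steps, base_lr=1e-4, warmup_frac=0.10):
    """Linear warmup for the first 10% of steps, then cosine annealing to 0."""
    warmup = max(1, int(total_steps * warmup_frac))
    if step < warmup:
        return base_lr * step / warmup
    progress = (step - warmup) / max(1, total_steps - warmup)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))
```

In practice this is the per-step value that a scheduler such as PyTorch’s LambdaLR (or a warmup wrapper around CosineAnnealingLR) would feed the optimizer.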

Data splitting is handled differently per task type to reflect real-world generalization constraints: predefined splits where the dataset’s authors provide them; leave-concept-out splits for cognitive decoding tasks (all subjects seen in training, but a held-out set of stimuli used for testing); cross-subject splits for most clinical and BCI tasks; and within-subject splits for datasets with very few participants. Each model is trained three times per task with three different random seeds.

Evaluation metrics are standardized by task type: balanced accuracy for binary and multiclass classification, macro F1-score for multilabel classification, Pearson correlation for regression, and top-5 accuracy for retrieval tasks. All results are additionally reported as normalized scores (s̃), where 0 corresponds to dummy-level performance and 1 corresponds to perfect performance, enabling fair cross-task comparisons regardless of metric scale.
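Assuming the normalization is a simple linear rescaling between a dummy baseline and perfect performance (the paper’s exact formula may differ), s̃ can be computed as:

```python
def normalized_score(score, dummy_score, perfect_score):
    """Rescale a raw metric so 0 = dummy-level and 1 = perfect performance."""
    return (score - dummy_score) / (perfect_score - dummy_score)

# e.g., 75% balanced accuracy on a balanced binary task (dummy = 50%, perfect = 100%)
normalized_score(0.75, 0.5, 1.0)  # -> 0.5
```

This makes a 0.75 balanced accuracy on a two-class task and a 0.1 Pearson correlation on a hard regression task directly comparable on the same 0-to-1 scale.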

One important methodological note: some EEG foundation models were pretrained on datasets that overlap with NeuralBench’s downstream evaluation sets. Rather than discarding these results, the benchmark flags them with hashed bars in result figures so readers can identify potential pretraining data leakage — no strong trend suggesting leakage inflates performance was observed, but the transparency is preserved.

The benchmark offers two variants: NeuralBench-EEG-Core v1.0, which uses a single representative dataset per task for broad coverage, and NeuralBench-EEG-Full v1.0, which expands to up to 24 datasets per task to study within-task variability across recording hardware, labs, and subject populations. A Kendall’s τ of 0.926 (p < 0.001) between Core and Full rankings confirms that the Core variant is a reliable proxy — though a few model positions do shift, including CTNet overtaking LUNA when more datasets are included.
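Kendall’s τ measures rank agreement by comparing all pairs of items: concordant pairs (ordered the same way in both rankings) minus discordant pairs, divided by the total number of pairs. A minimal pure-Python version (τ-a, no tie handling) illustrates the statistic used to compare the Core and Full rankings:

```python
from itertools import combinations

def kendall_tau(ranks_a, ranks_b):
    """Tau-a: (concordant - discordant) / total pairs; assumes no tied ranks."""
    pairs = list(combinations(range(len(ranks_a)), 2))
    s = sum(
        1 if (ranks_a[i] - ranks_a[j]) * (ranks_b[i] - ranks_b[j]) > 0 else -1
        for i, j in pairs
    )
    return s / len(pairs)

# Two rankings of 5 models that disagree on one adjacent pair
kendall_tau([1, 2, 3, 4, 5], [1, 2, 4, 3, 5])  # -> 0.8
```

scipy.stats.kendalltau computes the same statistic with tie corrections and a p-value; a τ of 0.926 means the Core and Full variants order the models almost identically.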


Two Key Findings

Finding 1: Foundation models only marginally outperform task-specific models. The top-ranked models overall are REVE (69.2M parameters, mean normalized rank 0.20), LaBraM (5.8M, rank 0.21), and LUNA (40.4M, rank 0.30). But several task-specific models trained from scratch — CTNet (150K parameters, rank 0.32), SimpleConvTimeAgg (4.2M, rank 0.35), and Deep4Net (146K, rank 0.43) — trail closely behind. CTNet actually overtakes the LUNA foundation model to rank third in the Full variant, despite having roughly 270× fewer parameters. This shows the gap between task-specific and foundation models is narrow enough that expanding dataset coverage alone is sufficient to change global rankings.

Finding 2: Many tasks remain genuinely hard. Cognitive decoding tasks — recovering dense representations of images, speech, sentences, video, or words from brain activity — are particularly challenging, with even the best models scoring well below ceiling. Tasks like mental imagery, sleep arousal, psychopathology decoding, and cross-subject motor imagery and P300 classification frequently yield performance close to dummy level. These tasks represent the best benchmarks for stress-testing the next generation of EEG foundation models.

Tasks approaching saturation include SSVEP classification, pathology detection, seizure detection, sleep stage classification, and phenotyping tasks like age regression and sex classification.

Beyond EEG: MEG and fMRI

Even in this initial EEG-focused release, NeuralBench already supports MEG and fMRI tasks as proof of concept. Notably, the REVE model — pretrained exclusively on EEG data — achieves the best performance among all tested models on the typing decoding task in MEG. This is a striking early signal that EEG-pretrained representations may transfer meaningfully across brain recording modalities, a hypothesis the framework is positioned to rigorously test in future releases.

The infrastructure is explicitly designed for expansion to intracranial EEG (iEEG), functional near-infrared spectroscopy (fNIRS), and electromyography (EMG).

How to Get Started

Installation takes a single command: pip install neuralbench. From there, running the audiovisual stimulus classification task on EEG looks like this:

neuralbench eeg audiovisual_stimulus --download   # Download data
neuralbench eeg audiovisual_stimulus --prepare    # Prepare cache
neuralbench eeg audiovisual_stimulus              # Run the task

To run all 36 tasks against all 14 EEG models, the -m all_classic all_fm flag handles the orchestration. Full benchmark storage requirements are substantial: approximately 11 TB total (~3.2 TB raw data, ~7.8 TB preprocessed cache, ~333 GB logged results), with one GPU of at least 32 GB VRAM per job — though average peak GPU usage measured across experiments is only ~1.3 GB (maximum ~30.3 GB).

The full NeuralBench-EEG-Full v1.0 run requires approximately 1,751 GPU-hours across 4,947 experiments.

Key Takeaways

  • Meta AI’s NeuralBench-EEG v1.0 is an open EEG benchmark — 36 tasks, 94 datasets, 9,478 subjects, and 14 deep learning architectures under one standardized interface.
  • Despite up to 270× more parameters, EEG foundation models like REVE only marginally outperform lightweight task-specific models like CTNet (150K params) across the benchmark.
  • Cognitive decoding tasks (speech, video, sentence, word decoding from brain activity) and clinical predictions remain highly challenging, with most models scoring near dummy level.
  • REVE, pretrained only on EEG data, outperformed all models on MEG typing decoding — an early signal of meaningful cross-modality transfer.
  • NeuralBench is MIT-licensed.

Check out the Paper and GitHub Repo.

