Meta AI Releases NeuralBench: A Unified Open-Source Framework to Benchmark NeuroAI Models Across 36 EEG Tasks and 94 Datasets

May 7, 2026
in AI & Technology

Evaluating AI models trained on brain signals has long been a messy, inconsistent exercise. Different research groups use different preprocessing pipelines, train models on different datasets, and report results on a narrow set of tasks, making it nearly impossible to know which model actually works best, or for what. A new framework from the Meta AI team is designed to fix that.

Meta researchers have released NeuralBench, a unified, open-source framework for benchmarking AI models of brain activity. Its first release, NeuralBench-EEG v1.0, is the largest open benchmark of its kind: 36 downstream tasks, 94 datasets, 9,478 subjects, 13,603 hours of electroencephalography (EEG) data, and 14 deep learning architectures evaluated under a single standardized interface.

https://ai.meta.com/research/publications/neuralbench-a-unifying-framework-to-benchmark-neuroai-models/

The Problem NeuralBench Solves

The broader field of NeuroAI, where deep learning meets neuroscience, has exploded in recent years. Self-supervised learning techniques originally developed for language, speech, and images are now being adapted to build brain foundation models: large models pretrained on unlabeled brain recordings and fine-tuned for downstream tasks ranging from clinical seizure detection to decoding what a person is seeing or hearing.

But the evaluation landscape has been badly fragmented. Existing benchmarks like MOABB cover up to 148 brain-computer interfacing (BCI) datasets but limit evaluation to just 5 downstream tasks. Other efforts — EEG-Bench, EEG-FM-Bench, AdaBrain-Bench — are each constrained in their own ways. For modalities like magnetoencephalography (MEG) and functional magnetic resonance imaging (fMRI), there is no systematic benchmark at all.

The result: claims about foundation models being “generalizable” or “foundational” often rest on cherry-picked tasks with no common reference point.

What is NeuralBench?

NeuralBench is built on three core Python packages that form a modular pipeline.

NeuralFetch handles dataset acquisition, pulling curated data from public repositories including OpenNeuro, DANDI, and NEMAR. NeuralSet prepares data as PyTorch-ready dataloaders, wrapping existing neuroscience tools like MNE-Python and nilearn for preprocessing, and HuggingFace for extracting stimulus embeddings (for tasks involving images, speech, or text). NeuralTrain provides modular training code built on PyTorch-Lightning, Pydantic, and the exca execution and caching library.

Once installed via pip install neuralbench, the framework is controlled via a command-line interface (CLI). Running a task is as simple as three commands: download the data, prepare the cache, and execute. Every task is configured through a lightweight YAML file that specifies the data source, train/validation/test splits, preprocessing steps, target processing, training hyperparameters, and evaluation metrics.
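The post does not reproduce a config file, but a task YAML along these lines illustrates the idea. All field names and values below are hypothetical, not the actual NeuralBench schema:

```yaml
# Hypothetical NeuralBench task config (illustrative schema, not the real one)
task: audiovisual_stimulus
data:
  source: openneuro        # one of the supported repositories
  dataset_id: ds_example   # placeholder identifier
split:
  strategy: cross_subject
  train: 0.8
  valid: 0.1
  test: 0.1
preprocessing:
  - bandpass: {low_hz: 0.5, high_hz: 40.0}
  - resample: {sfreq: 128}
training:
  optimizer: adamw
  lr: 1.0e-4
  weight_decay: 0.05
  max_epochs: 50
metrics:
  - balanced_accuracy
```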


What NeuralBench-EEG v1.0 Covers

The first release focuses on EEG and spans eight task categories: cognitive decoding (image, sentence, speech, typing, video, and word decoding), brain-computer interfacing (BCI), evoked responses, clinical tasks, internal state, sleep, phenotyping, and miscellaneous.

Three classes of models are compared:

  • Task-specific architectures (~1.5K–4.2M parameters, trained from scratch): ShallowFBCSPNet, Deep4Net, EEGNet, BDTCN, ATCNet, EEGConformer, SimpleConvTimeAgg, and CTNet.
  • EEG foundation models (~3.2M–157.1M parameters, pretrained and fine-tuned): BENDR, LaBraM, BIOT, CBraMod, LUNA, and REVE.
  • Handcrafted feature baselines: sklearn-style pipelines using symmetric positive definite (SPD) matrix representations fed into logistic or Ridge regression.
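The exact baseline pipeline is not spelled out in the post; a minimal sklearn-style sketch in that spirit uses flattened per-trial covariance (SPD) features fed into logistic regression (the tangent-space projection used in full SPD pipelines is omitted here for brevity):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer, StandardScaler

def cov_features(X):
    """Map (n_trials, n_channels, n_times) EEG to flattened upper-triangular
    covariance (SPD) features of shape (n_trials, n_ch * (n_ch + 1) / 2)."""
    n_trials, n_ch, _ = X.shape
    iu = np.triu_indices(n_ch)
    covs = np.array([np.cov(trial) for trial in X])
    return covs[:, iu[0], iu[1]]

clf = make_pipeline(
    FunctionTransformer(cov_features),
    StandardScaler(),
    LogisticRegression(max_iter=1000),
)

# Synthetic example: 40 trials, 8 channels, 256 samples, binary labels
rng = np.random.default_rng(0)
X = rng.standard_normal((40, 8, 256))
y = rng.integers(0, 2, 40)
clf.fit(X, y)
print(clf.predict(X[:5]).shape)  # (5,)
```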

All foundation models are fine-tuned end-to-end using a shared training recipe — AdamW optimizer, learning rate of 10⁻⁴, weight decay of 0.05, cosine-annealing with 10% warmup, up to 50 epochs with early stopping (patience=10). The sole exception is BENDR, for which the learning rate is lowered to 10⁻⁵ and gradient clipping is applied at 0.5 to obtain stable learning curves. This intentional standardization otherwise removes model-specific optimization tricks — such as layer-wise learning rate decay, two-stage probing, or LoRA — so that architecture and pretraining methodology are what actually gets evaluated.
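The warmup-plus-cosine schedule from the shared recipe can be sketched in plain Python. TOTAL_STEPS is illustrative; the real step count depends on dataset and batch size:

```python
import math

BASE_LR = 1e-4                      # shared recipe (1e-5 for BENDR)
TOTAL_STEPS = 1000                  # illustrative only
WARMUP = int(0.1 * TOTAL_STEPS)     # 10% linear warmup

def lr_at(step):
    """Learning rate at a given step: linear warmup, then cosine annealing."""
    if step < WARMUP:
        return BASE_LR * step / max(1, WARMUP)
    progress = (step - WARMUP) / max(1, TOTAL_STEPS - WARMUP)
    return BASE_LR * 0.5 * (1.0 + math.cos(math.pi * progress))

schedule = [lr_at(s) for s in range(TOTAL_STEPS)]
print(f"peak: {max(schedule):.1e}")  # peaks at BASE_LR right after warmup
```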

Data splitting is handled differently per task type to reflect real-world generalization constraints: predefined splits where provided by the dataset's research team, leave-concept-out for cognitive decoding tasks (all subjects seen in training, but a held-out set of stimuli used for testing), cross-subject splits for most clinical and BCI tasks, and within-subject splits for datasets with very few participants. Each model is trained three times per task using three different random seeds.
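A leave-concept-out split can be sketched with NumPy: stimulus IDs are held out rather than subjects, so every subject appears in training but the test stimuli are unseen (toy data, not the benchmark's actual splitting code):

```python
import numpy as np

# Toy trial table: each trial has a subject id and a stimulus (concept) id
rng = np.random.default_rng(0)
n_trials = 200
subjects = rng.integers(0, 10, n_trials)
stimuli = rng.integers(0, 50, n_trials)

# Leave-concept-out: hold out a fixed set of stimuli, not subjects
unique_stims = np.unique(stimuli)
rng.shuffle(unique_stims)
held_out = set(unique_stims[: len(unique_stims) // 5].tolist())  # ~20% of concepts

test_mask = np.isin(stimuli, list(held_out))
train_idx, test_idx = np.where(~test_mask)[0], np.where(test_mask)[0]

# No stimulus overlap between splits; subjects may appear in both
assert not set(stimuli[train_idx].tolist()) & set(stimuli[test_idx].tolist())
```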

Evaluation metrics are standardized by task type: balanced accuracy for binary and multiclass classification, macro F1-score for multilabel classification, Pearson correlation for regression, and top-5 accuracy for retrieval tasks. All results are additionally reported as normalized scores (s̃), where 0 corresponds to dummy-level performance and 1 corresponds to perfect performance, enabling fair cross-task comparisons regardless of metric scale.
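Assuming the usual linear rescaling between a dummy baseline and perfect performance (the post describes the endpoints but does not print the formula), the normalized score looks like:

```python
def normalized_score(score, dummy, perfect):
    """Map a raw metric to a normalized scale: 0 at dummy-level performance,
    1 at perfect performance. Exact formula assumed from the description."""
    return (score - dummy) / (perfect - dummy)

# Example: balanced accuracy on a balanced binary task (dummy = 0.5, perfect = 1.0)
print(round(normalized_score(0.80, dummy=0.5, perfect=1.0), 3))  # 0.6
```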

One important methodological note: some EEG foundation models were pretrained on datasets that overlap with NeuralBench’s downstream evaluation sets. Rather than discarding these results, the benchmark flags them with hashed bars in result figures so readers can identify potential pretraining data leakage — no strong trend suggesting leakage inflates performance was observed, but the transparency is preserved.

The benchmark offers two variants: NeuralBench-EEG-Core v1.0, which uses a single representative dataset per task for broad coverage, and NeuralBench-EEG-Full v1.0, which expands to up to 24 datasets per task to study within-task variability across recording hardware, labs, and subject populations. A Kendall’s τ of 0.926 (p < 0.001) between Core and Full rankings confirms that the Core variant is a reliable proxy — though a few model positions do shift, including CTNet overtaking LUNA when more datasets are included.
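Kendall's τ simply counts concordant versus discordant pairs of rankings; a self-contained sketch on toy model ranks (not the benchmark's actual rankings, and ignoring ties for simplicity):

```python
from itertools import combinations

def kendall_tau(a, b):
    """Kendall rank correlation between two equal-length rankings (no ties)."""
    n = len(a)
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        s = (a[i] - a[j]) * (b[i] - b[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)

# Illustrative Core vs Full ranks for 5 hypothetical models; one adjacent swap
core_rank = [1, 2, 3, 4, 5]
full_rank = [1, 2, 4, 3, 5]
print(kendall_tau(core_rank, full_rank))  # 0.8
```

A τ of 0.926 therefore means the Core and Full orderings disagree on only a small fraction of model pairs.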


Two Key Findings

Finding 1: Foundation models only marginally outperform task-specific models. The top-ranked models overall are REVE (69.2M parameters, mean normalized rank 0.20), LaBraM (5.8M, rank 0.21), and LUNA (40.4M, rank 0.30). But several task-specific models trained from scratch — CTNet (150K parameters, rank 0.32), SimpleConvTimeAgg (4.2M, rank 0.35), and Deep4Net (146K, rank 0.43) — trail closely behind. CTNet actually overtakes the LUNA foundation model to rank third in the Full variant, despite having roughly 270× fewer parameters. This shows the gap between task-specific and foundation models is narrow enough that expanding dataset coverage alone is sufficient to change global rankings.

Finding 2: Many tasks remain genuinely hard. Cognitive decoding tasks — recovering dense representations of images, speech, sentences, video, or words from brain activity — are particularly challenging, with even the best models scoring well below ceiling. Tasks like mental imagery, sleep arousal, psychopathology decoding, and cross-subject motor imagery and P300 classification frequently yield performance close to dummy level. These tasks represent the best benchmarks for stress-testing the next generation of EEG foundation models.

Tasks approaching saturation include SSVEP classification, pathology detection, seizure detection, sleep stage classification, and phenotyping tasks like age regression and sex classification.

Beyond EEG: MEG and fMRI

Even in this initial EEG-focused release, NeuralBench already supports MEG and fMRI tasks as proof of concept. Notably, the REVE model — pretrained exclusively on EEG data — achieves the best performance among all tested models on the typing decoding task in MEG. This is a striking early signal that EEG-pretrained representations may transfer meaningfully across brain recording modalities, a hypothesis the framework is positioned to rigorously test in future releases.

The infrastructure is explicitly designed for expansion to intracranial EEG (iEEG), functional near-infrared spectroscopy (fNIRS), and electromyography (EMG).

How to Get Started

Installation takes a single command: pip install neuralbench. From there, running the audiovisual stimulus classification task on EEG looks like this:

neuralbench eeg audiovisual_stimulus --download   # Download data
neuralbench eeg audiovisual_stimulus --prepare    # Prepare cache
neuralbench eeg audiovisual_stimulus              # Run the task

To run all 36 tasks against all 14 EEG models, the -m all_classic all_fm flag handles the orchestration. Full benchmark storage requirements are substantial: approximately 11 TB total (~3.2 TB raw data, ~7.8 TB preprocessed cache, ~333 GB logged results), with one GPU of at least 32 GB VRAM per job — though average peak GPU usage measured across experiments is only ~1.3 GB (maximum ~30.3 GB).

The full NeuralBench-EEG-Full v1.0 run requires approximately 1,751 GPU-hours across 4,947 experiments.

Key Takeaways

  • Meta AI’s NeuralBench-EEG v1.0 is an open EEG benchmark — 36 tasks, 94 datasets, 9,478 subjects, and 14 deep learning architectures under one standardized interface.
  • Despite up to 270× more parameters, EEG foundation models like REVE only marginally outperform lightweight task-specific models like CTNet (150K params) across the benchmark.
  • Cognitive decoding tasks (speech, video, sentence, word decoding from brain activity) and clinical predictions remain highly challenging, with most models scoring near dummy level.
  • REVE, pretrained only on EEG data, outperformed all models on MEG typing decoding — an early signal of meaningful cross-modality transfer.
  • NeuralBench is MIT-licensed.

Check out the Paper and GitHub Repo.

