Meta Releases TRIBE v2: A Brain Encoding Model That Predicts fMRI Responses Across Video, Audio, and Text Stimuli

March 27, 2026
in AI & Technology

Neuroscience has long been a field of divide and conquer. Researchers typically map specific cognitive functions to isolated brain regions—like motion to area V5 or faces to the fusiform gyrus—using models tailored to narrow experimental paradigms. While this has provided deep insights, the resulting landscape is fragmented, lacking a unified framework to explain how the human brain integrates multisensory information.

Meta’s FAIR team has introduced TRIBE v2, a tri-modal foundation model designed to bridge this gap. By aligning the latent representations of state-of-the-art AI architectures with human brain activity, TRIBE v2 predicts high-resolution fMRI responses across diverse naturalistic and experimental conditions.


https://ai.meta.com/research/publications/a-foundation-model-of-vision-audition-and-language-for-in-silico-neuroscience/

The Architecture: Multi-modal Integration

TRIBE v2 does not learn to ‘see’ or ‘hear’ from scratch. Instead, it leverages the representational alignment between deep neural networks and the primate brain. The architecture consists of three frozen foundation models serving as feature extractors, a temporal transformer, and a subject-specific prediction block.

1. Feature Extraction

The model processes stimuli through three specialized encoders:

  • Text: Contextualized embeddings are extracted from LLaMA 3.2-3B. For every word, the model prepends the preceding 1,024 words as context; the resulting embeddings are then mapped onto a 2 Hz grid.
  • Video: The model uses V-JEPA2-Giant to process 64-frame segments spanning the preceding 4 seconds for each time-bin.
  • Audio: Sound is processed through Wav2Vec-BERT 2.0, with representations resampled to 2 Hz to match the stimulus frequency (f_stim).
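The per-modality resampling step above can be sketched as follows. The bin-averaging scheme is an assumption for illustration — the article does not specify how native-rate embeddings are aligned to the 2 Hz grid:

```python
import numpy as np

def resample_to_grid(features: np.ndarray, native_hz: float, target_hz: float = 2.0) -> np.ndarray:
    """Average native-rate encoder features into bins on a coarser time grid.

    features: (T_native, dim) array of embeddings.
    Returns a (T_target, dim) array aligned to the target grid.
    """
    t_native = features.shape[0]
    duration = t_native / native_hz
    t_target = int(round(duration * target_hz))
    # Assign each native frame to a target bin, then average within bins.
    bin_idx = np.minimum((np.arange(t_native) / native_hz * target_hz).astype(int), t_target - 1)
    out = np.zeros((t_target, features.shape[1]))
    counts = np.zeros(t_target)
    np.add.at(out, bin_idx, features)
    np.add.at(counts, bin_idx, 1)
    return out / np.maximum(counts, 1)[:, None]

# e.g. 50 Hz audio embeddings over 10 s collapse to 20 time-bins at 2 Hz
audio = np.random.randn(500, 1024)
print(resample_to_grid(audio, native_hz=50.0).shape)  # (20, 1024)
```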

2. Temporal Aggregation

The resulting embeddings are compressed into a shared dimension (D = 384) and concatenated to form a multi-modal time series with a model dimension of D_model = 3 × 384 = 1152. This sequence is fed into a Transformer encoder (8 layers, 8 attention heads) that exchanges information across a 100-second window.
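A minimal shape-level sketch of this fusion step, with random matrices standing in for the learned projections. The LLaMA 3.2-3B and Wav2Vec-BERT 2.0 hidden sizes are their standard values; the V-JEPA2-Giant width is an assumption:

```python
import numpy as np

rng = np.random.default_rng(0)
D, WINDOW_S, GRID_HZ = 384, 100, 2
T = WINDOW_S * GRID_HZ  # 200 time-bins in a 100-second window at 2 Hz

# Per-modality embeddings already on the shared 2 Hz grid.
text  = rng.standard_normal((T, 3072))   # LLaMA 3.2-3B hidden size
video = rng.standard_normal((T, 1408))   # V-JEPA2-Giant width (assumed)
audio = rng.standard_normal((T, 1024))   # Wav2Vec-BERT 2.0 hidden size

def project(x: np.ndarray, d_out: int = D) -> np.ndarray:
    """Random linear map standing in for a learned projection to the shared dim."""
    w = rng.standard_normal((x.shape[1], d_out)) / np.sqrt(x.shape[1])
    return x @ w

# Concatenating the three projected streams gives D_model = 3 * 384 = 1152,
# the input width of the 8-layer Transformer encoder.
fused = np.concatenate([project(text), project(video), project(audio)], axis=1)
print(fused.shape)  # (200, 1152)
```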

3. Subject-Specific Prediction

To predict brain activity, the Transformer outputs are decimated to the 1 Hz fMRI frequency (f_fMRI) and passed through a Subject Block. This block projects the latent representations to 20,484 cortical vertices (on the fsaverage5 surface) and 8,802 subcortical voxels.
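In code, the Subject Block reduces to a decimation followed by a per-subject linear readout. The every-other-sample decimation here is a simplifying stand-in; the actual downsampling scheme is not detailed in the article:

```python
import numpy as np

N_VERTICES, N_VOXELS = 20_484, 8_802   # fsaverage5 cortex + subcortical voxels
D_MODEL = 1152

def subject_head(latents_2hz: np.ndarray, weight: np.ndarray) -> np.ndarray:
    """Decimate 2 Hz latents to the 1 Hz fMRI grid, then apply a subject-specific projection.

    latents_2hz: (T, D_MODEL); weight: (D_MODEL, N_VERTICES + N_VOXELS).
    """
    latents_1hz = latents_2hz[::2]       # naive 2 Hz -> 1 Hz decimation
    return latents_1hz @ weight          # (T // 2, 29_286) predicted responses

rng = np.random.default_rng(0)
w = rng.standard_normal((D_MODEL, N_VERTICES + N_VOXELS)) * 0.01
pred = subject_head(rng.standard_normal((200, D_MODEL)), w)
print(pred.shape)  # (100, 29286)
```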

Data and Scaling Laws

A significant hurdle in brain encoding is data scarcity. TRIBE v2 addresses this by utilizing ‘deep’ datasets for training—where a few subjects are recorded for many hours—and ‘wide’ datasets for evaluation.

  • Training: The model was trained on 451.6 hours of fMRI data from 25 subjects across four naturalistic studies (movies, podcasts, and silent videos).
  • Evaluation: It was evaluated across a broader collection totaling 1,117.7 hours from 720 subjects.

The research team observed a log-linear increase in encoding accuracy as the training data volume increased, with no evidence of a plateau. This suggests that as neuroimaging repositories expand, the predictive power of models like TRIBE v2 will continue to scale.
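A log-linear scaling law means accuracy grows by a roughly constant amount for every tenfold increase in training hours. The sketch below fits that relationship to hypothetical numbers, not the paper's measurements:

```python
import numpy as np

# Hypothetical (hours, encoding accuracy) pairs following a log-linear trend;
# the values are illustrative, not taken from the paper.
hours = np.array([10, 30, 100, 300, 450])
acc = 0.05 + 0.04 * np.log10(hours)

# Fit accuracy = a + b * log10(hours). A straight line in log-hours means
# each 10x increase in data buys a constant accuracy gain -- no plateau.
b, a = np.polyfit(np.log10(hours), acc, 1)
print(f"gain per 10x data: {b:.3f}")  # gain per 10x data: 0.040
```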

Results: Beating the Baselines

TRIBE v2 significantly outperforms traditional Finite Impulse Response (FIR) models, the long-standing gold standard for voxel-wise encoding.

Zero-Shot and Group Performance

One of the model’s most striking capabilities is zero-shot generalization to new subjects. Using an ‘unseen subject’ layer, TRIBE v2 can predict the group-averaged response of a new cohort more accurately than the actual recording of many individual subjects within that cohort. In the high-resolution Human Connectome Project (HCP) 7T dataset, TRIBE v2 achieved a group correlation (R_group) near 0.4, a two-fold improvement over the median subject’s group-predictivity.
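One plausible reading of such a group correlation is the mean Pearson r between the predicted and group-averaged time series across vertices; the paper's exact estimator may differ:

```python
import numpy as np

def group_correlation(pred: np.ndarray, group_mean: np.ndarray) -> float:
    """Mean Pearson r between predicted and group-averaged responses.

    pred, group_mean: (T, V) time series over V vertices.
    """
    p = pred - pred.mean(axis=0)
    g = group_mean - group_mean.mean(axis=0)
    r = (p * g).sum(axis=0) / np.sqrt((p**2).sum(axis=0) * (g**2).sum(axis=0))
    return float(r.mean())

rng = np.random.default_rng(0)
signal = rng.standard_normal((300, 50))                 # toy group-averaged response
noisy_pred = signal + rng.standard_normal((300, 50))    # prediction with added noise
print(round(group_correlation(noisy_pred, signal), 2))
```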

Fine-Tuning

When given a small amount of data (at most one hour) for a new participant, fine-tuning TRIBE v2 for just one epoch leads to a two- to four-fold improvement over linear models trained from scratch.
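As a toy stand-in for this fine-tuning step, one can refit only the subject head on an hour of data while keeping the trunk frozen. A closed-form ridge solve replaces the paper's single gradient epoch here, and all dimensions are illustrative:

```python
import numpy as np

def finetune_subject_head(X: np.ndarray, Y: np.ndarray, lam: float = 1.0) -> np.ndarray:
    """Ridge fit of a fresh subject head on a small amount of data.

    X: (T, D) shared latents from the frozen trunk; Y: (T, V) measured responses.
    Returns the (D, V) readout weights.
    """
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ Y)

# One "hour" of fMRI at 1 Hz: 3,600 samples (toy dimensions).
rng = np.random.default_rng(0)
X = rng.standard_normal((3600, 64))
W_true = rng.standard_normal((64, 10))
Y = X @ W_true + 0.1 * rng.standard_normal((3600, 10))
W = finetune_subject_head(X, Y)
print(np.abs(W - W_true).max() < 0.05)  # True: head recovered from 1 h of data
```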

In-Silico Experimentation

The research team argues that TRIBE v2 could be useful for piloting or pre-screening neuroimaging studies. By running virtual experiments on the Individual Brain Charting (IBC) dataset, the model recovered classic functional landmarks:

  • Vision: It accurately localized the fusiform face area (FFA) and parahippocampal place area (PPA).
  • Language: It successfully recovered the temporo-parietal junction (TPJ) for emotional processing and Broca’s area for syntax.

Furthermore, applying Independent Component Analysis (ICA) to the model’s final layer revealed that TRIBE v2 naturally learns five well-known functional networks: primary auditory, language, motion, default mode, and visual.

https://aidemos.atmeta.com/tribev2/

Key Takeaway

  • A Powerhouse Tri-modal Architecture: TRIBE v2 is a foundation model that integrates video, audio, and language by leveraging state-of-the-art encoders like LLaMA 3.2 for text, V-JEPA2 for video, and Wav2Vec-BERT for audio.
  • Log-Linear Scaling Laws: Much like the Large Language Models we use every day, TRIBE v2 follows a log-linear scaling law; its ability to accurately predict brain activity increases steadily as it is fed more fMRI data, with no performance plateau currently in sight.
  • Superior Zero-Shot Generalization: The model can predict the brain responses of unseen subjects in new experimental conditions without any additional training. Remarkably, its zero-shot predictions are often more accurate at estimating group-averaged brain responses than the recordings of individual human subjects themselves.
  • The Dawn of In-Silico Neuroscience: TRIBE v2 enables ‘in-silico’ experimentation, allowing researchers to run virtual neuroscientific tests on a computer. It successfully replicated decades of empirical research by identifying specialized areas like the fusiform face area (FFA) and Broca’s area purely through digital simulation.
  • Emergent Biological Interpretability: Even though it’s a deep learning ‘black box,’ the model’s internal representations naturally organized themselves into five well-known functional networks: primary auditory, language, motion, default mode, and visual.


The post Meta Releases TRIBE v2: A Brain Encoding Model That Predicts fMRI Responses Across Video, Audio, and Text Stimuli appeared first on MarkTechPost.
