Meta AI Researchers Release MapAnything: An End-to-End Transformer Architecture that Directly Regresses Factored, Metric 3D Scene Geometry

September 17, 2025

A team of researchers from Meta Reality Labs and Carnegie Mellon University has introduced MapAnything, an end-to-end transformer architecture that directly regresses factored metric 3D scene geometry from images and optional sensor inputs. Released under Apache 2.0 with full training and benchmarking code, MapAnything advances beyond specialist pipelines by supporting over 12 distinct 3D vision tasks in a single feed-forward pass.


Why a Universal Model for 3D Reconstruction?

Image-based 3D reconstruction has historically relied on fragmented pipelines: feature detection, two-view pose estimation, bundle adjustment, multi-view stereo, or monocular depth inference. While effective, these modular solutions require task-specific tuning, optimization, and heavy post-processing.


Recent transformer-based feed-forward models such as DUSt3R, MASt3R, and VGGT simplified parts of this pipeline but remained limited: fixed numbers of views, rigid camera assumptions, or reliance on coupled representations that needed expensive optimization.

MapAnything overcomes these constraints by:

  • Accepting up to 2,000 input images in a single inference run.
  • Flexibly using auxiliary data such as camera intrinsics, poses, and depth maps.
  • Producing direct metric 3D reconstructions without bundle adjustment.

The model’s factored scene representation—composed of ray maps, depth, poses, and a global scale factor—provides modularity and generality unmatched by prior approaches.

Architecture and Representation

At its core, MapAnything employs a multi-view alternating-attention transformer. Each input image is encoded with DINOv2 ViT-L features, while optional inputs (rays, depth, poses) are encoded into the same latent space via shallow CNNs or MLPs. A learnable scale token enables metric normalization across views.
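The alternating-attention pattern can be illustrated with a toy sketch: each block first lets every view attend over its own patch tokens, then flattens all views into one sequence for cross-view attention. This is a minimal illustration, not the released code; projections, MLPs, normalization, and the scale token are omitted, and all shapes are assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attend(x):
    """Single-head scaled dot-product self-attention (no learned projections)."""
    scores = x @ x.swapaxes(-1, -2) / np.sqrt(x.shape[-1])
    return softmax(scores) @ x

def alternating_block(tokens):
    """One alternating-attention step over multi-view tokens.

    tokens: (views, patches, dim). First every view attends within itself
    (frame attention); then tokens from all views attend jointly
    (cross-view attention).
    """
    v, p, d = tokens.shape
    x = tokens + attend(tokens)                    # within-view attention
    flat = x.reshape(1, v * p, d)                  # merge views into one sequence
    return (flat + attend(flat)).reshape(v, p, d)  # cross-view attention

feats = np.random.randn(4, 16, 8)  # 4 views, 16 patch tokens each, dim 8
out = alternating_block(feats)
print(out.shape)  # (4, 16, 8)
```

The point of the alternation is that per-view attention keeps cost linear in the number of views, while the periodic cross-view pass propagates geometry across the whole image set.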

The network outputs a factored representation:

  • Per-view ray directions (camera calibration).
  • Depth along rays, predicted up-to-scale.
  • Camera poses relative to a reference view.
  • A single metric scale factor converting local reconstructions into a globally consistent frame.

This explicit factorization avoids redundancy, allowing the same model to handle monocular depth estimation, multi-view stereo, structure-from-motion (SfM), or depth completion without specialized heads.
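The factored outputs can be recombined into a globally consistent metric pointmap: back-project each pixel along its predicted ray by its depth, transform into the reference frame with the view's pose, and apply the single metric scale. The sketch below is illustrative (not the authors' code); function names and array shapes are assumptions.

```python
import numpy as np

def compose_pointmap(rays, depth, R, t, scale):
    """Recombine factored outputs into a metric pointmap.

    rays  : (H, W, 3) unit ray directions in the camera frame
    depth : (H, W)    up-to-scale depth along each ray
    R, t  : (3, 3), (3,) pose of this view relative to the reference view
    scale : float     single global metric scale factor
    Returns (H, W, 3) points in the reference frame, in metres.
    """
    local = rays * depth[..., None]  # back-project: d * r
    world = local @ R.T + t          # transform into the reference frame
    return scale * world             # apply the global metric scale

# Toy usage: identity pose, all rays along +z, constant up-to-scale depth 2.
H, W = 2, 2
rays = np.zeros((H, W, 3)); rays[..., 2] = 1.0
pts = compose_pointmap(rays, np.full((H, W), 2.0), np.eye(3), np.zeros(3), scale=1.5)
print(pts[0, 0])  # [0. 0. 3.]
```

Because the scale factor is a single scalar shared by all views, per-view reconstructions stay mutually consistent once it is applied.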


Training Strategy

MapAnything was trained across 13 diverse datasets spanning indoor, outdoor, and synthetic domains, including BlendedMVS, Mapillary Planet-Scale Depth, ScanNet++, and TartanAirV2. Two variants are released:

  • Apache 2.0 licensed model trained on six datasets.
  • CC BY-NC model trained on all thirteen datasets for stronger performance.

Key training strategies include:

  • Probabilistic input dropout: During training, geometric inputs (rays, depth, pose) are provided with varying probabilities, enabling robustness across heterogeneous configurations.
  • Covisibility-based sampling: Ensures input views have meaningful overlap, supporting reconstruction with 100+ views.
  • Factored losses in log-space: Depth, scale, and pose are optimized using scale-invariant and robust regression losses to improve stability.
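The probabilistic input dropout above can be sketched as a simple per-batch sampler: each auxiliary geometric input is independently kept with some probability, so the model sees every configuration from images-only to fully calibrated input. The probabilities below are illustrative placeholders, not the paper's values.

```python
import random

def sample_inputs(p_rays=0.5, p_depth=0.5, p_pose=0.5, rng=random):
    """Toy sketch of probabilistic input dropout for one training batch.

    Returns a dict saying which auxiliary inputs to feed alongside images.
    """
    return {
        "rays":  rng.random() < p_rays,
        "depth": rng.random() < p_depth,
        "pose":  rng.random() < p_pose,
    }

rng = random.Random(0)  # seeded for reproducibility
configs = [sample_inputs(rng=rng) for _ in range(4)]
```

Sampling configurations rather than fixing them is what lets a single set of weights serve images-only SfM, depth completion, and calibrated multi-view stereo at inference time.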

Training was performed on 64 H200 GPUs with mixed precision, gradient checkpointing, and curriculum scheduling, scaling from 4 to 24 input views.

Benchmarking Results

Multi-View Dense Reconstruction

On ETH3D, ScanNet++ v2, and TartanAirV2-WB, MapAnything achieves state-of-the-art (SoTA) performance across pointmaps, depth, pose, and ray estimation. It surpasses baselines like VGGT and Pow3R even when limited to images only, and improves further with calibration or pose priors.

For example:

  • Pointmap relative error (rel) improves to 0.16 with only images, compared to 0.20 for VGGT.
  • With images + intrinsics + poses + depth, the error drops to 0.01, while achieving >90% inlier ratios.
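For intuition, the two quantities reported above can be computed roughly as follows: a per-point relative error (point-wise Euclidean error normalized by the ground-truth point norm) and the fraction of points under an error threshold. These definitions are illustrative; the benchmark's exact alignment and thresholding protocol may differ.

```python
import numpy as np

def pointmap_metrics(pred, gt, inlier_thresh=0.05):
    """Relative error and inlier ratio for a pointmap (illustrative).

    pred, gt: (N, 3) arrays of predicted and ground-truth 3D points.
    Returns (mean relative error, fraction of points with rel error
    below `inlier_thresh`).
    """
    err = np.linalg.norm(pred - gt, axis=-1)
    norm = np.linalg.norm(gt, axis=-1)
    rel = err / np.maximum(norm, 1e-8)  # guard against zero-norm points
    return rel.mean(), (rel < inlier_thresh).mean()

gt = np.array([[0.0, 0.0, 2.0], [1.0, 0.0, 2.0]])
pred = gt * 1.02  # 2% uniform error
rel, inlier = pointmap_metrics(pred, gt)
print(round(rel, 3), inlier)  # 0.02 1.0
```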

Two-View Reconstruction

Against DUSt3R, MASt3R, and Pow3R, MapAnything consistently outperforms across scale, depth, and pose accuracy. Notably, with additional priors, it achieves >92% inlier ratios on two-view tasks, significantly beyond prior feed-forward models.

Single-View Calibration

Despite not being trained specifically for single-image calibration, MapAnything achieves an average angular error of 1.18°, outperforming AnyCalib (2.01°) and MoGe-2 (1.95°).

Depth Estimation

On the Robust-MVD benchmark:

  • MapAnything sets new SoTA for multi-view metric depth estimation.
  • With auxiliary inputs, its error rates rival or surpass specialized depth models such as MVSA and Metric3D v2.

Overall, benchmarks confirm a 2× improvement over prior SoTA methods in many tasks, validating the benefits of unified training.

Key Contributions

The research team highlights four major contributions:

  1. Unified Feed-Forward Model capable of handling more than 12 problem settings, from monocular depth to SfM and stereo.
  2. Factored Scene Representation enabling explicit separation of rays, depth, pose, and metric scale.
  3. State-of-the-Art Performance across diverse benchmarks with fewer redundancies and higher scalability.
  4. Open-Source Release including data processing, training scripts, benchmarks, and pretrained weights under Apache 2.0.

Conclusion

MapAnything establishes a new benchmark in 3D vision by unifying multiple reconstruction tasks—SfM, stereo, depth estimation, and calibration—under a single transformer model with a factored scene representation. It not only outperforms specialist methods across benchmarks but also adapts seamlessly to heterogeneous inputs, including intrinsics, poses, and depth. With open-source code, pretrained models, and support for over 12 tasks, MapAnything lays the groundwork for a truly general-purpose 3D reconstruction backbone.


Check out the Paper, Code, and Project Page.


Michal Sutter is a data science professional with a Master of Science in Data Science from the University of Padova. With a solid foundation in statistical analysis, machine learning, and data engineering, Michal excels at transforming complex datasets into actionable insights.
