TradePoint.io

On-device AI agents hit a hard memory limit. Apple’s new architecture routes around it.

June 9, 2026
in AI & Technology

On-device AI models have stayed small because the entire weight set has to live in DRAM, capping practical parameter counts well below what server-side deployments use. Enterprise architects evaluating agentic workloads have had to choose between capable cloud-dependent models and limited on-device ones. Apple’s third-generation foundation models, announced at WWDC26, break that constraint by keeping the full weight set in flash storage instead of DRAM.

The AFM 3 family was developed in collaboration with Google and spans five models: two on-device and three server-based, all running within Apple’s Private Cloud Compute boundary. The server-side models, including AFM 3 Cloud Pro for agentic tool use and complex reasoning, run on Nvidia GPUs in Google Cloud. The on-device architecture is Apple’s own. AFM 3 Core Advanced is a 20-billion-parameter model that stores weights in NAND flash rather than DRAM.

“Instead of forcing the entire model into DRAM, the full model is stored in flash memory,” Apple’s research team wrote. “Because NAND-to-DRAM bandwidth is too slow to swap weights token by token, as standard MoE models require, AFM 3 Core Advanced makes routing decisions per prompt.”

How the architecture actually works

The memory wall Apple is working around is one every local AI developer runs into.

“You can’t put 20B parameters in RAM at any reasonable precision,” Awni Hannun, a researcher at Anthropic and former Apple research scientist, posted on X. “To make it work they are using pretty exotic architecture by today’s standards. A small model predicts from the query (or prompt) which experts to load from NAND into RAM.”

That prediction-and-load mechanism has three distinct components, each driven by the hardware constraints of consumer silicon.

The full 20B weight set lives in flash, not DRAM. AFM 3 Core Advanced stores its entire parameter set in NAND flash rather than active memory. Standard on-device deployments require the full model to fit in DRAM, which is what caps their parameter counts. Apple’s approach, which the company calls Instruction-Following Pruning (IFP) and which its own researchers developed, treats flash as the model’s permanent home and DRAM as a working buffer for whichever experts a given prompt requires.
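To put rough numbers on why a 20-billion-parameter model does not fit in DRAM, the short sketch below computes weight storage at a few common precisions. The bit widths are assumptions chosen for illustration; Apple has not said in the material quoted here which precision AFM 3 Core Advanced uses, and the function name is hypothetical.

```python
def weight_footprint_gb(num_params, bits_per_weight):
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


TOTAL_PARAMS = 20e9  # parameter count stated in the article

# Assumed precisions for illustration, not Apple's disclosed format.
for bits in (16, 8, 4):
    size_gb = weight_footprint_gb(TOTAL_PARAMS, bits)
    print(f"{bits}-bit weights: {size_gb:.0f} GB")
```

Under these assumptions the weights alone come to 40 GB at 16 bits, 20 GB at 8 bits and 10 GB at 4 bits, before counting activations, the operating system or other apps.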

Expert routing happens once per prompt, not per token. In a conventional Mixture of Experts model, a router selects different experts for every token generated — which would require continuous weight movement between flash and DRAM at inference speed. NAND-to-DRAM bandwidth cannot support that. AFM 3 Core Advanced routes once at prompt time, selects a fixed expert set, loads it into DRAM alongside always-active shared experts, and generates all tokens from that same configuration.

“The key distinction from a typical MoE is that you do this once per query and then generate all the tokens with the same experts,” Hannun wrote.

Source: Apple Machine Learning Research, June 8, 2026.
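The difference between per-token and per-prompt routing can be illustrated with a toy simulation that counts how many experts would have to be copied from flash into DRAM. This is a sketch under stated assumptions, not Apple’s implementation: the hash-based `toy_router`, the pool size and the number of resident experts are all hypothetical, and the always-active shared experts are left out.

```python
import hashlib

NUM_EXPERTS = 16  # hypothetical size of the routed-expert pool
TOP_K = 2         # hypothetical number of routed experts held in DRAM


def toy_router(text, k=TOP_K):
    """Deterministic stand-in for a learned router: hash text into k expert ids."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    chosen = []
    for byte in digest:
        expert = byte % NUM_EXPERTS
        if expert not in chosen:
            chosen.append(expert)
        if len(chosen) == k:
            break
    # Fallback so the router always returns exactly k experts.
    for expert in range(NUM_EXPERTS):
        if len(chosen) == k:
            break
        if expert not in chosen:
            chosen.append(expert)
    return frozenset(chosen)


def count_flash_loads(expert_sets):
    """Count experts copied from flash, assuming DRAM holds only the current set."""
    loads = 0
    resident = frozenset()
    for needed in expert_sets:
        loads += len(needed - resident)  # only missing experts are fetched
        resident = needed
    return loads


def per_token_loads(prompt, tokens):
    """Conventional MoE style: the router runs again for every generated token."""
    return count_flash_loads(
        toy_router(f"{prompt}|{i}|{tok}") for i, tok in enumerate(tokens)
    )


def per_prompt_loads(prompt, tokens):
    """Per-prompt style: route once, then reuse the same experts for all tokens."""
    fixed = toy_router(prompt)
    return count_flash_loads(fixed for _ in tokens)


prompt = "summarize my unread email"
tokens = ["You", "have", "three", "unread", "messages"]
print("per-token routing loads: ", per_token_loads(prompt, tokens))
print("per-prompt routing loads:", per_prompt_loads(prompt, tokens))
```

In this toy model the per-prompt count always equals `TOP_K`, however long the output is, while the per-token count can grow to as many as `TOP_K` loads for every generated token.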

Active parameter count scales from 1B to 4B depending on task complexity. Rather than running a fixed model size for every request, AFM 3 Core Advanced adjusts how many parameters it activates based on what the task requires — 1 billion for simpler operations, up to 4 billion for harder ones, all drawn from the 20-billion-parameter pool in flash.
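For a sense of scale, the sketch below works out what share of the 20-billion-parameter pool is active at each end of the stated range, and what DRAM working set that would imply. The 4-bit default is an assumption for illustration only, and the estimate covers weights alone; the function names are hypothetical.

```python
TOTAL_PARAMS = 20e9          # pool size stated in the article
ACTIVE_RANGE = (1e9, 4e9)    # active parameter range stated in the article


def active_fraction(active_params, total_params=TOTAL_PARAMS):
    """Share of the full parameter pool that is active for one request."""
    return active_params / total_params


def working_set_gb(active_params, bits_per_weight=4):
    """Weight-only DRAM working set in GB; the 4-bit default is an assumption."""
    return active_params * bits_per_weight / 8 / 1e9


for active in ACTIVE_RANGE:
    print(
        f"{active / 1e9:.0f}B active: {active_fraction(active):.0%} of the pool, "
        f"about {working_set_gb(active):.1f} GB at the assumed 4 bits per weight"
    )
```

Under that assumption, the active set ranges from 5% of the pool (about 0.5 GB of weights) to 20% (about 2 GB).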

What Apple has and hasn’t disclosed

The architecture paper is detailed on the memory design and sparse activation mechanism. It is less forthcoming on practical deployment constraints.

Apple’s profiling tools expose timing but not the metrics that decide production viability. “Energy, memory bandwidth, thermal? Not in the docs,” Marco Abis, who is building Ziraph, a profiler for local AI on Apple silicon, posted on X. “A notable gap, given those decide most of on-device performance.” 

Abis also did not find a statement in Apple’s documentation — across the Core AI docs, the Foundation Models docs or the Private Cloud Compute security post — of when an on-device request transparently offloads, or whether that routing is visible to the developer or the user. For enterprises that need to document where inference runs, that is a direct compliance problem.

Some of this information is simply not available yet. Apple has indicated that a full technical report with benchmarks will follow later this summer.

What this means for enterprise architects

Regulated industries evaluating agentic AI deployments now have a concrete architectural decision to make.

  • The DRAM wall for on-device agents just moved. Enterprises that need agents to run without a cloud round-trip now have a 20-billion-parameter local option to evaluate. The constraint shifts from model capability to device hardware.

  • The private/cloud boundary is now an architectural decision, not a default. Simpler requests stay on-device; complex agentic tasks route to AFM 3 Cloud Pro on Private Cloud Compute. Apple has not publicly specified when a request offloads or whether that routing is visible to the developer — a gap that complicates policy decisions for organizations that need to document where inference runs.

  • The agentic server tier depends on Google Cloud. AFM 3 Cloud Pro runs on Nvidia GPUs in Google Cloud. The Private Cloud Compute guarantee covers data privacy. It does not eliminate the Google Cloud dependency for server-side inference.

AFM 3 Core Advanced gives enterprises a 20-billion-parameter on-device option that did not exist before WWDC26. Whether it is deployable at scale depends on answers Apple has not yet published. Those details are due in the summer technical report.

©2020- TradePoint.io - All rights reserved!

Tradepoint.io, being just a publishing and technology platform, is not a registered broker-dealer or investment adviser. So we do not provide investment advice. Rather, brokerage services are provided to clients of Tradepoint.io by independent SEC-registered broker-dealers and members of FINRA/SIPC. Every form of investing carries some risk and past performance is not a guarantee of future results. “Tradepoint.io“, “Instant Investing” and “My Trading Tools” are registered trademarks of Apperbuild, LLC.

This website is operated by Apperbuild, LLC. We have no link to any brokerage firm and we do not provide investment advice. Every information and resource we provide is solely for the education of our readers. © 2020 Apperbuild, LLC. All rights reserved.
