Google DeepMind Releases Gemma 4 QAT Checkpoints: Q4_0 and a New Mobile Format Cut On-Device Memory

June 5, 2026

Google DeepMind released Quantization-Aware Training (QAT) checkpoints for the Gemma 4 family. The release targets local deployment on edge devices and consumer GPUs. It follows the Gemma 4 launch in April and a 12B model two days earlier.

We compared the available Gemma 4 edge-model formats using only published numbers. The goal was simple. Show what each precision level costs in memory. Then show what QAT actually changes.

What QAT actually does

Quantization shrinks a model by lowering weight precision. Standard Post-Training Quantization (PTQ) compresses a finished model. That often degrades quality. QAT instead simulates quantization during training. The model learns to compensate for the precision loss.
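To make "simulates quantization during training" concrete, here is a minimal sketch of the round trip usually called fake quantization: weights are snapped to a low-precision grid and mapped back to floats, so the forward pass sees the rounding error. The function name, the symmetric per-tensor scheme, and the sample weights are illustrative assumptions; the announcement does not describe Google's actual training recipe.

```python
def fake_quantize(weights, bits=4):
    """Round weights to a symmetric signed integer grid, then map back to floats.

    Illustrative only: a simple per-tensor scheme, not Google's QAT recipe.
    """
    qmax = 2 ** (bits - 1) - 1              # 7 for 4-bit
    max_abs = max(abs(w) for w in weights)
    if max_abs == 0.0:
        return list(weights)                # nothing to scale
    scale = max_abs / qmax                  # float value of one integer step
    restored = []
    for w in weights:
        q = max(-qmax, min(qmax, round(w / scale)))   # integer code
        restored.append(q * scale)                    # back to float
    return restored


# Hypothetical weights, chosen only to show the rounding error.
weights = [0.70, -0.33, 0.10, 0.02]
quantized = fake_quantize(weights, bits=4)
errors = [abs(w - q) for w, q in zip(weights, quantized)]

print("quantized:", quantized)
print("largest rounding error:", max(errors))
```

In a typical QAT setup, the forward pass uses the rounded weights while the optimizer keeps updating a full-precision copy, so training can steer weights toward values that survive rounding. PTQ applies the same rounding once, after training, with no chance to adapt.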

Google’s AI team states that its QAT results yield higher overall quality than standard PTQ baselines. Google did not publish Gemma 4 QAT benchmark scores in the announcement. For context, Gemma 3 QAT cut the Q4_0 perplexity drop by 54%, as measured with llama.cpp’s perplexity evaluation. We cite that only as prior-generation precedent.

The comparison task

Compare Gemma 4 E2B and E4B across three formats. The formats are BF16, Q4_0 QAT, and the new mobile QAT schema. Rank them on memory footprint, quality preservation, and on-device accessibility. Use published figures only.

Memory results

Format | E2B | E4B | Basis
BF16 (16-bit) | 9.6 GB | 15 GB | Official Gemma 4 docs
Q4_0 (4-bit, QAT) | 3.2 GB | 5 GB | Official Gemma 4 docs
Mobile (QAT, E2B) | ~1 GB | n/a | QAT announcement

The Q4_0 figures match the footprint of PTQ Q4_0. QAT does not change the size at a given format. It improves quality at that size. The new mobile schema delivers the additional reduction.
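The reduction factors implied by the published figures can be worked out directly. The sketch below only divides the numbers quoted above; the dictionary layout and function name are our own, and the mobile entry treats "about 1 GB" as exactly 1.0 GB for the sake of the arithmetic.

```python
# Published footprints in GB, as quoted in the memory results above.
# "Mobile QAT" is listed as "about 1 GB"; 1.0 is an approximation.
footprints_gb = {
    "E2B": {"BF16": 9.6, "Q4_0 QAT": 3.2, "Mobile QAT": 1.0},
    "E4B": {"BF16": 15.0, "Q4_0 QAT": 5.0},
}


def reduction_vs_bf16(model):
    """Return how many times smaller each format is than BF16 for one model."""
    base = footprints_gb[model]["BF16"]
    return {
        fmt: round(base / gb, 2)
        for fmt, gb in footprints_gb[model].items()
        if fmt != "BF16"
    }


for model in footprints_gb:
    print(model, reduction_vs_bf16(model))
```

Both models shrink by a factor of about 3 at Q4_0, and E2B shrinks by roughly 9.6 times under the mobile schema, subject to the rounding in the "about 1 GB" figure.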

Using that mobile schema, Google reduced Gemma 4 E2B to about 1 GB. Developers can go lower still: a text-only deployment that drops the audio and vision encoders and omits Per-Layer Embeddings needs under 1 GB.

Per-format breakdown

BF16 is the quality baseline. E2B needs 9.6 GB and E4B needs 15 GB. It is the reference point, not a phone deployment target.

Q4_0 QAT is the general-purpose local format. E2B drops to 3.2 GB and E4B to 5 GB. QAT preserves more quality here than PTQ at the same size. This format fits consumer GPUs. Earlier E2B testing also ran on a Raspberry Pi 5 at INT4.

The mobile format is the edge-specialized schema. It brings E2B to about 1 GB. It uses static activations, channel-wise quantization, and targeted 2-bit compression.

How the mobile schema works

Google’s AI team engineered four techniques for mobile hardware. Static activations pre-calculate scaling factors during training, which reduces on-device work. Channel-wise quantization fits the design of mobile accelerators. Targeted 2-bit quantization compresses only the token-generation layers. Embedding and KV cache optimization shrinks the active memory footprint.
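As a rough illustration of why channel-wise quantization helps, the sketch below compares one shared scale for a whole weight matrix against one scale per row (channel). The matrix values, function names, and the 4-bit symmetric scheme are assumptions made for illustration; the announcement does not specify the mobile schema at this level of detail.

```python
def quantize_dequantize(values, scale, qmax):
    """Snap values to an integer grid with the given scale, then map back."""
    return [max(-qmax, min(qmax, round(v / scale))) * scale for v in values]


def channel_errors(matrix, per_channel, bits=4):
    """Worst absolute rounding error for each row (channel) of the matrix."""
    qmax = 2 ** (bits - 1) - 1
    # One scale shared by the whole tensor, used when per_channel is False.
    tensor_scale = max(abs(v) for row in matrix for v in row) / qmax
    worst = []
    for row in matrix:
        scale = max(abs(v) for v in row) / qmax if per_channel else tensor_scale
        restored = quantize_dequantize(row, scale, qmax)
        worst.append(max(abs(a - b) for a, b in zip(row, restored)))
    return worst


# Hypothetical weights: one wide-range channel and one narrow-range channel.
weights = [
    [4.0, -2.0, 1.0],
    [0.04, -0.02, 0.01],
]

per_tensor = channel_errors(weights, per_channel=False)
per_channel = channel_errors(weights, per_channel=True)
print("per-tensor errors: ", per_tensor)
print("per-channel errors:", per_channel)
```

With a shared scale, the narrow channel rounds entirely to zero because the wide channel dictates the step size. Giving each channel its own scale keeps the small weights representable at the cost of storing one extra scale per channel.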

Core reasoning layers stay at higher precision. That protects capability while cutting storage. Developers can also deploy text-only and drop the audio and vision encoders. That trims memory further for use cases that need no multimodality.

Dimension breakdown

Scores are a qualitative ranking of the formats for on-device use. Memory is the only hard-measured axis. Quality reflects Google’s disclosed design, not measured Gemma 4 numbers. Each score has a one-line basis.

Dimension | BF16 | Q4_0 QAT | Mobile QAT
Memory footprint | 1: heaviest, 9.6 GB E2B | 4: 3.2 GB E2B | 5: ~1 GB E2B text-only
Quality preservation | 5: full-precision baseline | 4: QAT-preserved, near baseline | 3: 2-bit token layers, core kept higher
Decode speed | 2: no quantization speedup | 4: 4-bit accelerates decode | 5: mobile-optimized static activations
Deployment breadth | 4: loadable but heavy | 5: llama.cpp, Ollama, LM Studio, vLLM, MLX | 3: LiteRT-LM, Transformers.js, edge-focused
On-device accessibility | 1: needs large GPU | 4: consumer GPU, Raspberry Pi 5 | 5: runs on phones
Total (/25) | 13 | 21 | 21
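The totals can be re-derived from the per-dimension scores. This short sketch sums each column and lists the top-scoring formats; the data structure is our own, and the scores are the qualitative rankings from the table, not measurements.

```python
# Qualitative scores copied from the dimension breakdown (1 = worst, 5 = best).
scores = {
    "Memory footprint":        {"BF16": 1, "Q4_0 QAT": 4, "Mobile QAT": 5},
    "Quality preservation":    {"BF16": 5, "Q4_0 QAT": 4, "Mobile QAT": 3},
    "Decode speed":            {"BF16": 2, "Q4_0 QAT": 4, "Mobile QAT": 5},
    "Deployment breadth":      {"BF16": 4, "Q4_0 QAT": 5, "Mobile QAT": 3},
    "On-device accessibility": {"BF16": 1, "Q4_0 QAT": 4, "Mobile QAT": 5},
}

# Sum each format's score across all five dimensions.
totals = {}
for row in scores.values():
    for fmt, score in row.items():
        totals[fmt] = totals.get(fmt, 0) + score

# Formats sharing the highest total, in table order.
best = max(totals.values())
leaders = [fmt for fmt, total in totals.items() if total == best]

print(totals)
print("top formats:", leaders)
```

Equal weighting of the five dimensions is a choice, not a given. A reader who cares mostly about quality or mostly about phone deployment would weight the rows differently and break the tie.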

Winner

The result is a tie by design. Q4_0 QAT and mobile QAT both score 21, but for different hardware. For phones, the mobile format leads. It reaches about 1 GB on E2B and targets mobile accelerators directly. For laptops and consumer GPUs, Q4_0 QAT is the practical default. BF16 stays the quality reference, not a local deployment choice.

Methodology and limits

Memory figures come from Google’s Gemma 4 documentation. The ~1 GB E2B figure comes from the QAT announcement. Quality is Google’s stated claim. No independent Gemma 4 QAT quality numbers were published at release. We did not run the models locally for this comparison. Developers should test their chosen quantization format on their own workload before building.

Key Takeaways

  • Q4_0 QAT cuts Gemma 4 E2B to 3.2 GB and E4B to 5 GB, from 9.6 GB and 15 GB at BF16.
  • A new mobile QAT schema brings E2B to about 1 GB; text-only without PLE goes under 1 GB.
  • QAT changes quality at a given size, not the size itself; the mobile format drives the extra memory cut.
  • Google claims higher quality than PTQ but published no Gemma 4 QAT benchmark numbers at release.
  • Weights ship today on Hugging Face with llama.cpp, Ollama, LM Studio, vLLM, MLX, and LiteRT-LM support.
