TradePoint.io

NVIDIA Releases Nemotron-Labs-TwoTower: an Open-Weight Diffusion Language Model Built on a Frozen Autoregressive Nemotron-3-Nano-30B-A3B Backbone

July 1, 2026
in AI & Technology

NVIDIA has released Nemotron-Labs-TwoTower, a diffusion language model built on a pretrained autoregressive backbone. It ships as open weights under the NVIDIA Nemotron Open Model License. The release targets a throughput bottleneck in text generation.

Autoregressive (AR) models decode one token at a time. That serial process caps generation throughput. Discrete diffusion language models take another route. They generate tokens in parallel and refine them iteratively.

Most diffusion language models use one network for two jobs. It represents clean tokens and denoises corrupted ones at every step. TwoTower separates these jobs into two towers. It keeps 98.7% of the AR baseline’s aggregate benchmark quality. It also reports 2.42× higher wall-clock generation throughput.

TL;DR

  • TwoTower splits diffusion into a frozen AR context tower and a trained denoiser tower.
  • It retains 98.7% of AR quality at 2.42× throughput (γ=0.8, S=16, 2×H100).
  • The denoiser trained on ~2.1T tokens; the backbone used 25T.
  • One checkpoint runs diffusion, mock-AR, and AR decoding modes.

Nemotron-Labs-TwoTower

TwoTower is a block-wise autoregressive diffusion model. It is instantiated on Nemotron-3-Nano-30B-A3B, an open-weight hybrid backbone. That backbone interleaves Mamba-2, self-attention, and mixture-of-experts (MoE) layers.

Each tower has 52 layers: 23 Mamba-2, 6 self-attention, and 23 MoE. The released checkpoint ships both towers, roughly 60B total parameters. Active parameters per token are about 3B per tower. The MoE uses 128 routable experts, of which 6 activate, plus 2 shared experts.

Both towers start as copies of the same backbone checkpoint. Only the denoiser tower is trained. The AR context tower stays frozen. The denoiser was trained on ~2.1T tokens, a fraction of the backbone’s 25T-token pretraining.
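The layer, expert, and token counts above can be checked with a few lines of arithmetic. This is only a sketch over the figures quoted in this article; the assumption that the 2 shared experts run for every token alongside the 6 routed ones is mine, not something stated here.

```python
# Figures quoted in the article.
LAYERS_PER_TOWER = {"mamba2": 23, "self_attention": 6, "moe": 23}
ROUTED_EXPERTS = 128
ACTIVE_ROUTED = 6
SHARED_EXPERTS = 2
DENOISER_TOKENS_T = 2.1   # trillions of tokens used to train the denoiser
BACKBONE_TOKENS_T = 25.0  # trillions of tokens used to pretrain the backbone

layers_per_tower = sum(LAYERS_PER_TOWER.values())

# Assumption: shared experts always run, so each MoE layer evaluates 6 + 2 experts per token.
experts_run_per_token = ACTIVE_ROUTED + SHARED_EXPERTS
routed_fraction = ACTIVE_ROUTED / ROUTED_EXPERTS

# Share of the backbone's pretraining budget spent on the denoiser.
token_fraction = DENOISER_TOKENS_T / BACKBONE_TOKENS_T

print(f"layers per tower: {layers_per_tower}")
print(f"experts evaluated per token per MoE layer: {experts_run_per_token}")
print(f"routed experts active: {routed_fraction:.1%}")
print(f"denoiser training tokens vs backbone: {token_fraction:.1%}")
```

Under these figures the denoiser saw roughly a twelfth of the tokens the backbone did, which is the sense in which it is "a fraction" of the pretraining budget.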

How the Two Towers Work

The AR context tower runs causally over the prompt and committed tokens. It produces per-layer KV cache and final Mamba-2 states. It preserves the backbone’s autoregressive capability.

The diffusion denoiser tower refines noisy blocks. Within a block, it uses bidirectional in-block attention. It stays causal with respect to past clean blocks.
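The attention pattern described here (bidirectional inside a block, causal across blocks) can be written as a boolean mask. The sketch below is a generic block-causal mask, not the released implementation; `block_causal_mask` is a hypothetical helper, and the actual model may route access to past blocks through the context tower rather than through a single mask like this.

```python
def block_causal_mask(seq_len: int, block_size: int) -> list[list[bool]]:
    """mask[q][k] is True when query position q may attend to key position k.

    A position sees every position in its own block (including later ones)
    and every position in earlier blocks, but nothing in later blocks.
    """
    mask = []
    for q in range(seq_len):
        q_block = q // block_size
        mask.append([(k // block_size) <= q_block for k in range(seq_len)])
    return mask


mask = block_causal_mask(seq_len=6, block_size=2)
for row in mask:
    print("".join("x" if allowed else "." for allowed in row))
```

Each printed row is one query position: an `x` marks a key it can see. Positions in the same block share identical rows, which is what makes the in-block attention bidirectional.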

The towers connect layer-by-layer. Denoiser layer i cross-attends to context tower layer i. This layer-aligned cross-attention gives multi-scale access to the backbone’s representations. Prior approaches broadcast only the last hidden state.
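A toy version of layer-aligned cross-attention makes the wiring concrete. Everything here is illustrative: the single-head attention, the two-dimensional vectors, the residual add, and the `context_kv` structure are assumptions, and the real towers mix Mamba-2, attention, and MoE layers whose exact use of the context is not described in this article.

```python
import math


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(query, keys, values):
    """Single-head scaled dot-product attention for one query vector."""
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * sum(q * k for q, k in zip(query, key)) for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]


# Hypothetical per-layer caches from the context tower: one (keys, values) pair per layer.
context_kv = [
    ([[1.0, 0.0], [0.0, 1.0]], [[1.0, 1.0], [3.0, 3.0]]),   # context layer 0
    ([[0.5, 0.5], [1.0, -1.0]], [[2.0, 0.0], [0.0, 2.0]]),  # context layer 1
]


def denoiser_forward(hidden, context_kv):
    """Denoiser layer i reads only context layer i (layer-aligned)."""
    for keys, values in context_kv:
        attended = cross_attend(hidden, keys, values)
        hidden = [h + a for h, a in zip(hidden, attended)]  # residual connection
    return hidden


print(denoiser_forward([0.0, 0.0], context_kv))
```

The contrast with a last-hidden-state design is that the loop would then reuse the final entry of `context_kv` at every layer instead of walking through the list.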

Two more denoiser modifications matter. Mamba-2 layers seed their initial state from the context tower’s Mamba state. The diffusion timestep modulates each layer through adaLN-single time conditioning. That adaLN module adds only ~1.5M parameters.
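As a rough illustration of timestep conditioning, the sketch below applies an adaLN-style shift and scale after layer normalization. It follows the general adaLN-single idea of one shared timestep network plus small per-layer offsets; `shared_time_mlp` is a hypothetical stand-in built from sines and cosines, not a learned network, and whether TwoTower also uses a gate term is not stated here.

```python
import math


def layer_norm(x, eps=1e-6):
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def shared_time_mlp(t, dim):
    """Hypothetical stand-in for the shared timestep network: returns (shift, scale)."""
    shift = [0.1 * math.sin(t * (d + 1)) for d in range(dim)]
    scale = [0.1 * math.cos(t * (d + 1)) for d in range(dim)]
    return shift, scale


def adaln_single(x, t, layer_offset):
    """Normalize x, then modulate it with timestep-dependent shift and scale.

    layer_offset is a (shift_offset, scale_offset) pair that would be learned
    per layer; the shared network is evaluated once per timestep.
    """
    shift, scale = shared_time_mlp(t, len(x))
    off_shift, off_scale = layer_offset
    shift = [s + o for s, o in zip(shift, off_shift)]
    scale = [s + o for s, o in zip(scale, off_scale)]
    return [n * (1.0 + sc) + sh for n, sc, sh in zip(layer_norm(x), scale, shift)]


zero_offset = ([0.0, 0.0, 0.0], [0.0, 0.0, 0.0])
print(adaln_single([1.0, 2.0, 3.0], t=0.5, layer_offset=zero_offset))
```

Sharing one timestep network across layers is what keeps the added parameter count small relative to giving every layer its own conditioning MLP.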

Generation runs block by block. Each block starts as S [MASK] tokens. The denoiser refines it over T steps, then commits it. The context tower then processes committed tokens to update its caches.

This explains why multiple denoising steps can still beat one-token decoding. Autoregressive decoding commits exactly one token per step. TwoTower commits multiple tokens per step early in refinement.
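The per-block loop can be simulated with a toy denoiser to show why fewer than S steps are often enough. This is a minimal sketch under stated assumptions: `toy_denoiser` returns random tokens and confidences instead of model predictions, `MASK` is a made-up sentinel, and the fallback that unmasks the single most confident position is a common choice for confidence unmasking, not a confirmed detail of the release.

```python
import random

MASK = -1  # hypothetical sentinel for a [MASK] position


def toy_denoiser(block, rng):
    """Stand-in for the denoiser tower: a (token, confidence) guess per position."""
    return [(rng.randrange(100), rng.random()) for _ in block]


def denoise_block(block_size, max_steps, threshold, rng):
    """Refine one block of [MASK] tokens with confidence-threshold unmasking."""
    block = [MASK] * block_size
    commits_per_step = []
    for _ in range(max_steps):
        masked = [i for i, tok in enumerate(block) if tok == MASK]
        if not masked:
            break
        preds = toy_denoiser(block, rng)
        # Unmask every masked position whose confidence clears the threshold.
        chosen = [i for i in masked if preds[i][1] >= threshold]
        if not chosen:
            # Always make progress: unmask the single most confident position.
            chosen = [max(masked, key=lambda i: preds[i][1])]
        for i in chosen:
            block[i] = preds[i][0]
        commits_per_step.append(len(chosen))
    return block, commits_per_step


rng = random.Random(0)
block, commits = denoise_block(block_size=16, max_steps=16, threshold=0.8, rng=rng)
print("steps used:", len(commits))
print("tokens unmasked per step:", commits)
```

Because at least one position is unmasked per step, the block always finishes within `block_size` steps; any step that unmasks more than one position is where the speedup over one-token decoding comes from.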

Benchmarks

Evaluations use BF16 on 2×H100 GPUs. The default operating point is confidence unmasking, threshold γ=0.8, block size S=16. The table compares the AR baseline against TwoTower diffusion decoding.

| Task | Nemotron-3-Nano-30B-A3B (AR) | Nemotron-Labs-TwoTower (diffusion) |
| --- | --- | --- |
| MMLU (5-shot, acc) | 78.56 | 78.24 |
| MMLU-Pro (5-shot, CoT EM) | 62.59 | 60.93 |
| ARC-Challenge (25-shot, acc_norm) | 91.72 | 92.66 |
| WinoGrande (5-shot, acc) | 76.09 | 76.09 |
| RACE (0-shot, acc) | 88.90 | 88.90 |
| HumanEval (0-shot) | 79.27 | 75.58 |
| MBPP-Sanitized (3-shot) | 74.71 | 74.28 |
| GSM8K (8-shot, acc) | 92.49 | 90.14 |
| MATH-500 (4-shot) | 84.40 | 80.60 |
| MMLU Global Lite (5-shot) | 73.97 | 73.94 |
| MGSM (8-shot, avg acc) | 80.80 | 80.40 |
| Quality retained | 100% | 98.7% |
| Generation throughput (× AR) | 1.0× | 2.42× |

On general knowledge, MMLU stays within half a point of the AR baseline, while MMLU-Pro drops 1.66 points. Code and math show modest degradation, up to 3.8 points on MATH-500. Commonsense scores match or slightly exceed the baseline, and multilingual scores stay within half a point of it. Lowering γ commits more tokens per step and raises throughput, at the cost of quality.
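The 98.7% figure can be sanity-checked against the per-task scores. The article does not say how the aggregate is computed, so the sketch below tries two plausible definitions (mean of per-task ratios, and ratio of summed scores); both are assumptions about the method, not the paper's stated formula.

```python
# (AR baseline, TwoTower diffusion) scores copied from the table above.
SCORES = {
    "MMLU": (78.56, 78.24),
    "MMLU-Pro": (62.59, 60.93),
    "ARC-Challenge": (91.72, 92.66),
    "WinoGrande": (76.09, 76.09),
    "RACE": (88.90, 88.90),
    "HumanEval": (79.27, 75.58),
    "MBPP-Sanitized": (74.71, 74.28),
    "GSM8K": (92.49, 90.14),
    "MATH-500": (84.40, 80.60),
    "MMLU Global Lite": (73.97, 73.94),
    "MGSM": (80.80, 80.40),
}

# Definition 1: average the per-task retention ratios.
ratios = [tt / ar for ar, tt in SCORES.values()]
mean_of_ratios = 100 * sum(ratios) / len(ratios)

# Definition 2: compare the summed (equivalently, averaged) scores.
ar_total = sum(ar for ar, _ in SCORES.values())
tt_total = sum(tt for _, tt in SCORES.values())
ratio_of_means = 100 * tt_total / ar_total

# Task with the largest absolute drop relative to the AR baseline.
largest_drop = min(SCORES, key=lambda task: SCORES[task][1] - SCORES[task][0])

print(f"mean of per-task ratios: {mean_of_ratios:.2f}%")
print(f"ratio of summed scores:  {ratio_of_means:.2f}%")
print(f"largest drop: {largest_drop}")
```

Both definitions land close to the reported 98.7%, so the headline number is consistent with the table whichever of the two was used.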

Running It: Three Generation Modes

The checkpoint exposes three inference paths. Full two-tower diffusion uses 2 GPUs, about 59GB per GPU in BF16. AR-only mode runs on a single 80GB GPU.
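The memory figures are roughly what weight storage alone would predict. The estimate below assumes the ~60B total parameters split evenly across the two towers and counts only BF16 weights (2 bytes each), ignoring KV cache, Mamba state, and activations; these are simplifying assumptions, so treat the result as a ballpark.

```python
BYTES_PER_BF16_PARAM = 2
TOTAL_PARAMS = 60e9          # "roughly 60B" from the article
TOWERS = 2
SINGLE_GPU_CAPACITY_GB = 80  # the single-GPU size quoted for AR-only mode

# Assumption: parameters are split evenly, one tower per GPU.
params_per_tower = TOTAL_PARAMS / TOWERS
weights_gb_per_tower = params_per_tower * BYTES_PER_BF16_PARAM / 1e9
weights_gb_both_towers = weights_gb_per_tower * TOWERS

print(f"weights per tower: ~{weights_gb_per_tower:.0f} GB")
print(f"weights for both towers: ~{weights_gb_both_towers:.0f} GB")
print(f"one tower fits on an {SINGLE_GPU_CAPACITY_GB} GB GPU: "
      f"{weights_gb_per_tower < SINGLE_GPU_CAPACITY_GB}")
print(f"both towers fit on an {SINGLE_GPU_CAPACITY_GB} GB GPU: "
      f"{weights_gb_both_towers < SINGLE_GPU_CAPACITY_GB}")
```

The per-tower estimate sits close to the reported ~59GB per GPU, and the combined weights exceed a single 80GB device, which matches the 2-GPU requirement for full diffusion decoding.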

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "nvidia/Nemotron-Labs-TwoTower-30B-A3B-Base-BF16"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.bfloat16, trust_remote_code=True,
)
# context tower -> GPU 0, denoiser tower -> GPU 1
model.place_towers_on_devices("cuda:0", "cuda:1")
model.eval()

prompt = "France is a country "
inputs = tokenizer(prompt, return_tensors="pt").to("cuda:0")

outputs = model.generate_mask_diffusion(
    inputs["input_ids"], max_new_tokens=128,
    block_size=16, steps_per_block=16, mask_token_id=3,
    temperature=0.1, confidence_threshold=0.8,
    eos_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))

The three modes are generate_mask_diffusion(), generate_mock_ar(), and generate_ar(). Mask diffusion commits up to block_size tokens per step. Mock-AR and AR commit one token per step.

Where It Fits: Use Cases

The most direct use case is faster batch generation. A data team producing synthetic text can trade a small quality drop for throughput. At γ=0.8, that trade is 1.3% quality for 2.42× speed.

A second use case is tuning the quality–throughput trade-off. Raising γ preserves more quality, according to NVIDIA’s paper. Lowering γ commits more tokens per step for speed.

A third use case is drop-in adaptation. The context tower keeps its LM head for speculative decoding, verification, or AR scoring. Teams can run AR and diffusion from one checkpoint.

Strengths and Weaknesses

Strengths:

  • Open weights under the NVIDIA Nemotron Open Model License; ready for commercial use
  • 98.7% of AR quality retained at 2.42× throughput at the default operating point
  • One checkpoint supports diffusion, mock-AR, and AR decoding
  • Denoiser trained on ~2.1T tokens, not a full re-pretrain
  • Sequence-length cache memory scales like the AR baseline

Weaknesses:

  • Full two-tower diffusion needs 2 GPUs and ~59GB per GPU in BF16
  • Code and math degrade more than general knowledge (HumanEval 79.27 → 75.58)
  • Keeping both towers resident raises the fixed model-weight memory footprint
  • Released checkpoint is a base model, before instruction tuning or alignment
  • Throughput past 3× comes with larger quality loss

©2020- TradePoint.io - All rights reserved!

Tradepoint.io, being just a publishing and technology platform, is not a registered broker-dealer or investment adviser. So we do not provide investment advice. Rather, brokerage services are provided to clients of Tradepoint.io by independent SEC-registered broker-dealers and members of FINRA/SIPC. Every form of investing carries some risk and past performance is not a guarantee of future results. “Tradepoint.io“, “Instant Investing” and “My Trading Tools” are registered trademarks of Apperbuild, LLC.

This website is operated by Apperbuild, LLC. We have no link to any brokerage firm and we do not provide investment advice. Every information and resource we provide is solely for the education of our readers. © 2020 Apperbuild, LLC. All rights reserved.
