
How to Speed Up Transformer Training Using NVIDIA Apex (FusedAdam, FusedLayerNorm) and Native torch.amp

June 2, 2026
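
Section D below uses several names that are set up before it and are not shown in this excerpt: APEX_OK, HAS_AMP_C, HAS_FLN, FusedAdam, FusedLayerNorm and DEV. The following is a minimal sketch of how the Apex flags might be derived, not the tutorial's own setup code. It assumes that `amp_C` and `fused_layer_norm_cuda` are the compiled extension modules behind FusedAdam and FusedLayerNorm, and `has_module` is a hypothetical helper introduced only for this illustration. DEV is presumably the string "cuda", since the section calls torch.cuda.synchronize() unconditionally.

```python
import importlib.util


def has_module(name: str) -> bool:
    """Return True if `name` can be located, without importing it."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        return False


# Assumption: the Python package is "apex" and the fused kernels live in the
# compiled extensions "amp_C" (FusedAdam) and "fused_layer_norm_cuda"
# (FusedLayerNorm). A Python-only Apex install has the package but not these.
APEX_OK = has_module("apex")
HAS_AMP_C = APEX_OK and has_module("amp_C")
HAS_FLN = APEX_OK and has_module("fused_layer_norm_cuda")

# Import the fused classes only when their kernels were found.
if HAS_AMP_C:
    from apex.optimizers import FusedAdam
if HAS_FLN:
    from apex.normalization import FusedLayerNorm

print(f"APEX_OK={APEX_OK}  HAS_AMP_C={HAS_AMP_C}  HAS_FLN={HAS_FLN}")
```

Probing with find_spec avoids importing a half-built extension just to learn whether it exists, and it lets the script fall back to nn.LayerNorm and AdamW on machines where Apex was installed without its CUDA extensions.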

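import time

import torch

# This section relies on names set up before it:
# DEV, APEX_OK, HAS_AMP_C, HAS_FLN, FusedAdam, FusedLayerNorm.
print("\n### SECTION D: end-to-end Transformer (vanilla fp32 vs Apex fused + AMP) ###")
# Vocabulary size, model width, attention heads, layers, sequence length, batch size, timed steps.
VOCAB, D, NHEAD, LAYERS, SEQ, BATCH, STEPS = 2000, 256, 4, 4, 128, 32, 60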
class Block(torch.nn.Module):
    """Pre-norm Transformer block: self-attention and feed-forward, each with a residual."""

    def __init__(self, d, nhead, norm_cls):
        super().__init__()
        self.attn = torch.nn.MultiheadAttention(d, nhead, batch_first=True)
        self.ff = torch.nn.Sequential(
            torch.nn.Linear(d, 4 * d), torch.nn.GELU(), torch.nn.Linear(4 * d, d)
        )
        # norm_cls is either torch.nn.LayerNorm or Apex's FusedLayerNorm.
        self.n1, self.n2 = norm_cls(d), norm_cls(d)

    def forward(self, x):
        h = self.n1(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]
        return x + self.ff(self.n2(x))
class TinyTransformer(torch.nn.Module):
    """Small Transformer used only for timing.

    It has no positional encoding and no causal mask, which is acceptable for a
    throughput benchmark on random tokens.
    """

    def __init__(self, norm_cls):
        super().__init__()
        self.emb = torch.nn.Embedding(VOCAB, D)
        self.blocks = torch.nn.ModuleList([Block(D, NHEAD, norm_cls) for _ in range(LAYERS)])
        self.norm = norm_cls(D)
        self.head = torch.nn.Linear(D, VOCAB)

    def forward(self, idx):
        x = self.emb(idx)
        for b in self.blocks:
            x = b(x)
        return self.head(self.norm(x))
g = torch.Generator(device="cpu").manual_seed(0)
data = torch.randint(0, VOCAB, (BATCH, SEQ + 1), generator=g).to(DEV)
inp, tgt = data[:, :-1], data[:, 1:]
lossfn = torch.nn.CrossEntropyLoss()
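def run_training(use_apex):
    """Train for STEPS timed steps; return (final loss, tokens/s, elapsed seconds)."""
    torch.manual_seed(0)  # identical initial weights for both runs
    # Use the fused kernels only if they were built; otherwise fall back to PyTorch.
    norm_cls = FusedLayerNorm if (use_apex and HAS_FLN and APEX_OK) else torch.nn.LayerNorm
    model = TinyTransformer(norm_cls).to(DEV)
    # weight_decay is passed explicitly so both optimizers apply the same decay.
    if use_apex and HAS_AMP_C and APEX_OK:
        optimizer = FusedAdam(model.parameters(), lr=3e-4, weight_decay=0.01)
    else:
        optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4, weight_decay=0.01)
    # With enabled=False the scaler and autocast are no-ops, so the fp32
    # baseline runs through exactly the same code path.
    scaler = torch.amp.GradScaler("cuda", enabled=use_apex)

    def one_step():
        optimizer.zero_grad(set_to_none=True)
        with torch.amp.autocast("cuda", dtype=torch.float16, enabled=use_apex):
            logits = model(inp)
            loss = lossfn(logits.reshape(-1, VOCAB), tgt.reshape(-1))
        scaler.scale(loss).backward()  # scale the loss to avoid fp16 gradient underflow
        scaler.step(optimizer)         # unscales gradients; skips the step on inf/NaN
        scaler.update()
        return loss

    # Warm-up steps keep kernel selection and allocator growth out of the timing.
    for _ in range(5):
        one_step()
    torch.cuda.synchronize()  # CUDA is asynchronous: drain queued work before timing
    t0 = time.perf_counter()
    for _ in range(STEPS):
        loss = one_step()
    torch.cuda.synchronize()
    dt = time.perf_counter() - t0
    return loss.item(), (STEPS * BATCH * SEQ) / dt, dt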
loss_v, tps_v, dt_v = run_training(use_apex=False)
print(f"  vanilla (fp32, nn.LayerNorm, AdamW)        : "
     f"{dt_v:5.2f}s  | {tps_v:9.0f} tok/s | final loss {loss_v:.3f}")
if APEX_OK and (HAS_AMP_C or HAS_FLN):
   loss_a, tps_a, dt_a = run_training(use_apex=True)
   print(f"  apex   (fp16, FusedLayerNorm, FusedAdam)   : "
         f"{dt_a:5.2f}s  | {tps_a:9.0f} tok/s | final loss {loss_a:.3f}")
   print(f"  ----> speedup: {tps_a / tps_v:0.2f}x throughput")
else:
   print("  apex path SKIPPED (no fused kernels built)")
print("\n" + "=" * 78)
print("DONE. Key takeaways:")
print("  - FusedAdam/FusedLayerNorm/FusedRMSNorm are the still-relevant Apex pieces;")
print("    speedups grow with model size & parameter count (tiny demo understates it).")
print("  - apex.amp is deprecated -> prefer torch.amp.autocast + torch.amp.GradScaler.")
print("  - FusedAdam composes cleanly with native torch.amp (Section D).")
print("  - On real workloads, also try a larger model and bf16 autocast (no scaler needed).")
print("=" * 78)
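
The last takeaway suggests bf16 autocast with no scaler. A minimal sketch of that variant follows. It is an illustration under stated assumptions, not part of the tutorial: the toy model, its sizes, and the names `device` and `losses` are hypothetical, and the loss is computed in float32 as a conservative choice. bfloat16 has the same exponent range as float32, so gradients are far less likely to underflow than in fp16, which is why GradScaler is normally left out.

```python
import torch

# Use CUDA only when the GPU supports bf16; otherwise run the same code on CPU.
use_cuda = torch.cuda.is_available() and torch.cuda.is_bf16_supported()
device = "cuda" if use_cuda else "cpu"

torch.manual_seed(0)
# Hypothetical stand-in model and random data; sizes are arbitrary.
model = torch.nn.Sequential(
    torch.nn.Linear(32, 64), torch.nn.GELU(), torch.nn.Linear(64, 10)
).to(device)
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)
loss_fn = torch.nn.CrossEntropyLoss()
x = torch.randn(16, 32, device=device)
y = torch.randint(0, 10, (16,), device=device)

losses = []
for _ in range(3):
    optimizer.zero_grad(set_to_none=True)
    # Matmuls run in bf16 inside autocast; parameters stay in float32.
    with torch.autocast(device_type=device, dtype=torch.bfloat16):
        logits = model(x)
        loss = loss_fn(logits.float(), y)  # compute the loss in float32
    loss.backward()   # no scaler.scale(...)
    optimizer.step()  # no scaler.step(...) / scaler.update()
    losses.append(loss.item())

print(f"device={device}  logits dtype={logits.dtype}  losses={losses}")
```

In Section D the same change would mean passing dtype=torch.bfloat16 to autocast and calling loss.backward() and optimizer.step() directly. Whether FusedAdam behaves identically in that setup is not covered by the source and would need to be checked on the target GPU.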
