TradePoint.io

Prime Intellect Releases prime-rl 0.6.0 to Train Trillion-Parameter MoE Models on Agentic RL Workloads

June 23, 2026
in AI & Technology

Prime Intellect has released prime-rl version 0.6.0. The framework targets reinforcement learning on trillion-parameter Mixture-of-Experts (MoE) models. It focuses on heavy agentic workloads, like long-horizon software-engineering tasks.

The research team trained GLM-5 on SWE tasks at up to 131k sequence length. Step times stayed under five minutes. The batch size was 256 rollouts. The run used only 28 H200 nodes.

TL;DR

  • prime-rl 0.6.0 trains trillion-parameter MoE models on agentic RL workloads.
  • GLM-5 trained on SWE at 131k sequence length, sub-5-minute steps, 28 H200 nodes.
  • Asynchronous RL disaggregates trainer and inference for independent optimization.
  • Inference uses FP8, Wide EP, P/D disaggregation, KV offloading, and router replay.
  • Training uses 3-D parallelism (FSDP, EP, CP) plus block-scaled FP8.

What is prime-rl 0.6.0?

prime-rl is an open framework for asynchronous reinforcement learning. It post-trains large open-source models on agentic tasks. Version 0.6.0 extends this to trillion-parameter MoE scale.

The example model in the announcement is zai-org/GLM-5.1. The optimizations also apply to other large MoE models. Examples include moonshotai/Kimi-K2.7-Code and nvidia/NVIDIA-Nemotron-3-Ultra-550B-A55B-BF16.

A full GLM-5.1 run starts with one command on a Slurm cluster.

uv run rl @ examples/glm5_llmd/rl.toml --output-dir /shared/outputs/glm5-llmd

Role of asynchronous RL

Agentic tasks have long-tail outliers. Some coding rollouts run for hours. Waiting for them before each policy update would idle GPUs.

Asynchronous RL avoids this. The trainer and inference systems are disaggregated. They run and scale independently. The inference policy updates as soon as the optimizer step finishes.

There is one synchronization point: the policy update. prime-rl pushes new weights as soon as they exist. Already-dispatched rollouts keep their active prefix cache. So a single rollout may mix tokens from several policy versions.

New rollouts behave differently. They repopulate their own KV cache, even when prefixes match. A KV-cache salt forces this. Requests from too old a policy are dropped. The max_off_policy_steps value controls that threshold.
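The staleness rule and the cache salt can be sketched in a few lines. This is a minimal illustration, not prime-rl's code: the `Rollout` class, the `cache_salt` and `is_too_stale` helpers, the salt scheme, and the strict `>` comparison are all assumptions. Only the name max_off_policy_steps comes from the announcement.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class Rollout:
    rollout_id: str
    start_policy_step: int  # policy version when the rollout was dispatched


def cache_salt(policy_step: int) -> str:
    """Hypothetical salt that changes with every policy version.

    Mixing it into the prefix-cache key means requests issued under a new
    policy never hit KV entries computed by an older one.
    """
    return hashlib.sha256(f"policy-{policy_step}".encode()).hexdigest()[:16]


def is_too_stale(rollout: Rollout, current_step: int, max_off_policy_steps: int) -> bool:
    """True when the rollout started more than the allowed number of steps ago."""
    return current_step - rollout.start_policy_step > max_off_policy_steps


current_step = 10
rollouts = [Rollout("a", 10), Rollout("b", 8), Rollout("c", 5)]
kept = [
    r.rollout_id
    for r in rollouts
    if not is_too_stale(r, current_step, max_off_policy_steps=4)
]
```

Here rollout "c" started five steps ago and is dropped, while "a" and "b" stay in the batch. Whether the real threshold is inclusive or exclusive is not stated in the source.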

Inference optimizations

Inference is usually the throughput bottleneck in an RL system. prime-rl optimizes for throughput, while keeping latency bounded.

FP8 inference: Lower precision speeds up prefill and decode. prime-rl uses FP8 with DeepEP and DeepGEMM kernels.

Wide Expert Parallelism: Wide EP spreads experts across ≥32 GPUs. It pairs with a large data-parallel degree, for example 32. Each GPU holds a distinct subset of experts and serves as its own request endpoint. Synchronization happens per layer, through dispatch and combine operations.
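The dispatch and combine steps can be illustrated with a single-process toy. In a real deployment each expert sits on a different GPU and both steps are all-to-all collectives. Here the experts are plain Python functions, and every name (`dispatch`, `combine`, `assignments`) is hypothetical.

```python
from collections import defaultdict


def dispatch(assignments):
    """Group (token_index, gate_weight) pairs by the expert that must process them."""
    per_expert = defaultdict(list)
    for token_idx, routes in enumerate(assignments):
        for expert_id, weight in routes:
            per_expert[expert_id].append((token_idx, weight))
    return per_expert


def combine(tokens, per_expert, experts):
    """Run each expert on its tokens and sum the weighted outputs per token."""
    out = [0.0] * len(tokens)
    for expert_id, items in per_expert.items():
        for token_idx, weight in items:
            out[token_idx] += weight * experts[expert_id](tokens[token_idx])
    return out


# Three toy "experts" standing in for expert MLPs on separate GPUs.
experts = {0: lambda x: x + 1.0, 1: lambda x: 2.0 * x, 2: lambda x: -x}
tokens = [1.0, 2.0, 3.0]
# For each token: the (expert_id, gate_weight) pairs chosen by the router.
assignments = [[(0, 0.5), (1, 0.5)], [(1, 1.0)], [(0, 0.25), (2, 0.75)]]

per_expert = dispatch(assignments)
outputs = combine(tokens, per_expert, experts)
```

The point of the pattern is that tokens move to the experts instead of expert weights moving to the tokens, which is why it needs one synchronization per MoE layer.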

Prefill and Decode Disaggregation: Some model-environment pairs reach a 4:1 prefill-to-decode token ratio. On shared workers, that prefill load would inflate end-to-end latency, which reduces the benefits of PipelineRL. P/D disaggregation separates prefill and decode workers. Long tool outputs then stop throttling decode workers.

KV cache management: High concurrency needs large KV cache space. prime-rl supports tiered offloading to CPU and disk. vLLM native offloading creates one pool per worker. Mooncake Store instead pools RAM and disk across all nodes centrally.

Request routing: prime-rl ships a fork of vllm-router by default. It also supports the NVIDIA Dynamo router as a drop-in. Routers score workers using KV cache reuse, queue depth, and live load.
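A router that weighs these three signals might look like the sketch below. It is not the scoring function of vllm-router or Dynamo. The `WorkerState` fields, the linear formula, and the weights are assumptions chosen only to show the trade-off.

```python
from dataclasses import dataclass


@dataclass
class WorkerState:
    name: str
    cached_prefix_tokens: int  # prompt tokens already in this worker's KV cache
    queue_depth: int           # requests waiting on this worker
    active_load: float         # fraction of capacity in use, 0.0 to 1.0


def score(worker, prompt_tokens, w_cache=1.0, w_queue=0.1, w_load=0.5):
    """Higher is better: reward KV reuse, penalize queueing and load."""
    reuse = min(worker.cached_prefix_tokens, prompt_tokens) / max(prompt_tokens, 1)
    return w_cache * reuse - w_queue * worker.queue_depth - w_load * worker.active_load


def pick_worker(workers, prompt_tokens):
    """Return the worker with the best score for this request."""
    return max(workers, key=lambda w: score(w, prompt_tokens))


workers = [
    WorkerState("w0", cached_prefix_tokens=8000, queue_depth=6, active_load=0.9),
    WorkerState("w1", cached_prefix_tokens=6000, queue_depth=1, active_load=0.4),
    WorkerState("w2", cached_prefix_tokens=0, queue_depth=0, active_load=0.1),
]
chosen = pick_worker(workers, prompt_tokens=8000)
```

With these made-up numbers the router picks w1. It has slightly less cached prefix than w0 but a much shorter queue, and it beats the idle w2 because it avoids recomputing most of the prompt.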

Router replay (R3): Trainer↔inference mismatch silently kills training. Router replay captures inference routing decisions. It replays them directly on the trainer. This cuts KL mismatch by roughly an order of magnitude. Routed experts have shape [num_layers, top_k, seq_len]. This payload can grow to hundreds of GB. At scale, the data rate reaches tens of Gbps. So prime-rl treats it as an opaque payload. Optimized PyTorch operations handle the processing.
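The replay idea can be shown for one token in one MoE layer. This is a sketch under assumptions: the `route` function is hypothetical, and recomputing gate weights from the trainer's own logits over the replayed experts is one plausible design that the source does not confirm.

```python
import math


def route(logits, top_k, replay_indices=None):
    """Return (expert_indices, gate_weights) for one token in one MoE layer.

    Without replay, the top_k experts come from this side's own logits.
    With replay, the expert choice recorded at inference time is reused.
    """
    if replay_indices is None:
        order = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
        indices = order[:top_k]
    else:
        indices = list(replay_indices)
    # Softmax restricted to the selected experts.
    exps = [math.exp(logits[i]) for i in indices]
    total = sum(exps)
    return indices, [e / total for e in exps]


# Tiny numeric differences between the two stacks flip the second expert.
inference_logits = [2.00, 1.00, 1.01, -1.0]
trainer_logits = [2.00, 1.01, 1.00, -1.0]

recorded, _ = route(inference_logits, top_k=2)
unreplayed, _ = route(trainer_logits, top_k=2)
replayed, weights = route(trainer_logits, top_k=2, replay_indices=recorded)
```

Without replay the trainer would send this token to experts 0 and 1, although inference used 0 and 2. With replay both sides activate the same experts, which is the source of the reduced mismatch.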

Training optimizations

The trainer builds on torchtitan, a PyTorch-native training codebase. It relies on 3-D parallelism: FSDP, CP, and EP. The GLM-5 case study uses all three.

Strategy | What it shards | Primary use | Key detail
FSDP (FSDP2) | Parameters, gradients, optimizer states | Baseline memory amortization | Gathers weights on demand per layer via fully_shard
Expert Parallelism (EP) | Experts within a layer | Shrinks active layer memory | all2all dispatch/combine; torch-native or DeepEP
Context Parallelism (CP) | The sequence dimension | Long-context activation memory | Ulysses (default) or Ring Attention

EP exists because layers stay huge after FSDP. With 78 layers and 800B params in float32, one layer’s all-gather needs roughly 40GB. Prefetching the next layer while the current one computes pushes that to roughly 80GB. Setting EP=8 dispatches tokens instead of gathering full experts. torch-native all2all is slightly faster within one node. DeepEP wins when EP spans multiple nodes.
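The 40GB and 80GB figures follow from simple arithmetic. The calculation below assumes parameters are spread evenly across layers and ignores embeddings and other non-layer weights. The variable names are illustrative.

```python
total_params = 800e9      # parameter count used in the example above
num_layers = 78
bytes_per_param = 4       # float32
GB = 1e9

# Size of one fully gathered layer, assuming an even split across layers.
layer_bytes = total_params / num_layers * bytes_per_param
one_layer_gb = layer_bytes / GB

# Overlap keeps two gathered layers resident: the one computing and the next one.
overlapped_gb = 2 * one_layer_gb

print(f"one layer gathered: {one_layer_gb:.1f} GB")
print(f"with one layer of overlap: {overlapped_gb:.1f} GB")
```

This gives about 41GB for one gathered layer and about 82GB with one layer of overlap, a large share of a single GPU's memory before any activations are stored.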

CP matters at 131k+ sequence length. There, activations dominate memory, not parameters. GLM-5 uses DSA, which neither Ulysses nor Ring Attention parallelizes directly. So prime-rl ships a custom context-parallel implementation for it.

FP8 training: prime-rl uses DeepGEMM block-scaled FP8, as proposed by DeepSeek V3. This rarely raises throughput, due to quantization overhead. Its real value is matching trainer and inference precision. That reduces KL mismatch and stabilizes training.
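Block scaling gives each small group of values its own scale factor, so one outlier cannot crush the resolution of the rest of the tensor. The sketch below shows only the scale computation on a flat list with a toy block size of 4. The helper names are hypothetical, the real kernels work on tiles of tensors, and the actual cast to FP8 is omitted.

```python
FP8_E4M3_MAX = 448.0  # largest finite value representable in FP8 E4M3


def block_scales(values, block_size):
    """One scale per block, chosen so the block's largest magnitude maps to 448."""
    scales = []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        amax = max(abs(v) for v in block)
        scales.append(amax / FP8_E4M3_MAX if amax > 0 else 1.0)
    return scales


def scale_blocks(values, scales, block_size):
    """Divide each value by its block's scale, ready for the FP8 cast."""
    return [v / scales[i // block_size] for i, v in enumerate(values)]


# First block holds small values, second block holds large ones.
values = [0.01, -0.02, 0.015, 0.005, 100.0, -50.0, 25.0, 75.0]
scales = block_scales(values, block_size=4)
scaled = scale_blocks(values, scales, block_size=4)
```

With a single per-tensor scale set by the 100.0 outlier, the first four values would land near the bottom of the FP8 range. With per-block scales, both blocks use the full range up to 448.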

Use cases with examples

  • Long-horizon SWE agents: Train a model on real repository issues. Rollouts can span hundreds of turns and tool calls. P/D disaggregation keeps decode latency predictable here.
  • 1T-scale post-training on fewer nodes: The GLM-5 run fit on 28 H200 nodes. Wide EP and KV offloading raise concurrency and throughput.
  • Stable agentic RL at scale: Router replay and FP8 training both reduce trainer↔inference KL mismatch. Lower mismatch means steadier training.


©2020- TradePoint.io - All rights reserved!

Tradepoint.io, being just a publishing and technology platform, is not a registered broker-dealer or investment adviser. So we do not provide investment advice. Rather, brokerage services are provided to clients of Tradepoint.io by independent SEC-registered broker-dealers and members of FINRA/SIPC. Every form of investing carries some risk and past performance is not a guarantee of future results. “Tradepoint.io“, “Instant Investing” and “My Trading Tools” are registered trademarks of Apperbuild, LLC.

This website is operated by Apperbuild, LLC. We have no link to any brokerage firm and we do not provide investment advice. Every information and resource we provide is solely for the education of our readers. © 2020 Apperbuild, LLC. All rights reserved.
