TradePoint.io

Z.AI Introduces GLM-5.1: An Open-Weight 754B Agentic Model That Achieves SOTA on SWE-Bench Pro and Sustains 8-Hour Autonomous Execution

April 8, 2026
in AI & Technology
Reading Time: 5 mins read

Z.AI, the AI platform developed by the team behind the GLM model family, has released GLM-5.1, its next-generation flagship model built specifically for agentic engineering. Unlike models optimized for clean, single-turn benchmarks, GLM-5.1 targets sustained agentic work: it brings significantly stronger coding capabilities than its predecessor, achieves state-of-the-art performance on SWE-Bench Pro, and leads GLM-5 by a wide margin on NL2Repo (repository generation) and Terminal-Bench 2.0 (real-world terminal tasks).

Architecture: DSA, MoE, and Asynchronous RL

Before diving into what GLM-5.1 can do, it’s worth understanding what it’s built on — because the architecture is meaningfully different from a standard dense transformer.

The GLM-5 generation adopts DSA to significantly reduce training and inference costs while maintaining long-context fidelity. The model uses a glm_moe_dsa architecture, a Mixture-of-Experts (MoE) design combined with DSA. For developers evaluating whether to self-host, this matters: MoE models activate only a subset of their parameters per forward pass, which can make inference significantly more efficient than a comparably sized dense model, though it requires MoE-aware serving infrastructure.
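The sparsity point can be made concrete with a toy routing sketch (illustrative only, not Z.AI's implementation): a gating network scores every expert, but each token is dispatched to just its top-k, so most expert parameters sit idle for any given token.

```python
# Minimal sketch of Mixture-of-Experts top-k routing (illustrative only).
# Only k of the n experts are activated per token, which is why an MoE
# model's active parameter count is far below its total parameter count.
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def route_token(gate_logits, k=2):
    """Pick the top-k experts for one token and renormalize their weights."""
    probs = softmax(gate_logits)
    topk = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    total = sum(probs[i] for i in topk)
    return [(i, probs[i] / total) for i in topk]

# 8 hypothetical experts; only 2 are activated for this token.
logits = [0.1, 2.0, -1.0, 0.5, 1.8, -0.3, 0.0, 0.2]
active = route_token(logits, k=2)
print(active)  # experts 1 and 4, with renormalized weights summing to 1
```

Real MoE routers add load-balancing losses and capacity limits on top of this, but the core cost saving is exactly this per-token expert selection.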

On the training side, GLM-5 implements a new asynchronous reinforcement learning infrastructure that drastically improves post-training efficiency by decoupling generation from training. Novel asynchronous agent RL algorithms further improve RL quality, enabling the model to learn from complex, long-horizon interactions more effectively. This is what allows the model to handle agentic tasks with the kind of sustained judgment that single-turn RL training struggles to produce.
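The decoupling idea can be sketched as a toy producer-consumer setup (a simplification, not Z.AI's actual infrastructure): rollout workers generate trajectories into a buffer while the trainer consumes them concurrently, so neither side idles waiting for the other.

```python
# Toy sketch of decoupling generation from training, the idea behind
# asynchronous RL infrastructure (not Z.AI's actual system). A rollout
# worker pushes trajectories into a queue while the trainer drains it.
import queue
import threading

def rollout_worker(buf, n_episodes):
    for ep in range(n_episodes):
        trajectory = [("action", ep, step) for step in range(3)]  # fake rollout
        buf.put(trajectory)
    buf.put(None)  # sentinel: no more data

def trainer(buf, updates):
    while True:
        traj = buf.get()
        if traj is None:
            break
        updates.append(len(traj))  # stand-in for one gradient update

buf = queue.Queue(maxsize=8)
updates = []
t_gen = threading.Thread(target=rollout_worker, args=(buf, 5))
t_train = threading.Thread(target=trainer, args=(buf, updates))
t_gen.start(); t_train.start()
t_gen.join(); t_train.join()
print(len(updates))  # 5 updates, one per generated trajectory
```

In a real asynchronous RL system the trainer also handles off-policy corrections, since trajectories may have been generated by a slightly stale policy; that is the part the "novel asynchronous agent RL algorithms" mentioned above address.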

The Plateau Problem GLM-5.1 is Solving

To understand what makes GLM-5.1 different at inference time, it helps to understand a specific failure mode in LLMs used as agents. Previous models — including GLM-5 — tend to exhaust their repertoire early: they apply familiar techniques for quick initial gains, then plateau. Giving them more time doesn’t help.

This is a structural limitation for any developer trying to use an LLM as a coding agent. The model applies the same playbook it knows, hits a wall, and stops making progress regardless of how long it runs. GLM-5.1, by contrast, is built to stay effective on agentic tasks over much longer horizons. The model handles ambiguous problems with better judgment and stays productive over longer sessions. It breaks complex problems down, runs experiments, reads results, and identifies blockers with real precision. By revisiting its reasoning and revising its strategy through repeated iteration, GLM-5.1 sustains optimization over hundreds of rounds and thousands of tool calls.

This sustained performance takes more than a larger context window: the model must maintain goal alignment over extended execution, limiting strategy drift, error accumulation, and unproductive trial and error. That is what enables truly autonomous execution of complex engineering tasks.

Benchmarks: Where GLM-5.1 Stands

On SWE-Bench Pro, GLM-5.1 achieves a score of 58.4, outperforming GPT-5.4, Claude Opus 4.6, and Gemini 3.1 Pro, setting a new state-of-the-art result.

The broader benchmark profile shows a well-rounded model:

  • AIME 2026: 95.3
  • HMMT Nov. 2025: 94.0
  • HMMT Feb. 2026: 82.6
  • GPQA-Diamond (graduate-level science reasoning): 86.2
  • CyberGym: 68.7, a substantial jump from GLM-5's 48.3
  • BrowseComp: 68.0
  • τ³-Bench: 70.6
  • MCP-Atlas (Public Set): 71.8, particularly relevant given MCP's growing role in production agent systems
  • Terminal-Bench 2.0: 63.5, rising to 66.5 when evaluated with Claude Code as the scaffolding

Across 12 representative benchmarks covering reasoning, coding, agents, tool use, and browsing, GLM-5.1 demonstrates a broad and well-balanced capability profile. This shows that GLM-5.1 is not a single-metric improvement — it advances simultaneously across general intelligence, real-world coding, and complex task execution.

In terms of overall positioning, GLM-5.1's general capability and coding performance are broadly aligned with Claude Opus 4.6.

8-Hour Sustained Execution: What That Actually Means

The most important difference in GLM-5.1 is its capacity for long-horizon task execution. GLM-5.1 can work autonomously on a single task for up to 8 hours, completing the full process from planning and execution to testing, fixing, and delivery.

For developers building autonomous agents, this changes the scope of what’s possible. Rather than orchestrating a model over dozens of short-lived tool calls, you can hand GLM-5.1 a complex objective and let it run a complete ‘experiment–analyze–optimize’ loop autonomously.
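The shape of such a loop can be sketched as follows (a hypothetical harness, not Z.AI's API): the agent keeps proposing revisions, running them, and reading the results, and it only stops when progress genuinely stalls rather than after a fixed budget of calls.

```python
# Illustrative shape of an 'experiment–analyze–optimize' loop (hypothetical
# harness, not Z.AI's API). The loop iterates as long as candidates keep
# improving on the best result found so far, then stops on a plateau.
def optimize_loop(run_experiment, propose, initial, max_rounds=200, patience=10):
    best_cfg, best_score = initial, run_experiment(initial)
    stale = 0
    for round_no in range(max_rounds):
        candidate = propose(best_cfg, round_no)   # agent revises its strategy
        score = run_experiment(candidate)         # run and read the results
        if score > best_score:
            best_cfg, best_score = candidate, score
            stale = 0
        else:
            stale += 1
        if stale >= patience:                     # blocker identified: plateau
            break
    return best_cfg, best_score

# Toy usage: "tuning" a single number toward 50.
result = optimize_loop(
    run_experiment=lambda x: -abs(x - 50),
    propose=lambda cfg, r: cfg + 1,
    initial=0,
)
print(result)  # (50, 0)
```

The claim in this section is that GLM-5.1 can drive loops of this shape for hundreds of rounds without the `propose` step degenerating into repeating the same failed ideas, which is where earlier models plateaued.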

The concrete engineering demonstrations make this tangible: GLM-5.1 can build a complete Linux desktop environment from scratch in 8 hours; perform 178 rounds of autonomous iteration on a vector database task and improve performance to 1.5× the initial version; and optimize a CUDA kernel, increasing speedup from 2.6× to 35.7× through sustained tuning.

That CUDA kernel result is notable for ML engineers: improving a kernel from 2.6× to 35.7× speedup through autonomous iterative optimization is a level of depth that would take a skilled human engineer significant time to replicate manually.

Model Specifications and Deployment

GLM-5.1 is a 754-billion-parameter MoE model released under the MIT license on HuggingFace. It operates with a 200K context window and supports up to 128K maximum output tokens — both important for long-horizon tasks that need to hold large codebases or extended reasoning chains in memory.

GLM-5.1 supports thinking mode (offering multiple thinking modes for different scenarios), streaming output, function calling, context caching, structured output, and MCP for integrating external tools and data sources.

For local deployment, the following open-source frameworks support GLM-5.1: SGLang (v0.5.10+), vLLM (v0.19.0+), xLLM (v0.8.0+), Transformers (v0.5.3+), and KTransformers (v0.5.3+).

For API access, the model is available through Z.AI's API platform. Getting started requires installing zai-sdk via pip and initializing a ZaiClient with your API key.
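A minimal client sketch might look like the following. This assumes the zai-sdk package (`pip install zai-sdk`), a ZAI_API_KEY environment variable, and an OpenAI-style chat-completions interface; the exact method names and the `glm-5.1` model id are assumptions that may differ in the SDK version you install.

```python
# Hedged sketch of calling GLM-5.1 through Z.AI's API platform.
# Client method names and the model id below are assumptions.
import os

def ask_glm(prompt: str) -> str:
    from zai import ZaiClient  # deferred so this sketch loads without the SDK
    client = ZaiClient(api_key=os.environ["ZAI_API_KEY"])
    response = client.chat.completions.create(
        model="glm-5.1",  # hypothetical model id for this release
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

# Example (requires network access and a valid key):
# ask_glm("Summarize the failing tests in this repository.")
```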

Key Takeaways

  • GLM-5.1 sets a new state-of-the-art on SWE-Bench Pro with a score of 58.4, outperforming GPT-5.4, Claude Opus 4.6, and Gemini 3.1 Pro — making it one of the strongest publicly benchmarked models for real-world software engineering tasks at the time of release.
  • The model is built for long-horizon autonomous execution, capable of working on a single complex task for up to 8 hours — running experiments, revising strategies, and iterating across hundreds of rounds and thousands of tool calls without human intervention.
  • GLM-5.1 uses a MoE + DSA architecture trained with asynchronous reinforcement learning, which reduces training and inference costs compared to dense transformers while maintaining long-context fidelity — a meaningful consideration for teams evaluating self-hosting.
  • It is open-weight under the MIT license (754B parameters, 200K context window, 128K max output tokens) and supports local deployment via SGLang, vLLM, xLLM, Transformers, and KTransformers, as well as API access through the Z.AI platform with OpenAI SDK compatibility.
  • GLM-5.1 goes beyond coding — it also shows strong improvements in front-end prototyping, artifacts generation, and office productivity tasks (Word, Excel, PowerPoint, PDF), positioning it as a general-purpose foundation for both agentic systems and high-quality content workflows.

Check out the Weights, API, and Technical details.
