TradePoint.io

Alibaba’s model never trained as an agent — and improved agent performance across seven benchmarks

June 24, 2026

Alibaba’s Qwen team released Qwen-AgentWorld on Tuesday — two models trained not to act inside agent environments, but to predict what those environments return. The release covers seven domains under a single architecture: MCP, Search, Terminal, Software Engineering, Android, Web, and OS.

The release extends Alibaba’s recent push into autonomous agents. Qwen3.7-Max, released in May, was built around a 35-hour autonomous execution capability.

That shift targets a ceiling that teams training agents at scale hit directly. Real search engines return whatever results exist, with no mechanism for injecting controlled conditions. Live terminals cannot be made to produce a low-disk-space condition on demand. Agent training is therefore bounded by what production environments happen to surface, with no systematic way to expose the edge cases agents must handle but rarely encounter in training.

The research team trained agents inside the resulting simulator and found performance gains that exceeded what training against real environments alone produced. In a separate test, using world model training as a warm-up before agentic fine-tuning improved performance across seven benchmarks, including three the model had never seen during training.

The paper accompanying the release identifies a gap in prior agent research: “We argue that world modeling is a crucial missing piece in the path to general agents.”

Qwen-AgentWorld trains on what environments return, not what agents should do

Most agent models are trained to answer one question: given what the environment just showed me, what should I do next? Qwen-AgentWorld is trained to answer the inverse: given what the agent just did, what will the environment show next?
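The two questions can be written as two interfaces. The sketch below is illustrative only: the names `AgentPolicy`, `WorldModel`, `Step`, `EchoWorldModel`, and `rollout` are hypothetical and do not come from the paper, and the toy model is a stand-in, not Qwen-AgentWorld.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass
class Step:
    action: str       # what the agent did, e.g. a shell command
    observation: str  # what the environment returned


class AgentPolicy(Protocol):
    """Conventional agent model: interaction history in, next action out."""
    def next_action(self, history: list[Step]) -> str: ...


class WorldModel(Protocol):
    """Language world model: history plus an action in, predicted observation out."""
    def next_observation(self, history: list[Step], action: str) -> str: ...


class EchoWorldModel:
    """Toy stand-in (hypothetical): only 'predicts' the output of echo commands."""
    def next_observation(self, history: list[Step], action: str) -> str:
        if action.startswith("echo "):
            return action[len("echo "):]
        return "command not found"


def rollout(model: WorldModel, actions: list[str]) -> list[Step]:
    """Feed a fixed action sequence to a world model and collect its predictions."""
    history: list[Step] = []
    for action in actions:
        history.append(Step(action, model.next_observation(history, action)))
    return history
```

Calling `rollout(EchoWorldModel(), ["echo hi", "ls"])` yields two steps whose observations are the model's predictions rather than real terminal output, which is the role a world model plays when it stands in for the environment.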

That reversal is the core of what the paper calls a language world model: instead of optimizing for action selection, the model learns to predict the next environment state across all seven domains under a single training objective. Prior work was narrower: WebWorld, an earlier Qwen project from February, covered web environments only; Snowflake’s Agent World Model, published the same month, generates code-driven SQL-backed environments rather than training a model to predict states. Qwen-AgentWorld is the first to span seven domains in a single model, with environment modeling baked in from the earliest pretraining stage.

Alibaba trained both models in three stages on more than 10 million environment interaction trajectories from real agent runs. Stage one teaches the model how environments behave — file systems, terminal states, browser DOM changes, API responses. Stage two trains the model to reason through what comes next before predicting it. Stage three, reinforcement learning, tightens predictions using rule-based checks and open-ended quality scoring.

Both models are Mixture-of-Experts designs — only a fraction of parameters are active per token. The 35B model activates 3B; the 397B activates 17B. Both support 256K context windows. For GUI domains (Android, Web, and OS), the models work from textual accessibility trees and UI view hierarchies rather than screenshots.
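As a quick check on what "only a fraction" means here, the active share can be computed from the parameter counts quoted above. The helper name `active_fraction` is an illustrative choice, and the figures are rounded to billions as reported.

```python
# Parameter counts in billions, as quoted in the article.
# name -> (total parameters, parameters active per token)
models = {"35B": (35, 3), "397B": (397, 17)}


def active_fraction(total_b: float, active_b: float) -> float:
    """Share of parameters that are active for any single token."""
    return active_b / total_b


for name, (total, active) in models.items():
    share = active_fraction(total, active)
    print(f"{name}: {share:.1%} of parameters active per token")
```

On these figures the smaller model activates about 8.6% of its parameters per token and the larger one about 4.3%.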

The 35B model weights and AgentWorldBench are available under Apache 2.0; the 397B weights are not publicly released.

The training results matter more than the benchmarks

The benchmark scores show how accurately the models predict what environments return. The training results show what that prediction capability is actually worth for teams building agents — and those are the numbers that matter more.

According to the researchers, agents trained inside controlled simulation outperformed agents trained in real environments. Injecting targeted perturbations — partial responses that force extra agent steps, and edge cases real environments rarely surface — pushed MCPMark from 24.6 to 33.8. On Search, agents trained in entirely fictional worlds transferred to real search tasks, pushing WideSearch F1 Item from 34.02 to 50.31 on the open 35B model. A separate warm-up test showed that world model pretraining improved BFCL v4 from 62.29 to 71.25 and Claw-Eval from 53.60 to 64.88 with no agent-specific fine-tuning.
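The reported scores are easier to compare as absolute and relative gains. The short calculation below uses only the before and after figures quoted above; the `gain` helper is an illustrative name, and the relative figures are derived here rather than stated in the paper.

```python
# (before, after) scores as quoted in the article.
results = {
    "MCPMark": (24.6, 33.8),
    "WideSearch F1 Item": (34.02, 50.31),
    "BFCL v4": (62.29, 71.25),
    "Claw-Eval": (53.60, 64.88),
}


def gain(before: float, after: float) -> tuple[float, float]:
    """Return (absolute gain in points, relative gain as a fraction of before)."""
    absolute = after - before
    return absolute, absolute / before


for name, (before, after) in results.items():
    absolute, relative = gain(before, after)
    print(f"{name}: +{absolute:.2f} points ({relative:.1%} relative)")
```

By this arithmetic the largest relative improvement is on WideSearch (about 48%), followed by MCPMark (about 37%), Claw-Eval (about 21%), and BFCL v4 (about 14%).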

Credit: Alibaba https://arxiv.org/pdf/2606.24597

Researchers flag the benchmark and the overfitting risk

The paper drew immediate reaction from AI researchers on X. The concerns they raised map to what practitioners need to verify before acting on the findings.

On the training objective and transfer result, the assessment from one AI/ML researcher was direct. “Every other ‘agent’ model has been trained to act in environments,” wrote @drawais_ai, who has a PhD background and regularly breaks down AI papers. “Qwen flipped the question. They trained the model to predict the environment itself… That predictive knowledge then transfers to agent tasks even without any agent-specific fine-tuning.” He identified the Controllable Sim RL result as “the receipt” for the claim that synthetic training can substitute for real-environment RL at scale, and flagged that three of the seven transfer benchmarks were entirely out of domain.

The benchmark margin drew immediate scrutiny. “AgentWorldBench is a benchmark Alibaba built and published in the same paper,” wrote @TheSignal_Desk, who focuses on honest takes and key numbers in AI research. “They wrote the test, then topped it by 0.46.”

@limalemonnn, who builds production AI agents, singled out the sim-RL methodology as the result most in need of scrutiny before the headline claim gets quoted. “Sim-trained agents traditionally overfit to the simulator’s quirks,” they wrote. “If the world model is too clean, the agent learns the model, not the task.” They pointed to the paper’s holdout split as the section practitioners should read before acting on the numbers.

The overfitting concern has a partial answer in the data. The gap between uncontrolled Sim RL (MCPMark 24.6) and controlled Sim RL (MCPMark 33.8) suggests the gains depend substantially on the controllability mechanism, not simulation accuracy alone. The fictional-world Search result, where agents trained on invented environments transfer to real search tasks, is the paper’s strongest evidence against the overfitting concern.

What this means for teams building agentic pipelines

For AI engineering teams building and scaling agentic pipelines, this work signals a meaningful shift in how agent capability gets built. Teams training agents at scale now have a third option between real-environment RL and static benchmarks: controlled simulation that injects the edge cases production won’t surface.

Synthetic environments are a legitimate training layer. Controlled simulation that injects conditions real environments won’t produce is a complement to real-environment RL, not a shortcut around it.
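To make "injecting conditions" concrete, here is a minimal sketch of a wrapper that forces a disk-full error on a configurable share of write commands. Everything in it is an assumption for illustration: the `toy_terminal` environment, the `write` command, and `with_fault_injection` are hypothetical, and this is not how the paper's simulator implements controllability.

```python
import random
from typing import Callable

# A toy "environment": takes a shell-like command, returns its text output.
Env = Callable[[str], str]


def toy_terminal(command: str) -> str:
    """Hypothetical stand-in for a real or simulated terminal."""
    if command.startswith("write "):
        return "ok"
    return "unknown command"


def with_fault_injection(env: Env, fault_rate: float, seed: int = 0) -> Env:
    """Wrap an environment so a share of write commands fail with a disk-full error."""
    rng = random.Random(seed)  # seeded so training runs are reproducible

    def perturbed(command: str) -> str:
        # Only writes are perturbed; other commands pass through unchanged.
        if command.startswith("write ") and rng.random() < fault_rate:
            return "error: No space left on device"
        return env(command)

    return perturbed
```

With `fault_rate=1.0` every write fails, and with `fault_rate=0.0` the wrapper is transparent. A training loop could vary the rate to control how often an agent meets a condition that a live terminal would rarely produce.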

What a model learns before agent training starts matters more than most pipelines account for. The warm-up finding — performance gains across unseen benchmarks with no agent-specific training — suggests environment grounding belongs earlier in development than current practice.
