TradePoint.io

Claude Code’s ‘/goals’ separates the agent that works from the one that decides it’s done

May 14, 2026
in AI & Technology

A code migration agent finishes its run, and the pipeline looks green. But several pieces were never compiled — and it took days to catch. That’s not a model failure; that’s an agent deciding it was done before it actually was.


Many enterprises are now finding that production AI agent pipelines fail not because of the models’ abilities but because the model behind the agent decides to stop before the task is finished. LangChain, Google and OpenAI now offer several methods to prevent premature task exits, though these often rely on separate evaluation systems. The newest comes from Anthropic: /goals in Claude Code, which formally separates task execution from task evaluation.

Coding agents work in a loop: they read files, run commands, edit code and then check whether the task is done. 
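The migration failure described above follows from this structure when the same model both does the work and decides when to stop. The minimal Python sketch below illustrates that single-model loop. `Step` and `run_self_judged_loop` are hypothetical names, and the scripted steps stand in for real model turns; this is not any vendor's API.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Step:
    """One scripted turn of a hypothetical agent (stands in for a model call)."""
    action: str        # e.g. "read files", "run command", "edit code"
    claims_done: bool  # the agent's own opinion that the task is finished


def run_self_judged_loop(steps: Iterable[Step], max_turns: int = 20) -> List[str]:
    """Single-model loop: the working agent alone decides when to stop.

    Returns the actions taken before the loop ended.
    """
    transcript: List[str] = []
    for turn, step in enumerate(steps):
        if turn >= max_turns:  # safety cap on the number of turns
            break
        transcript.append(step.action)
        if step.claims_done:  # the worker grades its own homework
            break
    return transcript


# The agent declares victory after editing, so the compile step never runs.
steps = [Step("read files", False), Step("edit code", True), Step("compile", False)]
print(run_self_judged_loop(steps))  # ['read files', 'edit code']
```

Nothing in this loop checks the agent's claim, so a confident but wrong "done" ends the run with the pipeline still looking green.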

Claude Code /goals essentially adds a second layer to that loop. After a user defines a goal, Claude keeps working turn by turn, but an evaluator model checks in after each step to review the work and decide whether the goal has been achieved.

The two-model split

Orchestration platforms from all three vendors have identified the same roadblock, but they approach it differently. OpenAI leaves the loop alone and lets the model decide when it’s done, though it does let users attach their own evaluators. LangGraph and Google’s Agent Development Kit support independent evaluation, but developers must define the critic node, write the termination logic and configure observability themselves.

Claude Code /goals makes the independent evaluator the default, whether the user wants the agent to run longer or shorter. The developer sets the goal’s completion condition in a prompt, for example: /goal all tests in test/auth pass, and the lint step is clean. Claude Code then runs, and every time the agent attempts to end its work, the evaluator model (Haiku by default) checks whether that condition has been met. If it has not, the agent keeps running. If it has, the achieved condition is logged to the conversation transcript and the goal is cleared. The evaluator makes only one binary decision, done or not done, which is why the smaller Haiku model works well.
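The worker/evaluator split can be sketched in a few lines of Python, assuming the worker and the evaluator are plain callables. `run_with_goal`, `eager_worker`, `all_compiled` and the state dictionary are hypothetical illustrations, not Claude Code's internals or API. In Claude Code the evaluator is a model (Haiku by default) judging a natural-language condition; here it is a deterministic predicate so the control flow is easy to follow.

```python
from typing import Callable, Dict, List, Tuple

State = Dict[str, int]  # hypothetical workspace snapshot


def run_with_goal(
    goal: str,
    worker: Callable[[State], Tuple[State, bool]],
    evaluator: Callable[[State], bool],
    state: State,
    max_turns: int = 50,
) -> Tuple[State, List[str], bool]:
    """Worker/evaluator loop: only the evaluator can declare the goal achieved.

    `worker` returns (new_state, wants_to_stop); `evaluator` answers one
    binary question about the state: is the goal condition met?
    """
    transcript = [f"goal set: {goal}"]
    for turn in range(1, max_turns + 1):
        state, wants_to_stop = worker(state)
        transcript.append(f"turn {turn}: worker acted")
        if not wants_to_stop:
            continue  # worker is still busy; nothing to evaluate yet
        if evaluator(state):
            # Condition met: record it in the transcript and stop pursuing the goal.
            transcript.append(f"goal achieved: {goal}")
            return state, transcript, True
        # Condition not met: override the worker's attempt to stop.
        transcript.append(f"turn {turn}: evaluator says not done")
    return state, transcript, False  # turn budget exhausted, goal still open


# Hypothetical scenario: three modules must compile, but the worker
# tries to stop after every single module.
def eager_worker(s: State) -> Tuple[State, bool]:
    return {**s, "compiled": s["compiled"] + 1}, True


def all_compiled(s: State) -> bool:
    return s["compiled"] == s["total"]


final, log, done = run_with_goal(
    "all modules compile", eager_worker, all_compiled, {"compiled": 0, "total": 3}
)
print(done, final["compiled"])  # True 3
```

`eager_worker` may request a stop as often as it likes; only `all_compiled` can end the loop. The `max_turns` cap is an added assumption in this sketch so that it always terminates.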

Claude Code makes this possible by separating the model that attempts to complete a task from the evaluator model that verifies the task is actually complete. This keeps the agent from confusing what it has already accomplished with what still needs to be done. Anthropic noted that with this method there is no need for a third-party observability platform (though enterprises are free to keep using one alongside Claude Code) or for a custom log, and there is less reliance on post-mortem reconstruction.

Competing frameworks such as Google ADK support similar evaluation patterns. ADK provides a LoopAgent, but developers have to architect the evaluation and termination logic themselves.

In its documentation, Anthropic said the most successful conditions usually have: 

  • One measurable end state: a test result, a build exit code, a file count, an empty queue

  • A stated check: how Claude should prove it, such as “npm test exits 0” or “git status is clean”

  • Constraints that matter: anything that must not change on the way there, such as “no other test file is modified”
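The three properties map naturally onto a checkable structure. This hedged sketch assumes the stated check is a command whose exit code decides success and that the constraint is a set of files that must not be modified; `GoalCondition`, `is_met` and the file names are hypothetical, not part of Claude Code. A Python one-liner stands in for `npm test` so the example runs anywhere.

```python
import subprocess
import sys
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class GoalCondition:
    """Hypothetical goal mirroring the three recommended parts."""
    end_state: str        # one measurable end state, stated in words
    check_cmd: List[str]  # stated check: a command that must exit 0
    protected: Set[str] = field(default_factory=set)  # files that must not change

    def is_met(self, modified_files: Set[str]) -> bool:
        # Constraint: modifying any protected file means the goal is not met.
        if self.protected & modified_files:
            return False
        # Stated check: run the command and require exit code 0.
        result = subprocess.run(self.check_cmd, capture_output=True)
        return result.returncode == 0


# Stand-in for "npm test exits 0": a Python process that exits with status 0.
goal = GoalCondition(
    end_state="all tests in test/auth pass",
    check_cmd=[sys.executable, "-c", "raise SystemExit(0)"],
    protected={"test/billing.test.js"},  # hypothetical "other test file"
)
print(goal.is_met({"src/auth.js"}))           # True
print(goal.is_met({"test/billing.test.js"}))  # False
```

Checking the constraint before running the command is a design choice in this sketch: a run that touched a protected file fails immediately without spending time on the test suite.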

Reliability in the loop

For enterprises already managing sprawling tool stacks, the appeal is a native evaluator that doesn’t add another system to maintain.

This is part of a broader trend in the agentic space, especially as stateful, long-running and self-learning agents become more of a reality. Evaluator models, verification systems and other independent adjudication mechanisms are starting to appear in reasoning systems and, in some cases, in coding agents such as Devin or SWE-agent.

Sean Brownell, solutions director at Sprinklr, told VentureBeat in an email that there is interest in this kind of loop, where the task and the judge are separate, but that he sees nothing unique about Anthropic’s approach.

“Yes, the loop works. Separating the builder from the judge is sound design because, fundamentally, you can’t trust a model to judge its own homework. The model doing the work is the worst judge of whether it’s done,” Brownell said. “That being said, Anthropic isn’t first to market. The most interesting story here is that two of the world’s biggest AI labs shipped the same command just days apart, but each of them reached entirely different conclusions about who gets to declare ‘done.’”

Brownell said the loop works best “for deterministic work with a verifiable end-state like migrations, fixing broken test suites, clearing a backlog,” but for more nuanced tasks or those needing design judgment, a human making that decision is far more important.

Bringing the evaluator/task split to the agent-loop level shows that companies like Anthropic are pushing agents and orchestration toward more auditable, observable systems.

Credit: Source link


©2020- TradePoint.io - All rights reserved!

Tradepoint.io, being just a publishing and technology platform, is not a registered broker-dealer or investment adviser. So we do not provide investment advice. Rather, brokerage services are provided to clients of Tradepoint.io by independent SEC-registered broker-dealers and members of FINRA/SIPC. Every form of investing carries some risk and past performance is not a guarantee of future results. “Tradepoint.io“, “Instant Investing” and “My Trading Tools” are registered trademarks of Apperbuild, LLC.

This website is operated by Apperbuild, LLC. We have no link to any brokerage firm and we do not provide investment advice. Every information and resource we provide is solely for the education of our readers. © 2020 Apperbuild, LLC. All rights reserved.
