TradePoint.io

Claude Code’s ‘/goals’ separates the agent that works from the one that decides it’s done

May 14, 2026
in AI & Technology

A code migration agent finishes its run, and the pipeline looks green. But several pieces were never compiled — and it took days to catch. That’s not a model failure; that’s an agent deciding it was done before it actually was.


Many enterprises are now seeing that production AI agent pipelines fail not because of the models’ abilities but because the model behind the agent decides to stop too early. Several methods to prevent premature task exits are now available from LangChain, Google and OpenAI, though these often rely on separate evaluation systems. The newest method comes from Anthropic: /goals on Claude Code, which formally separates task execution from task evaluation.

Coding agents work in a loop: they read files, run commands, edit code and then check whether the task is done. 

Claude Code /goals essentially adds a second layer to that loop. After a user defines a goal, Claude continues working turn by turn, but an evaluator model comes in after every step to review the work and decide whether the goal has been achieved.

The two-model split

Orchestration platforms from all three vendors identified the same roadblock, but they approach it differently. OpenAI leaves the loop alone and lets the model decide when it’s done, though it does let users tack on their own evaluators. With LangGraph and Google’s Agent Development Kit, independent evaluation is possible, but it requires developers to define the critic node, write the termination logic and configure observability.

Claude Code /goals makes the independent evaluator the default, whether the user wants the agent to run longer or shorter. The developer sets the goal completion condition via a prompt, for example: /goal all tests in test/auth pass, and the lint step is clean. Claude Code then runs, and every time the agent attempts to end its work, the evaluator model, which is Haiku by default, checks the result against the condition. If the condition is not met, the agent keeps running. If the condition is met, Claude Code logs the achieved condition to the agent conversation transcript and clears the goal. The evaluator only has to choose between two outcomes, done or not done, which is why the smaller Haiku model works well.
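The worker/evaluator split described above can be sketched in a few lines of Python. This is only an illustration of the pattern, not Anthropic's implementation: `run_with_goal`, `GoalRun`, the `worker_step` and `evaluator` callables, and the `max_turns` safety cap are all hypothetical names introduced here, and the sketch simply calls the evaluator after each worker turn.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class GoalRun:
    """Hypothetical record of one goal-driven run."""
    condition: str
    transcript: List[str] = field(default_factory=list)
    achieved: bool = False


def run_with_goal(
    condition: str,
    worker_step: Callable[[List[str]], str],
    evaluator: Callable[[str, List[str]], bool],
    max_turns: int = 20,  # assumed safety cap, not taken from the article
) -> GoalRun:
    run = GoalRun(condition=condition)
    for _ in range(max_turns):
        # The worker takes one turn and reports what it did.
        run.transcript.append(worker_step(run.transcript))
        # A separate evaluator makes the binary call: done or not done.
        if evaluator(condition, run.transcript):
            # Log the achieved condition and stop, mirroring "clears the goal".
            run.transcript.append(f"goal achieved: {condition}")
            run.achieved = True
            break
    return run


# Stub worker and evaluator so the sketch runs without any model calls.
failing = ["test_login", "test_logout"]

def stub_worker(transcript: List[str]) -> str:
    return f"fixed {failing.pop()}"

def stub_evaluator(condition: str, transcript: List[str]) -> bool:
    return not failing  # done only when nothing is failing

result = run_with_goal("all tests in test/auth pass", stub_worker, stub_evaluator)
print(result.achieved, result.transcript)
```

The point of the design is that the worker never gets to declare success: only the evaluator's verdict ends the loop, and the cap bounds the run if the condition is never met.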

Claude Code makes this possible by separating the model that attempts to complete a task from the evaluator model that ensures the task is actually completed. This prevents the agent from mixing up what it’s already accomplished with what still needs to be done. With this method, Anthropic noted there’s no need for a third-party observability platform — though enterprises are free to continue using one alongside Claude Code — no need for a custom log, and less reliance on post-mortem reconstruction.

Competitors like Google ADK support similar evaluation patterns. Google ADK deploys a LoopAgent, but developers have to architect that logic.

In its documentation, Anthropic said the most successful conditions usually have: 

  • One measurable end state: a test result, a build exit code, a file count, an empty queue

  • A stated check: how Claude should prove it, such as “npm test exits 0” or “git status is clean”

  • Constraints that matter: anything that must not change on the way there, such as “no other test file is modified”
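One way to keep those three ingredients explicit is to write the condition as structured data before turning it into the goal text. The sketch below is an assumption-laden illustration: `GoalCondition`, `to_text` and `exits_zero` are hypothetical helpers and not part of Claude Code, and the output format of `to_text` is invented for the example. The `exits_zero` helper shows what a check like “npm test exits 0” means mechanically, using the Python interpreter as a stand-in command so the example runs anywhere.

```python
import subprocess
import sys
from dataclasses import dataclass, field
from typing import List


@dataclass
class GoalCondition:
    """Hypothetical container for the three parts of a goal condition."""
    end_state: str                      # one measurable end state
    check: str                          # how the agent should prove it
    constraints: List[str] = field(default_factory=list)  # what must not change

    def to_text(self) -> str:
        # Join the parts into a single condition string (format is illustrative).
        parts = [self.end_state, f"verified by: {self.check}"]
        parts += [f"constraint: {c}" for c in self.constraints]
        return "; ".join(parts)


def exits_zero(argv: List[str]) -> bool:
    """Run a command and report whether it exited with status 0."""
    return subprocess.run(argv, capture_output=True).returncode == 0


goal = GoalCondition(
    end_state="all tests in test/auth pass",
    check="npm test exits 0",
    constraints=["no other test file is modified"],
)
print(goal.to_text())

# Stand-in for a real check command such as the project's test runner.
print(exits_zero([sys.executable, "-c", "raise SystemExit(0)"]))
```

Spelling out the check as an exit code or similarly mechanical signal is what lets a small evaluator model make the done-or-not call reliably.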

Reliability in the loop

For enterprises already managing sprawling tool stacks, the appeal is a native evaluator that doesn’t add another system to maintain.

This is part of a broader trend in the agentic space, especially as the possibility of stateful, long-running and self-learning agents becomes more of a reality. Evaluator models, verification systems and other independent adjudication systems are starting to show up in reasoning systems and, in some cases, in coding agents like Devin or SWE-agent. 

Sean Brownell, solutions director at Sprinklr, told VentureBeat in an email that there is interest in this kind of loop, where the task and judge are separate, but he feels there is nothing unique about Anthropic’s approach.

“Yes, the loop works. Separating the builder from the judge is sound design because, fundamentally, you can’t trust a model to judge its own homework. The model doing the work is the worst judge of whether it’s done,” Brownell said. “That being said, Anthropic isn’t first to market. The most interesting story here is that two of the world’s biggest AI labs shipped the same command just days apart, but each of them reached entirely different conclusions about who gets to declare ‘done.’”

Brownell said the loop works best “for deterministic work with a verifiable end-state like migrations, fixing broken test suites, clearing a backlog,” but for more nuanced tasks or those needing design judgment, a human making that decision is far more important.

Bringing that evaluator/task split to the agent-loop level shows that companies like Anthropic are pushing agents and orchestration further toward a more auditable, observable system.

