TradePoint.io
Revolutionizing Code Generation: µCODE’s Single-Step Approach to Multi-Turn Feedback

March 10, 2025
in AI & Technology

Generating code from execution feedback is difficult because errors often require multiple rounds of correction, and orchestrating those fixes in a structured way is not simple. Training models to learn from execution feedback is necessary, but existing approaches face challenges. Some methods attempt to correct errors in a single step and fail when multiple refinements are needed. Others use complex reinforcement learning techniques to optimize long-term improvements, yet these struggle with weak learning signals, making training slow and inefficient. Without an effective method for handling iterative corrections, learning becomes unstable and performance suffers.

Current prompting-based systems try to solve multi-step tasks through self-debugging, test generation, and reflection, but yield only modest improvements. Some methods train reward models, such as CodeRL for fixing errors and ArCHer for structured decision-making, while others use Monte Carlo Tree Search (MCTS) but require too much computation. Verifier-based approaches, like "Let's Verify Step by Step" and AlphaCode, help find mistakes or create test cases, but some models rely only on syntax checks, which are insufficient for proper training. SCoRe limits training steps, and RISE relies on complex corrections, making learning inefficient. Fine-tuned agents like FireAct and LEAP, and feedback-based models like RL4VLM and GLAM, also try to improve performance. However, current techniques either fail to refine code properly over multiple steps or are too unstable and inefficient.

To address these issues, researchers proposed µCODE, a multi-turn code generation method that improves through execution feedback. Where existing approaches struggle with execution errors and the complexity of reinforcement learning, µCODE avoids both by following an expert iteration framework with a local search expert. A verifier assesses code quality, while a generator learns from the best solutions, refining its output over multiple iterations. During inference, a Best-of-N search strategy generates and improves code based on execution results, ensuring better performance.
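The multi-turn Best-of-N loop described above can be sketched as follows. Note that `generate`, `score`, and `run_tests` are hypothetical stand-ins for the model sampler, the learned verifier, and the execution harness; this is an illustrative sketch, not the paper's actual API:

```python
from typing import Callable, List, Tuple

def best_of_n_refine(
    prompt: str,
    generate: Callable[[str, int], List[str]],     # samples N candidate programs
    score: Callable[[str, str], float],            # learned-verifier score
    run_tests: Callable[[str], Tuple[bool, str]],  # (all tests passed?, feedback)
    n: int = 5,
    max_turns: int = 3,
) -> str:
    """Generate N candidates per turn, keep the verifier's top pick,
    and feed execution feedback into the next turn."""
    context = prompt
    best = ""
    for _ in range(max_turns):
        candidates = generate(context, n)
        # The verifier ranks the N candidates; keep the highest-scoring one.
        best = max(candidates, key=lambda c: score(context, c))
        passed, feedback = run_tests(best)
        if passed:
            break
        # Append the failed attempt and its execution feedback for the next turn.
        context = f"{context}\n\n# Previous attempt:\n{best}\n# Execution feedback:\n{feedback}"
    return best
```

The key design choice is that each turn is a fresh single-step decision guided by the verifier, rather than a long-horizon credit-assignment problem.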

The framework first trains a verifier through supervised learning to score code snippets, making evaluations more reliable. A binary cross-entropy objective predicts correctness, while a Bradley-Terry objective ranks solutions for better selection. The generator then learns iteratively by relabeling its past outputs with expert-selected solutions, improving accuracy. At inference, multiple solutions are produced and the verifier selects the best, refining outputs until all tests pass. By treating code generation as an imitation learning problem, µCODE eliminates complex exploration and enables efficient optimization.
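The two verifier objectives mentioned above can be written out explicitly. This is a minimal scalar sketch, assuming the verifier emits a raw score that is squashed through a sigmoid; real training would apply these losses to batched model outputs:

```python
import math

def bce_loss(score: float, is_correct: bool) -> float:
    """Binary cross-entropy: treat the sigmoid of the verifier score
    as the predicted probability that the snippet is correct."""
    p = 1.0 / (1.0 + math.exp(-score))
    return -math.log(p if is_correct else 1.0 - p)

def bradley_terry_loss(score_correct: float, score_incorrect: float) -> float:
    """Bradley-Terry ranking loss: push the correct solution's score
    above the incorrect one's on a paired comparison."""
    margin = score_correct - score_incorrect
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

BCE calibrates absolute correctness, while Bradley-Terry only cares about relative ordering, which is what Best-of-N selection actually uses.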

Researchers evaluated µCODE’s effectiveness by comparing it with state-of-the-art methods, analyzing the impact of the learned verifier during training and inference, and assessing different loss functions for verifier training. The generator was initialized from Llama models, and experiments were conducted on the MBPP and HumanEval datasets. Training used MBPP’s training set, with evaluation on its test set and on HumanEval. Comparisons included single-turn and multi-turn baselines such as STaR and Multi-STaR, where fine-tuning was based on correctly generated solutions. Performance was measured using Best-of-N (BoN) accuracy, with the verifier ranking candidate solutions at each turn.
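The BoN accuracy metric used above can be made concrete as follows. The helpers `sample`, `score`, and `passes` are hypothetical placeholders for the candidate sampler, the verifier, and the hidden test suite:

```python
from typing import Callable, List, Sequence

def bon_accuracy(
    problems: Sequence[str],
    sample: Callable[[str], List[str]],  # N candidate solutions per problem
    score: Callable[[str, str], float],  # verifier score for a candidate
    passes: Callable[[str, str], bool],  # does it pass the hidden test suite?
) -> float:
    """Fraction of problems solved when, for each problem, the
    verifier's top-ranked candidate out of N is the one submitted."""
    solved = sum(
        passes(p, max(sample(p), key=lambda c: score(p, c)))
        for p in problems
    )
    return solved / len(problems)
```

This differs from pass@N, which counts a problem as solved if any of the N candidates passes; BoN accuracy additionally tests whether the verifier can pick that passing candidate.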

Results indicated that multi-turn approaches performed better than single-turn methods, highlighting the benefits of execution feedback. µCODE outperformed Multi-STaR, achieving a 1.9% improvement on HumanEval with a 1B model. BoN search further enhanced performance, with µCODE showing a 12.8% gain over greedy decoding. Training with the learned verifier (LV) improved outcomes over training with the oracle verifier (OV) alone. Further analysis showed that the learned verifier helped select better solutions during inference, particularly in the absence of public tests. Inference-time scaling revealed diminishing gains beyond a certain number of candidate solutions. A hierarchical verification strategy (PT+LV), which integrates public-test results with learned-verifier scores, delivered the highest performance, demonstrating the verifier’s effectiveness at eliminating erroneous solutions and guiding iterative predictions.
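The hierarchical PT+LV selection rule can be sketched as a two-level sort: rank candidates first by how many public tests they pass, then break ties with the learned-verifier score. The function and callback names here are illustrative, not from the paper:

```python
from typing import Callable, List

def select_pt_lv(
    candidates: List[str],
    public_pass_count: Callable[[str], int],  # number of public tests passed
    verifier_score: Callable[[str], float],   # learned-verifier score
) -> str:
    """Pick the candidate with the most public-test passes,
    using the verifier score only to break ties."""
    return max(
        candidates,
        key=lambda c: (public_pass_count(c), verifier_score(c)),
    )
```

Lexicographic tuple comparison gives public tests strict priority, which matches the intuition that hard execution evidence should outrank a learned score.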

In conclusion, the proposed µCODE framework provides a scalable approach to multi-turn code generation, using single-step rewards and a learned verifier for iterative improvement. Results indicate that µCODE outperforms oracle-based approaches and produces more precise code. Though constrained by model scale, dataset size, and its focus on Python, it offers a solid baseline for future work. Expanding the training data, scaling to larger models, and extending to other programming languages could further enhance its effectiveness.


Check out the Paper and GitHub Page. All credit for this research goes to the researchers of this project.



Divyesh is a consulting intern at Marktechpost. He is pursuing a BTech in Agricultural and Food Engineering at the Indian Institute of Technology, Kharagpur. He is a data science and machine learning enthusiast who wants to apply these technologies to challenges in the agricultural domain.
