TradePoint.io

Reinforcement Learning Makes LLMs Search-Savvy: Ant Group Researchers Introduce SEM to Optimize Tool Usage and Reasoning Efficiency

May 19, 2025
in AI & Technology

Recent progress in LLMs shows that they can perform complex reasoning tasks and make effective use of external tools such as search engines. Even so, teaching a model to decide when to rely on its internal knowledge and when to search remains a key challenge. Simple prompt-based methods can guide models to invoke tools, but LLMs still struggle with more nuanced behaviors, such as recognizing that an initial search was inadequate and deciding to search again. Reinforcement learning (RL) has been explored to improve these behaviors by rewarding effective search usage. However, RL often leads to unnecessary tool use: models execute redundant searches even for simple tasks, an inefficiency that needs to be addressed.

Various alignment strategies, including Proximal Policy Optimization (PPO), Direct Preference Optimization (DPO), and Group Relative Policy Optimization (GRPO), have been used to align LLM behavior with human expectations. PPO balances exploration against policy stability by limiting how far each update can move the policy, while DPO simplifies alignment by optimizing the model directly on human preference data. GRPO scores each response relative to a group of responses sampled for the same prompt, which helps it capture subtle improvements in reasoning. Meanwhile, treating LLMs as autonomous agents that plan and execute multi-step reasoning tasks is gaining traction. Frameworks like AutoGPT and LangChain showcase how these agents can refine their outputs through iterative reasoning and search. Yet current agent systems often depend on fixed prompts or heuristic-based tool use, which limits their adaptability and efficiency.
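To make the group-based evaluation in GRPO concrete, here is a minimal Python sketch of the group-relative advantage: several responses are sampled for one prompt, each receives a scalar reward, and each reward is normalized against the group's mean and standard deviation. The function name, the example rewards, and the `eps` constant are illustrative assumptions, not details taken from the SEM paper.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against its own group (hypothetical helper).

    `rewards` holds one scalar reward per response sampled for the same prompt.
    `eps` is an assumed small constant that avoids division by zero when
    every response in the group received the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled responses to one prompt: two judged correct, two incorrect.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print(advantages)
```

Responses that score above the group average get a positive advantage and are reinforced, while those below it are discouraged. No separate value model is needed to supply the baseline.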

Researchers at Ant Group introduce SEM, a post-training reinforcement learning framework designed to teach LLMs when to use search tools and when to rely on internal knowledge. By training on a balanced dataset combining questions that do and do not require external retrieval, SEM guides the model to issue search requests only when necessary. Using a structured reasoning format and GRPO, the framework rewards accurate answers without search and penalizes unnecessary tool use. Results show that SEM improves response accuracy and efficiency, helping models better judge when external information is needed, thus enhancing reasoning in complex scenarios. 

To integrate search tools into a model’s reasoning process, SEM uses reinforcement learning to teach models when and how to use search effectively. The training data combines MuSiQue (questions that need external information) and MMLU (questions answerable from prior knowledge), helping models learn to judge when search is necessary. Under the GRPO framework, the model is rewarded for accurate, efficient answers, which discourages unnecessary searches and encourages searching when internal knowledge falls short. A structured response format (<think>, <answer>, <search>, <result>) standardizes training and allows for precise reward assignment, improving both reasoning quality and search decision-making.
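The sketch below illustrates how a tagged response in this format could be parsed and scored. It is a simplified stand-in rather than the reward function used in SEM: the exact-match check, the `needs_search` flag (assumed here to come from which dataset the question was drawn from), and the `search_penalty` value are all hypothetical choices made for illustration.

```python
import re

# Matches <think>...</think>, <search>...</search>, <result>...</result>, <answer>...</answer>.
TAG_PATTERN = re.compile(r"<(think|search|result|answer)>(.*?)</\1>", re.DOTALL)


def parse_response(text):
    """Return (tag, content) pairs in order of appearance."""
    return [(m.group(1), m.group(2).strip()) for m in TAG_PATTERN.finditer(text)]


def toy_reward(text, gold_answer, needs_search, search_penalty=0.5):
    """Hypothetical reward: correct answers score 1.0, minus a penalty
    when the model searched on a question that did not need it."""
    segments = parse_response(text)
    answers = [content for tag, content in segments if tag == "answer"]
    if len(answers) != 1:
        return 0.0  # malformed output: no answer or several answers
    if answers[0].lower() != gold_answer.lower():
        return 0.0  # wrong answer earns nothing
    used_search = any(tag == "search" for tag, _ in segments)
    if used_search and not needs_search:
        return 1.0 - search_penalty  # correct, but the search was unnecessary
    return 1.0


direct = "<think>Paris is the capital.</think><answer>Paris</answer>"
searched = (
    "<think>I should check.</think><search>capital of France</search>"
    "<result>Paris</result><answer>Paris</answer>"
)
print(toy_reward(direct, "Paris", needs_search=False))
print(toy_reward(searched, "Paris", needs_search=False))
```

Because each tag marks a distinct step, a reward function can inspect whether a search was issued and whether the final answer is correct, which is what makes per-behavior reward assignment possible.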

The study evaluates a model trained to determine when to rely on its internal knowledge and when to use external search. Training combines MuSiQue (unfamiliar questions) and MMLU (familiar questions), and performance is evaluated on datasets such as HotpotQA, GSM8K, and MMLU. SEM outperforms baselines such as Naive RAG and ReSearch in both answer accuracy and search efficiency: it reduces unnecessary searches on questions the model already knows and improves reasoning on those it does not. Case studies and training curves indicate stable learning and sensible search decisions. Overall, SEM improves both retrieval decisions and internal reasoning in large language models.
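As a rough illustration of how answer accuracy and search efficiency might be measured together, the sketch below computes an exact-match rate and the fraction of questions on which the model issued a search. The metric names, the record fields, and the sample data are assumptions made for this example. They are not the paper's evaluation code or results.

```python
def evaluate(records):
    """Compute exact-match accuracy and search ratio over evaluation records.

    Each record is a dict with hypothetical keys:
      "prediction"  - the model's final answer
      "gold"        - the reference answer
      "used_search" - True if the model issued at least one search
    """
    if not records:
        raise ValueError("records must not be empty")
    n = len(records)
    correct = sum(
        r["prediction"].strip().lower() == r["gold"].strip().lower() for r in records
    )
    searched = sum(bool(r["used_search"]) for r in records)
    return {"exact_match": correct / n, "search_ratio": searched / n}


# Made-up records for illustration only.
sample = [
    {"prediction": "Paris", "gold": "Paris", "used_search": False},
    {"prediction": "42", "gold": "42", "used_search": False},
    {"prediction": "Ada Lovelace", "gold": "Ada Lovelace", "used_search": True},
    {"prediction": "1912", "gold": "1915", "used_search": True},
]
metrics = evaluate(sample)
print(metrics)
```

On a benchmark of familiar questions, a lower search ratio at equal or higher accuracy would indicate fewer redundant searches. On a benchmark that requires retrieval, a high search ratio is expected.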

In conclusion, SEM is a post-training reinforcement learning framework designed to improve how large language models use external search tools. The model is trained on a dataset combining MuSiQue and MMLU, helping it distinguish between questions it can answer internally and those that require external retrieval. SEM uses a structured reasoning approach and a reward function that penalizes unnecessary searches while promoting accurate and efficient retrieval. Experiments on benchmarks like HotpotQA, GSM8K, and MMLU show that SEM reduces redundant searches and improves accuracy. This approach enhances reasoning efficiency and intelligent use of external knowledge in LLMs. 


Check out the Paper. All credit for this research goes to the researchers of this project.

The post Reinforcement Learning Makes LLMs Search-Savvy: Ant Group Researchers Introduce SEM to Optimize Tool Usage and Reasoning Efficiency appeared first on MarkTechPost.
