TradePoint.io

Do LLM Agents Have Regret? This Machine Learning Research from MIT and the University of Maryland Presents a Case Study on Online Learning and Games

March 28, 2024
in AI & Technology

Large Language Models (LLMs) are increasingly employed for (interactive) decision-making through the development of LLM-based agents. In recent years they have shown remarkable success in embodied AI, natural science, and social science applications, as well as remarkable potential in solving various games. These empirical successes call for rigorous examination through a theoretical lens of decision-making. However, the performance of LLM agents in decision-making has yet to be fully investigated through quantitative metrics, especially in the multi-agent setting where agents interact with each other, a typical scenario in real-world LLM-agent applications.

Thus, it is natural to ask: Is it possible to examine and better understand LLMs’ online and strategic decision-making behaviors through the lens of regret?
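To make the metric concrete, here is a minimal sketch of how regret could be computed for a single agent. It assumes the standard online-learning notion (external regret against the best fixed action in hindsight) and full knowledge of the loss table. The function name `external_regret` and the toy data are illustrative, not taken from the paper.

```python
import numpy as np

def external_regret(losses, actions):
    """Regret of a played action sequence vs. the best fixed action in hindsight.

    losses:  array of shape (T, d); losses[t, a] is the loss of action a in round t.
    actions: length-T sequence of the action indices the agent actually played.
    """
    losses = np.asarray(losses, dtype=float)
    actions = np.asarray(actions, dtype=int)
    T = losses.shape[0]
    incurred = losses[np.arange(T), actions].sum()  # total loss the agent paid
    best_fixed = losses.sum(axis=0).min()           # best single action in hindsight
    return incurred - best_fixed

# Toy example (hypothetical data): two actions, four rounds.
losses = np.array([[0.0, 1.0],
                   [1.0, 0.0],
                   [0.0, 1.0],
                   [0.0, 1.0]])
actions = [1, 0, 0, 0]
regret = external_regret(losses, actions)
print(regret)
```

An agent is called "no-regret" when this quantity grows sublinearly in the number of rounds T, so the average regret per round vanishes as T grows.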

The impressive reasoning capability of LLMs has inspired a growing line of research on how LLM-based autonomous agents interact with the environment by taking actions repeatedly or sequentially based on the feedback they receive. Promising results have been shown from a planning perspective. In particular, for embodied AI applications such as robotics, LLMs have achieved impressive performance when used as the controller for decision-making. However, these works have yet to rigorously characterize decision-making performance via the regret metric. Recently, some researchers have proposed a principled architecture for LLM agents with provable regret guarantees in stationary and stochastic decision-making environments, under the Bayesian adaptive Markov decision process framework.

To better understand the limits of LLM agents in these interactive environments, researchers from MIT and the University of Maryland propose to study their interactions in benchmark decision-making settings from online learning and game theory, using regret as the performance metric. They propose a novel unsupervised training loss, regret-loss, which, in contrast to the supervised pre-training loss, does not require labels of (optimal) actions. They then establish a statistical guarantee in the form of a generalization bound for regret-loss minimization, followed by an optimization guarantee that minimizing such a loss may automatically lead to known no-regret learning algorithms.

The researchers propose two frameworks to rigorously validate the no-regret behavior of algorithms over a finite T, which might be of independent interest: a Trend-checking framework and a Regression-based framework. In the Trend-checking framework, they define H0 and H1 as the null and alternative hypotheses, respectively. Because the notion of convergence is defined in the limit T → ∞, it is challenging to verify directly, so they test a more tractable hypothesis instead. In the Regression-based framework, they take an alternative approach and fit the data with a regression; in particular, one can use the data to fit a linear function.
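The two checks can be sketched roughly as follows. This is an illustration under stated assumptions rather than the paper's exact procedure. For the trend check, the null hypothesis is assumed to be "average regret Regret(t)/t is equally likely to go up or down at each step", tested with a binomial tail. For the regression check, the linear fit is assumed to be on a log-log scale, so a slope below 1 indicates sublinear growth. The function names are hypothetical.

```python
import math
import numpy as np

def trend_check_p_value(regret):
    """Binomial-tail p-value for 'average regret keeps decreasing'.

    regret[t-1] is the cumulative regret after t rounds, for t = 1..T.
    Assumed H0: each step of Regret(t)/t is a fair coin flip (up or down).
    """
    regret = np.asarray(regret, dtype=float)
    T = len(regret)
    avg = regret / np.arange(1, T + 1)
    s = int(np.sum(avg[1:] < avg[:-1]))  # number of strictly decreasing steps
    n = T - 1
    # P(X >= s) for X ~ Binomial(n, 1/2)
    return sum(math.comb(n, k) for k in range(s, n + 1)) / 2 ** n

def regression_exponent(regret):
    """Slope of a least-squares line fitted to log(regret) against log(t)."""
    regret = np.asarray(regret, dtype=float)
    t = np.arange(1, len(regret) + 1)
    mask = regret > 0  # the logarithm needs positive values
    slope, _intercept = np.polyfit(np.log(t[mask]), np.log(regret[mask]), 1)
    return slope

# Synthetic regret curves (not experimental data): sqrt(t) vs. linear growth.
t = np.arange(1, 101)
sublinear = np.sqrt(t)
linear = 0.5 * t

print(trend_check_p_value(sublinear), regression_exponent(sublinear))
print(trend_check_p_value(linear), regression_exponent(linear))
```

On the synthetic square-root curve the average regret falls at every step, so the p-value is tiny and the fitted exponent is about 0.5. On the linear curve the average never falls, so the trend check gives no evidence of no-regret behavior and the exponent is about 1.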

In the experiments, they compare GPT-4 with well-known no-regret algorithms: FTRL with entropy regularization and FTPL with Gaussian perturbations (both with tuned parameters). The pre-trained LLMs can achieve no regret and often have smaller regret than these baselines. They also compare the pre-trained LLMs with the bandit-feedback counterparts of the baselines, e.g., EXP3 and the bandit version of FTPL, and GPT-4 consistently achieves lower regret. For GPT-3.5 Turbo and GPT-4 in repeated games of three different sizes, both statistical frameworks validate sublinear regret.
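For readers unfamiliar with the two full-information baselines, the sketch below shows textbook versions of them. FTRL with entropy regularization over the probability simplex reduces to the Hedge (multiplicative weights) update, and FTPL plays the action that minimizes cumulative loss plus fresh Gaussian noise. The loss sequence, learning rate, and noise scale are illustrative assumptions, not the tuned settings used in the experiments.

```python
import numpy as np

def hedge_regret_on(losses, eta):
    """Expected regret of Hedge, i.e. FTRL with an entropy regularizer."""
    T, d = losses.shape
    cum = np.zeros(d)        # cumulative loss of each action so far
    expected_loss = 0.0
    for step in range(T):
        # Softmax of negative cumulative losses (shifted for numerical stability).
        w = np.exp(-eta * (cum - cum.min()))
        p = w / w.sum()
        expected_loss += p @ losses[step]
        cum += losses[step]
    return expected_loss - cum.min()

def ftpl_regret_on(losses, scale, rng):
    """Realized regret of follow-the-perturbed-leader with Gaussian noise."""
    T, d = losses.shape
    cum = np.zeros(d)
    incurred = 0.0
    for step in range(T):
        noise = rng.normal(0.0, scale, size=d)   # fresh perturbation each round
        action = int(np.argmin(cum + noise))     # leader under perturbed losses
        incurred += losses[step, action]
        cum += losses[step]
    return incurred - cum.min()

# Hypothetical environment: i.i.d. uniform losses in [0, 1], three actions.
rng = np.random.default_rng(0)
T, d = 2000, 3
losses = rng.uniform(0.0, 1.0, size=(T, d))

eta = np.sqrt(8 * np.log(d) / T)   # standard tuning for losses in [0, 1]
hedge_regret = hedge_regret_on(losses, eta)
ftpl_regret = ftpl_regret_on(losses, scale=np.sqrt(T), rng=rng)

print("Hedge regret:", hedge_regret)
print("FTPL regret: ", ftpl_regret)
print("Hedge bound: ", np.sqrt(T * np.log(d) / 2))
```

With this learning rate, the expected regret of Hedge is at most sqrt(T ln(d) / 2) for any loss sequence in [0, 1], which is the sublinear growth that defines a no-regret algorithm. The bandit-feedback variants mentioned above (EXP3 and bandit FTPL) observe only the loss of the action played and are not covered by this sketch.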

In conclusion, the researchers from MIT and the University of Maryland studied the online decision-making and strategic behaviors of LLMs quantitatively through the metric of regret. They examined and validated the no-regret behavior of several representative pre-trained LLMs in benchmark online learning and game settings. They then provided theoretical insights into this behavior by connecting pre-trained LLMs to the follow-the-perturbed-leader algorithm in online learning under certain assumptions. They also identified (simple) cases where pre-trained LLMs fail to be no-regret. This motivated their new unsupervised training loss, regret-loss, which provably promotes the no-regret behavior of Transformers without requiring labels of (optimal) actions.


Check out the Paper. All credit for this research goes to the researchers of this project.


Asjad is an intern consultant at Marktechpost. He is pursuing a B.Tech in mechanical engineering at the Indian Institute of Technology, Kharagpur. Asjad is a machine learning and deep learning enthusiast who is always researching the applications of machine learning in healthcare.


©2020- TradePoint.io - All rights reserved!

Tradepoint.io, being just a publishing and technology platform, is not a registered broker-dealer or investment adviser. So we do not provide investment advice. Rather, brokerage services are provided to clients of Tradepoint.io by independent SEC-registered broker-dealers and members of FINRA/SIPC. Every form of investing carries some risk and past performance is not a guarantee of future results. “Tradepoint.io“, “Instant Investing” and “My Trading Tools” are registered trademarks of Apperbuild, LLC.

This website is operated by Apperbuild, LLC. We have no link to any brokerage firm and we do not provide investment advice. Every information and resource we provide is solely for the education of our readers. © 2020 Apperbuild, LLC. All rights reserved.
