TradePoint.io

Dream First, Learn Later: DECKARD is an AI Approach That Uses LLMs for Training Reinforcement Learning (RL) Agents

May 4, 2023

Reinforcement learning (RL) is a popular approach to training autonomous agents that learn to perform complex tasks by interacting with their environment. Guided by a reward signal, an RL agent learns which action is best in each situation and adapts its behavior to the environment.

A major challenge in RL is exploring the vast state space of many real-world problems efficiently. The challenge arises because RL agents learn by interacting with their environment, that is, through exploration. Think of an agent that tries to play Minecraft. If you have played it, you know how complicated the Minecraft crafting tree is: there are hundreds of craftable items, and crafting one item often requires crafting others first. It is a genuinely complex environment.
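To make the dependency problem concrete, here is a minimal Python sketch that orders a crafting chain so every prerequisite comes before the item that needs it. The recipe graph is a hypothetical, heavily simplified stand-in (real recipes also involve quantities, tools, and crafting stations), and `crafting_order` is an illustrative helper, not part of DECKARD or any Minecraft API.

```python
# Hypothetical, heavily simplified recipe graph: item -> items it requires.
# "Requires" here means any prerequisite, including the tool needed to mine it.
RECIPES = {
    "log": [],
    "planks": ["log"],
    "stick": ["planks"],
    "crafting_table": ["planks"],
    "wooden_pickaxe": ["planks", "stick", "crafting_table"],
    "cobblestone": ["wooden_pickaxe"],
    "stone_pickaxe": ["cobblestone", "stick", "crafting_table"],
}


def crafting_order(target: str, recipes: dict) -> list:
    """Return the items to obtain, prerequisites first (depth-first topological order)."""
    order, seen = [], set()

    def visit(item: str) -> None:
        if item in seen:
            return
        seen.add(item)
        for requirement in recipes[item]:
            visit(requirement)  # obtain every prerequisite before the item itself
        order.append(item)

    visit(target)
    return order


print(crafting_order("stone_pickaxe", RECIPES))
```

This traversal assumes the recipe graph has no cycles. Even in this toy graph, the stone pickaxe depends on six other items, which hints at why random exploration alone struggles.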

Because the environment can have a huge number of possible states and actions, it is difficult for the agent to find the optimal policy through random exploration alone. The agent must balance exploiting its current best policy against exploring new parts of the state space that might lead to a better one. Finding exploration methods that strike this balance efficiently is an active area of research in RL.
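A common baseline for this trade-off is epsilon-greedy action selection: with probability ε the agent tries a random action, and otherwise it takes the action with the highest estimated value. The sketch below pairs it with tabular Q-learning on a hypothetical five-state corridor. The environment, hyperparameters, and function names are illustrative assumptions, and DECKARD itself is not claimed to work this way.

```python
import random

N_STATES = 5        # states 0..4; state 4 is the goal
LEFT, RIGHT = 0, 1


def step(state: int, action: int):
    """Toy deterministic corridor: move left/right; reward 1.0 on reaching the goal."""
    nxt = min(state + 1, N_STATES - 1) if action == RIGHT else max(state - 1, 0)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done


def epsilon_greedy(q_row, epsilon: float, rng: random.Random) -> int:
    """Explore with probability epsilon; otherwise exploit, breaking ties at random."""
    if rng.random() < epsilon:
        return rng.choice((LEFT, RIGHT))
    best = max(q_row)
    return rng.choice([a for a in (LEFT, RIGHT) if q_row[a] == best])


def q_learning(episodes=200, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            action = epsilon_greedy(q[state], epsilon, rng)
            nxt, reward, done = step(state, action)
            # One-step Q-learning target; no bootstrapping past the terminal goal.
            target = reward if done else reward + gamma * max(q[nxt])
            q[state][action] += alpha * (target - q[state][action])
            state = nxt
    return q


q = q_learning()
greedy_policy = [max((LEFT, RIGHT), key=lambda a: q[s][a]) for s in range(N_STATES - 1)]
print(greedy_policy)
```

Random tie-breaking matters here: with all estimates starting at zero, always picking the first action would bias the agent toward moving left and make the goal much harder to stumble upon.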


Practical decision-making systems are known to benefit from using prior knowledge about a task efficiently. With prior information about the task, an agent can adapt its policy faster and avoid getting stuck in sub-optimal policies. However, most reinforcement learning methods still train from scratch, without any pretraining or external knowledge.

Why is that the case? In recent years, there has been growing interest in using large language models (LLMs) to give RL agents external knowledge that aids exploration. This approach has shown promise, but several challenges remain, such as grounding the LLM's knowledge in the environment and dealing with inaccurate LLM outputs.

So, should we give up on using LLMs to aid RL agents? If not, how can we fix those problems and then use them again to guide RL agents? The answer has a name, and it’s DECKARD.

DECKARD is built for Minecraft, where crafting a specific item can be challenging without expert knowledge of the game. Prior studies have shown that goals in Minecraft become easier to achieve with dense rewards or expert demonstrations. As a result, item crafting in Minecraft has become a persistent challenge in AI research.
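The difference between sparse and dense rewards can be sketched in a few lines of Python. Both functions below are hypothetical. The sparse version pays out only once the target item is held. The dense version adds +1 for every milestone item newly obtained on a step, which gives the agent feedback long before it reaches the goal. Neither is taken from DECKARD or the studies it cites, and the milestone list is an assumed simplification.

```python
# Hypothetical milestone items on the way to a stone pickaxe (simplified).
MILESTONES = ["log", "planks", "stick", "crafting_table",
              "wooden_pickaxe", "cobblestone"]


def sparse_reward(inventory: set, target: str) -> float:
    """Reward 1.0 only once the target item is held; 0.0 otherwise."""
    return 1.0 if target in inventory else 0.0


def dense_reward(prev_inventory: set, inventory: set, milestones: list) -> float:
    """Reward +1.0 for each milestone item newly obtained on this step."""
    return float(sum(1 for item in milestones
                     if item in inventory and item not in prev_inventory))


# One step in which the agent turns logs into planks:
before, after = {"log"}, {"log", "planks"}
print(sparse_reward(after, "stone_pickaxe"))    # 0.0: goal not reached yet
print(dense_reward(before, after, MILESTONES))  # 1.0: planks is a new milestone
```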

DECKARD applies few-shot prompting to an LLM to generate an Abstract World Model (AWM) of subgoals. In the dream phase, the LLM hypothesizes the AWM, imagining the task and the steps needed to solve it. In the wake phase, the agent learns a modular policy for the subgoals generated while dreaming. Because this learning happens in the real environment, DECKARD can verify the hypothesized AWM: the AWM is corrected during the waking phase, and discovered nodes are marked as verified so they can be reused in the future.
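The dream-then-verify loop can be illustrated with a small Python sketch. Everything here is an assumption for illustration. The hard-coded `HYPOTHESIZED` dictionary stands in for the LLM's few-shot output, and `GROUND_TRUTH` stands in for what the agent would discover by acting in Minecraft. The `AbstractWorldModel`, `dream`, and `wake` names are hypothetical. In DECKARD, the wake phase trains subgoal policies in the environment. This sketch reduces that step to a lookup.

```python
from dataclasses import dataclass, field

# Hypothetical LLM output ("dream"): item -> hypothesized prerequisites.
# It contains a deliberate mistake: a stick really needs planks, not logs.
HYPOTHESIZED = {
    "log": [],
    "planks": ["log"],
    "stick": ["log"],
    "wooden_pickaxe": ["planks", "stick"],
}

# Stand-in for what the environment reveals when the agent actually crafts items.
GROUND_TRUTH = {
    "log": [],
    "planks": ["log"],
    "stick": ["planks"],
    "wooden_pickaxe": ["planks", "stick"],
}


@dataclass
class AbstractWorldModel:
    edges: dict                                   # item -> prerequisite items
    verified: set = field(default_factory=set)    # nodes confirmed in the environment

    def wake_update(self, item: str, observed_prereqs: list) -> None:
        """Overwrite the hypothesized edges with observed ones and mark the node verified."""
        self.edges[item] = list(observed_prereqs)
        self.verified.add(item)


def dream(hypothesis: dict) -> AbstractWorldModel:
    # Copy the hypothesis so later corrections do not mutate the source dict.
    return AbstractWorldModel(edges={k: list(v) for k, v in hypothesis.items()})


def wake(awm: AbstractWorldModel, env_recipes: dict) -> list:
    """Check each unverified node against the environment; return corrected items."""
    corrected = []
    for item in list(awm.edges):
        if item in awm.verified:
            continue  # reuse knowledge that was already verified
        observed = env_recipes[item]  # a real agent would learn this by acting
        if sorted(observed) != sorted(awm.edges[item]):
            corrected.append(item)
        awm.wake_update(item, observed)
    return corrected


awm = dream(HYPOTHESIZED)
print(wake(awm, GROUND_TRUTH))  # ['stick']
print(awm.edges["stick"])       # ['planks']
```

The wrong stick recipe is replaced by the observed one, and every node ends up in `verified`, so a later wake pass skips them instead of re-checking.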

Experiments show that LLM guidance is essential to exploration in DECKARD: a version of the agent without LLM guidance takes more than twice as long to craft most items during open-ended exploration. When exploring for a specific task, DECKARD improves sample efficiency by orders of magnitude over comparable agents, demonstrating the potential of applying LLMs to RL robustly.


Check out the Research Paper, Code, and Project.




Ekrem Çetinkaya received his B.Sc. in 2018 and M.Sc. in 2019 from Ozyegin University, Istanbul, Türkiye. He wrote his M.Sc. thesis about image denoising using deep convolutional networks. He is currently pursuing a Ph.D. degree at the University of Klagenfurt, Austria, and working as a researcher on the ATHENA project. His research interests include deep learning, computer vision, and multimedia networking.


Credit: Source link


©2020- TradePoint.io - All rights reserved!

Tradepoint.io, being just a publishing and technology platform, is not a registered broker-dealer or investment adviser. So we do not provide investment advice. Rather, brokerage services are provided to clients of Tradepoint.io by independent SEC-registered broker-dealers and members of FINRA/SIPC. Every form of investing carries some risk and past performance is not a guarantee of future results. “Tradepoint.io“, “Instant Investing” and “My Trading Tools” are registered trademarks of Apperbuild, LLC.

This website is operated by Apperbuild, LLC. We have no link to any brokerage firm and we do not provide investment advice. Every information and resource we provide is solely for the education of our readers. © 2020 Apperbuild, LLC. All rights reserved.
