TradePoint.io

CMU Researchers Introduce Unlimiformer: An AI Method for Augmenting Pretrained Encoder-Decoders with an External Datastore to Allow for Unlimited Length Input

May 8, 2023
in AI & Technology

Transformer-based models have dominated natural language processing (NLP) since their introduction in 2017. A transformer operates on tokens (words, morphemes, punctuation, and so on) that a tokenizer produces from the input text. Because a transformer attends to every token in its input, attention becomes expensive as inputs grow and context windows are kept bounded, which makes it hard to handle long-form tasks such as book summarization, where the input can easily exceed a hundred thousand tokens. To handle inputs of arbitrary length, a group of researchers from Carnegie Mellon University proposes a general approach that augments pretrained encoder-decoder transformers with an external datastore.

Unlimiformer is a new retrieval-based approach that extends the input length pretrained language models can accept at test time. Any existing encoder-decoder transformer can be augmented with Unlimiformer to accept inputs of unbounded length. Given a long input sequence, Unlimiformer builds a datastore over the hidden states of all input tokens. The decoder’s standard cross-attention then queries the datastore and attends to the top-k input tokens. The datastore supports sublinear search and can be kept in either GPU or CPU memory. Unlimiformer can be applied to an already trained checkpoint without any further training, and its effectiveness can be improved further by fine-tuning.
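The datastore-and-lookup idea described above can be illustrated with a small toy example. This is only a sketch under stated assumptions, not the authors’ implementation: the class name `HiddenStateDatastore` and its methods are hypothetical, the vectors are hand-written stand-ins for encoder hidden states, similarity is a plain dot product, and the search is an exhaustive scan rather than the sublinear index the article mentions.

```python
import heapq
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Dot-product similarity between two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


class HiddenStateDatastore:
    """Toy datastore holding one vector per input token (hypothetical name)."""

    def __init__(self) -> None:
        self.vectors: List[Vector] = []

    def add_chunk(self, hidden_states: Sequence[Vector]) -> None:
        # Assumption: a long input is encoded piece by piece and each
        # piece's hidden states are appended in input order.
        self.vectors.extend(hidden_states)

    def top_k(self, query: Vector, k: int) -> List[Tuple[int, float]]:
        # Exhaustive scan for clarity; a real index would avoid
        # comparing the query against every stored vector.
        scored = ((i, dot(query, v)) for i, v in enumerate(self.vectors))
        return heapq.nlargest(k, scored, key=lambda pair: pair[1])


store = HiddenStateDatastore()
store.add_chunk([[1.0, 0.0], [0.0, 1.0]])    # tokens 0 and 1
store.add_chunk([[0.9, 0.1], [-1.0, 0.0]])   # tokens 2 and 3
print(store.top_k([1.0, 0.0], k=2))          # token positions with their scores
```

Because only positions and scores come back from the lookup, the datastore can grow with the input while the decoder still attends to a fixed number of tokens per step.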

The maximum length of an input to a transformer is bounded by the size of the encoder’s context window. However, different information may be relevant at different decoding steps, and different attention heads may focus on different aspects of the data. As a result, a fixed context window can waste effort on tokens that a given attention head does not attend to strongly. At each decoding step, Unlimiformer lets each head select its own context window from the entire input. To formalize this, the authors inject an Unlimiformer lookup into the decoder before cross-attention is applied. The model then performs a k-nearest neighbor (kNN) search in the external datastore, selecting a set of tokens to attend to for each decoder layer and attention head.
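The per-head lookup can be sketched as attention restricted to each head’s k best-scoring input tokens. This is a simplified illustration rather than the paper’s code: the function names are hypothetical, the kNN search is an exhaustive sort, the usual scaling of scores and the per-head key/value projections are omitted, and keys and values are given directly as small lists.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def retrieve(query: Vector, keys: Sequence[Vector], k: int) -> List[int]:
    """Indices of the k keys scoring highest against the query (stand-in for kNN)."""
    order = sorted(range(len(keys)), key=lambda i: dot(query, keys[i]), reverse=True)
    return order[:k]


def top_k_attention(query: Vector, keys: Sequence[Vector],
                    values: Sequence[Vector], k: int) -> List[float]:
    """Softmax attention computed over only the retrieved tokens."""
    chosen = retrieve(query, keys, k)
    scores = [dot(query, keys[i]) for i in chosen]
    peak = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * values[i][d] for w, i in zip(weights, chosen)) / total
            for d in range(dim)]


def per_head_contexts(head_queries: Sequence[Vector], keys: Sequence[Vector],
                      values: Sequence[Vector], k: int) -> List[List[float]]:
    """Each head runs its own lookup, so heads may attend to different tokens."""
    return [top_k_attention(q, keys, values, k) for q in head_queries]


keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
values = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
heads = [[1.0, 0.0], [0.0, 1.0]]  # two heads with different queries
print([retrieve(q, keys, k=1) for q in heads])
print(per_head_contexts(heads, keys, values, k=1))
```

With `k=1` each head’s output is simply the value of the single token it retrieved, which makes it easy to see that the two heads end up with different context windows over the same input.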


To further boost Unlimiformer’s effectiveness, the researchers also examine training approaches. As a first step, they consider alternative training methods that demand little or no extra processing power compared with the conventional fine-tuning regime. They also investigate the computationally costly option of training with Unlimiformer directly.

The study’s code and models are available for download from GitHub.

Empirically, the team tested Unlimiformer on long-document and multi-document summarization tasks, showing that it could summarize documents with as many as 350k tokens without truncating the inputs. Unlimiformer was also applied to existing pretrained models, allowing them to handle unlimited inputs without any newly learned weights or changes to their code. The researchers believe Unlimiformer may bring further gains to retrieval-augmented large language models (LLMs), which have shown encouraging results on downstream sequence-to-sequence generation tasks, and that future work can improve performance by adding structure to the datastore or by retrieving embeddings in chunks. The information retrieval community has developed a wide array of approaches for improving retrieval, which could further help retrieval-augmented LLMs on difficult downstream tasks. To support such work, the researchers have released a script that injects Unlimiformer into any model built on the HuggingFace Transformers library.


Check out the Paper and GitHub link.


Dhanshree Shenwai is a computer science engineer with experience at FinTech companies covering the financial, cards and payments, and banking domains, and a keen interest in applications of AI. She is enthusiastic about exploring new technologies and advancements that make everyone’s life easier.



©2020- TradePoint.io - All rights reserved!

Tradepoint.io, being just a publishing and technology platform, is not a registered broker-dealer or investment adviser. So we do not provide investment advice. Rather, brokerage services are provided to clients of Tradepoint.io by independent SEC-registered broker-dealers and members of FINRA/SIPC. Every form of investing carries some risk and past performance is not a guarantee of future results. “Tradepoint.io“, “Instant Investing” and “My Trading Tools” are registered trademarks of Apperbuild, LLC.

This website is operated by Apperbuild, LLC. We have no link to any brokerage firm and we do not provide investment advice. Every information and resource we provide is solely for the education of our readers. © 2020 Apperbuild, LLC. All rights reserved.
