TradePoint.io

Stanford and Mila Researchers Propose Hyena: An Attention-Free Drop-in Replacement to the Core Building Block of Many Large-Scale Language Models

May 8, 2023
in AI & Technology

The race to build mind-blowing generative models such as ChatGPT and Bard, and the underlying technology such as GPT-3 and GPT-4, has taken the AI world by storm. Yet many challenges remain around the accessibility, training, and practical feasibility of these models for the everyday use cases that matter to most of us.

Anyone who has played around with such sequence models has likely hit one sure-shot problem that dampened their excitement: the length of the input they can send in to prompt the model.

And enthusiasts who want to dabble in the core of these technologies and train a custom model find that the cost of the optimization process makes it a near-impossible task.


At the heart of these problems lies the quadratic cost of the attention mechanism that sequence models rely on. Computing it demands enormous resources, and scaling it up is extremely expensive, which leaves only a few concentrated organizations with a real understanding of, and real control over, such algorithms.

Simply put, attention exhibits quadratic cost in sequence length: the accessible context is limited, and scaling it up is a costly affair.
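To make that scaling concrete, here is a back-of-the-envelope FLOP count for a single self-attention operation. The head dimension d = 64 and the 2× multiply-add convention are illustrative assumptions of ours, not figures from the paper:

```python
# Rough FLOP count for one self-attention operation over a sequence of
# length n with head dimension d. Illustrative assumptions, not from the paper.
def attention_flops(n: int, d: int = 64) -> int:
    qk_scores = 2 * n * n * d   # Q @ K^T: n x n scores, d mul-adds each
    weighted_v = 2 * n * n * d  # softmax(scores) @ V: same amount of work
    return qk_scores + weighted_v

for n in (1_024, 2_048, 4_096):
    print(f"n={n:>5}: {attention_flops(n):,} FLOPs")
# Doubling the sequence length quadruples the cost.
```

The loop makes the quadratic growth visible: every doubling of context length multiplies the attention cost by four.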

However, worry not: a new architecture called Hyena is now making waves in the NLP community, and some hail it as the rescuer we all need. It challenges the dominance of existing attention mechanisms, and the research paper demonstrates its potential to topple the existing system.

Developed by researchers at Stanford and Mila, Hyena boasts impressive performance on a range of NLP tasks at subquadratic cost. In this article, we will look closely at Hyena's claims.

The paper argues that subquadratic operators can match the quality of attention models at scale without attention's cost in parameters and optimization. Based on targeted reasoning tasks, the authors distill the three properties that contribute most to its performance:

  1. Data control
  2. Sublinear parameter scaling
  3. Unrestricted context

With these properties in mind, they introduce the Hyena hierarchy: a new operator that combines long convolutions with element-wise multiplicative gating to match the quality of attention at scale while reducing the computational cost.
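A minimal sketch of that idea, alternating long convolutions with element-wise gating, might look like the following. The FFT-based convolution, the 1D shapes, and the function names are our own illustration under stated assumptions, not the paper's reference implementation:

```python
import numpy as np

def fft_long_conv(h: np.ndarray, x: np.ndarray) -> np.ndarray:
    """Long convolution via FFT: O(L log L) instead of the O(L^2) of a
    naive sliding window. Zero-padding to 2L avoids circular wrap-around."""
    L = x.shape[-1]
    spectrum = np.fft.rfft(h, n=2 * L) * np.fft.rfft(x, n=2 * L)
    return np.fft.irfft(spectrum, n=2 * L)[..., :L]

def hyena_operator(projections, filters):
    """Hyena-style recurrence: z starts as one input projection; each step
    convolves z with a long filter, then gates the result element-wise with
    the next projection (the data-controlled part)."""
    z = projections[0]
    for x_i, h_i in zip(projections[1:], filters):
        z = x_i * fft_long_conv(h_i, z)
    return z

# Sanity check: with identity (delta) filters, convolution is a no-op and
# the operator reduces to an element-wise product of the projections.
L = 8
delta = np.zeros(L)
delta[0] = 1.0
p = [np.ones(L), np.arange(L, dtype=float), np.full(L, 2.0)]
out = hyena_operator(p, [delta, delta])
print(out)  # element-wise product of the three projections
```

The point of the design is that both ingredients, FFT convolutions and element-wise products, cost far less than the n × n score matrix that attention materializes.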

The experiments reveal striking results.

  1. Language modeling. 

Hyena's scaling was tested on autoregressive language modeling. Evaluated by perplexity on the WikiText103 and The Pile benchmarks, Hyena is the first attention-free convolutional architecture to match GPT quality, with a 20% reduction in total FLOPs.

Table: Perplexity on WikiText103 (same tokenizer); ∗ marks results from Dao et al. (2022c). Deeper and thinner models (Hyena-slim) achieve lower perplexity.

Table: Perplexity on The Pile for models trained to a fixed total token count, e.g., 5 billion (a separate run for each token total). All models use the same tokenizer (GPT-2); the FLOP count is for the 15-billion-token run.
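As a reminder of what the metric above measures: perplexity is the exponential of the average per-token negative log-likelihood, i.e., the model's effective branching factor. A quick sketch with illustrative numbers:

```python
import math

def perplexity(token_nlls):
    """Perplexity = exp(mean negative log-likelihood per token, in nats).
    Lower is better."""
    return math.exp(sum(token_nlls) / len(token_nlls))

# A model that assigns each token probability 1/1000 has perplexity 1000.
uniform_nlls = [math.log(1000)] * 5
print(perplexity(uniform_nlls))  # ≈ 1000.0
```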

  2. Large-scale image classification

The paper also demonstrates Hyena's potential as a general deep-learning operator, using image classification. The authors drop-in replace the attention layers in the Vision Transformer (ViT) with the Hyena operator and match ViT's performance.

On CIFAR, the authors test a 2D version of Hyena's long-convolution filters in a standard convolutional architecture, improving on the 2D long-convolutional model S4ND (Nguyen et al., 2022) in accuracy, with an 8% speedup and 25% fewer parameters.

These promising results at the sub-billion-parameter scale suggest that attention may not be all we need, and that simpler subquadratic designs such as Hyena, informed by simple guiding principles and evaluated on mechanistic interpretability benchmarks, can form the basis of efficient large models.

With the waves this architecture is creating in the community, it will be interesting to see whether Hyena gets the last laugh.


Check out the Paper and GitHub link.


Data Scientist currently working for S&P Global Market intelligence. Worked as data scientist for AI product startups. Reader and a learner at heart.

