TradePoint.io
Stanford and Mila Researchers Propose Hyena: An Attention-Free Drop-in Replacement to the Core Building Block of Many Large-Scale Language Models

May 8, 2023

The race to build generative models such as ChatGPT and Bard, along with underlying technology such as GPT-3 and GPT-4, has taken the AI world by storm. Yet many challenges remain around the accessibility, training, and practical feasibility of these models for everyday use cases.

Anyone who has experimented with such sequence models has likely run into one frustrating problem: the limit on the length of the input they can pass in a prompt.

For enthusiasts who want to work with the core of these technologies and train a custom model, the cost of the optimization process makes the task close to impossible.


At the heart of these problems lies the quadratic cost of the attention operator that these sequence models rely on. Both the computation and the hardware needed to run it grow quickly with input length. Scaling it up is extremely expensive, which leaves only a few well-resourced organizations with a deep understanding of, and real control over, such models.

Simply put, attention exhibits quadratic cost in sequence length. This limits the amount of context a model can access, and scaling that context up is costly.
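To make the quadratic growth concrete, here is a rough back-of-the-envelope sketch. It counts the entries of one full self-attention score matrix (L x L) and compares that with an L * log2(L) estimate for an FFT-based long convolution. The function names are hypothetical, and constant factors, model width, and number of heads are all ignored, so the numbers only show the trend rather than real FLOP counts.

```python
import math


def attention_score_entries(seq_len: int) -> int:
    """Number of pairwise scores in one full self-attention matrix (L x L)."""
    return seq_len * seq_len


def fft_conv_cost(seq_len: int) -> float:
    """Rough operation estimate for an FFT-based long convolution: L * log2(L)."""
    return seq_len * math.log2(seq_len)


# Illustrative sequence lengths (assumed, not taken from the paper).
for seq_len in (1_024, 8_192, 65_536):
    entries = attention_score_entries(seq_len)
    ratio = entries / fft_conv_cost(seq_len)
    print(f"L={seq_len:>6}: attention entries={entries:>13,}  ratio to L*log2(L)={ratio:,.0f}")
```

Doubling the sequence length quadruples the attention term, while the L * log2(L) term only slightly more than doubles.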

However, a new architecture called Hyena is now making waves in the NLP community, and some see it as the rescue the field needs. It challenges the dominance of existing attention mechanisms, and the research paper demonstrates its potential to displace them.

Developed by a team of researchers at Stanford and Mila, Hyena is a subquadratic operator that performs impressively on a range of NLP tasks. In this article, we will look closely at Hyena’s claims.

The paper suggests that subquadratic operators can match the quality of attention models at scale without being as costly in parameters and optimization. Based on targeted reasoning tasks, the authors distill the three properties of attention that contribute most to its performance.

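  1. Data control: the operator is conditioned on the input itself
  2. Sublinear parameter scaling: the number of parameters does not grow in step with sequence length
  3. Unrestricted context: any two positions in the sequence can interact, with no locality restriction

With these properties in mind, they introduce the Hyena hierarchy. This new operator combines long convolutions and element-wise multiplicative gating to match the quality of attention at scale while reducing the computational cost.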
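The combination of long convolutions and element-wise gating can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `causal_fft_conv` and `hyena_like_operator` are hypothetical, the filters are random with an exponential decay as a stand-in for the learned long filters described in the paper, and details such as normalization, short convolutions, and output projections are left out.

```python
import numpy as np


def causal_fft_conv(u: np.ndarray, h: np.ndarray) -> np.ndarray:
    """Causal convolution along axis 0 of u with a same-length filter h, via FFT.

    Zero-padding to 2L avoids circular wrap-around, so that
    y[t] = sum over s <= t of h[s] * u[t - s].
    """
    seq_len = u.shape[0]
    n = 2 * seq_len
    spectrum = np.fft.rfft(u, n=n, axis=0) * np.fft.rfft(h, n=n, axis=0)
    return np.fft.irfft(spectrum, n=n, axis=0)[:seq_len]


def hyena_like_operator(x, proj_weights, filters):
    """Alternate long convolutions with element-wise multiplicative gating.

    x:            (L, D) input sequence
    proj_weights: order + 1 matrices of shape (D, D); the first yields the
                  value-like signal, the rest yield the gates
    filters:      `order` long filters of shape (L, D)  (random stand-ins here)
    """
    v, *gates = [x @ w for w in proj_weights]
    z = v
    for gate, h in zip(gates, filters):
        # Long convolution mixes information across time; the gate, computed
        # from the input, makes the result depend on the data itself.
        z = gate * causal_fft_conv(z, h)
    return z


# Toy usage with assumed sizes.
rng = np.random.default_rng(0)
L, D, order = 64, 8, 2
x = rng.standard_normal((L, D))
proj_weights = [rng.standard_normal((D, D)) / np.sqrt(D) for _ in range(order + 1)]
decay = np.exp(-0.1 * np.arange(L))[:, None]
filters = [rng.standard_normal((L, D)) * decay for _ in range(order)]

y = hyena_like_operator(x, proj_weights, filters)
print(y.shape)  # (64, 8)
```

Each filter spans the whole sequence, so every output position can depend on every earlier position, while the FFT keeps the cost of each convolution at roughly L * log L instead of L squared.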

The experiments reported in the paper show promising results.

  1. Language modeling. 

Hyena’s scaling was tested on autoregressive language modeling. Evaluated by perplexity on the benchmark datasets WikiText103 and The Pile, Hyena is the first attention-free convolutional architecture to match GPT quality, with a 20% reduction in total FLOPs.

Perplexity on WikiText103 (same tokenizer). ∗ are results from (Dao et al., 2022c). Deeper and thinner models (Hyena-slim) achieve lower perplexity

Perplexity on The Pile for models trained until a total number of tokens e.g., 5 billion (different runs for each token total). All models use the same tokenizer (GPT2). FLOP count is for the 15 billion token run
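For readers unfamiliar with the metric, perplexity is the exponential of the average negative log-likelihood a model assigns to each token, so lower is better. The short sketch below shows the arithmetic; the function name and the toy log-probabilities are illustrative assumptions and are not values from the paper.

```python
import math


def perplexity(token_log_probs):
    """Perplexity = exp(average negative log-likelihood per token), natural log."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    avg_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(avg_nll)


# A model that spreads probability uniformly over 4 choices at every step
# has a perplexity of approximately 4.
uniform_guess = [math.log(1 / 4)] * 10
print(perplexity(uniform_guess))
```

A perplexity of k can be read as the model being, on average, as uncertain as if it were choosing uniformly among k tokens.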

  2. Large-scale image classification

The paper demonstrates the potential of Hyena as a general deep-learning operator for image classification. The authors drop-in replace the attention layers in the Vision Transformer (ViT) with the Hyena operator and match the performance of ViT.

On CIFAR-2D, the authors test a 2D version of Hyena long convolution filters in a standard convolutional architecture, which improves on the 2D long convolutional model S4ND (Nguyen et al., 2022) in accuracy with an 8% speedup and 25% fewer parameters.

The promising results at the sub-billion parameter scale suggest that attention may not be all we need, and that simpler subquadratic designs such as Hyena, informed by simple guiding principles and evaluation on mechanistic interpretability benchmarks, could form the basis for efficient large models.

With the waves this architecture is creating in the community, it will be interesting to see whether Hyena has the last laugh.

