TradePoint.io

Stanford and Mila Researchers Propose Hyena: An Attention-Free Drop-in Replacement to the Core Building Block of Many Large-Scale Language Models

May 8, 2023
in AI & Technology
Reading Time: 6 mins read

The race to develop impressive generative models such as ChatGPT and Bard, and their underlying technology such as GPT-3 and GPT-4, has taken the AI world by storm. Yet many challenges remain around the accessibility, training, and practical feasibility of these models in the many use cases that touch our day-to-day problems.

Anyone who has played around with such sequence models has likely hit one limitation that dampens the excitement: the length of the input they can send in to prompt the model.

For enthusiasts who want to dig into the core of these technologies and train a custom model, the cost of the optimization process makes it a nearly impossible task.


At the heart of these problems lies the quadratic cost of the attention operator that these sequence models rely on. The compute and resources required become extremely expensive, especially at scale, which leaves only a few well-resourced organizations with a deep understanding of, and real control over, such models.

Simply put, attention exhibits quadratic cost in sequence length. This limits the amount of context a model can access, and scaling it up is a costly affair.
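To make the quadratic cost concrete, here is a minimal NumPy sketch of plain (unmasked, single-head) softmax attention weights. The function name, the head dimension `d`, and the sequence lengths are illustrative assumptions, not taken from the paper. The point is only that the weight matrix has one entry per pair of positions, so its size grows as the square of the sequence length.

```python
import numpy as np

def attention_scores(q, k):
    """Softmax attention weights: one row per query, one column per key."""
    logits = q @ k.T / np.sqrt(q.shape[-1])
    logits = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    w = np.exp(logits)
    return w / w.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
d = 8  # illustrative head dimension
sizes = {}
for L in (128, 256, 512):
    q = rng.standard_normal((L, d))
    k = rng.standard_normal((L, d))
    A = attention_scores(q, k)
    sizes[L] = A.size  # A has L * L entries
    print(L, A.shape, A.size)
```

Doubling the sequence length quadruples the number of entries that must be computed and stored, which is the bottleneck the rest of the article refers to.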

However, there is a new architecture called Hyena that is making waves in the NLP community, and some hail it as the rescuer we all need. It challenges the dominance of existing attention mechanisms, and the research paper demonstrates its potential to topple them.

Developed by researchers at Stanford and Mila, Hyena is a subquadratic operator that shows impressive performance on a range of NLP tasks. In this article, we will look closely at Hyena's claims.

The paper suggests that subquadratic operators can match the quality of attention models at scale without being as costly in terms of parameters and optimization. Based on targeted reasoning tasks, the authors distill the three properties of attention that contribute most to its performance.

  1. Data control
  2. Sublinear parameter scaling
  3. Unrestricted context. 

With these points in mind, they introduce the Hyena hierarchy. This new operator combines long convolutions and element-wise multiplicative gating to match the quality of attention at scale while reducing the computational cost.

The experiments reveal impressive results.
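The combination of long convolutions and element-wise gating can be sketched in a few lines of NumPy. This is a simplified single-channel illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the "projections" are fixed scalars rather than learned linear maps, and the filters are explicit random vectors as long as the sequence, whereas the actual operator learns its filters. The long convolution is computed with the FFT, which costs O(L log L) instead of O(L²).

```python
import numpy as np

def causal_fft_conv(u, h):
    """Causal convolution of a length-L signal u with a length-L filter h."""
    L = u.shape[0]
    n = 2 * L  # zero-pad so the circular convolution does not wrap around
    y = np.fft.irfft(np.fft.rfft(u, n=n) * np.fft.rfft(h, n=n), n=n)
    return y[:L]

def toy_gated_long_conv(u, h1, h2):
    """Two rounds of (long convolution, then element-wise gating) on one channel."""
    # Hypothetical stand-ins for learned projections of the input.
    v, x1, x2 = 0.5 * u, 1.5 * u, -0.7 * u
    z = x1 * causal_fft_conv(v, h1)  # gate depends on the input itself
    z = x2 * causal_fft_conv(z, h2)
    return z

rng = np.random.default_rng(0)
L = 16
u = rng.standard_normal(L)  # toy input sequence
h1, h2 = rng.standard_normal(L), rng.standard_normal(L)  # explicit toy filters
y = toy_gated_long_conv(u, h1, h2)
print(y.shape)
```

Because the gates are computed from the input, the operator's effective weights change with the data, and because the filters span the whole sequence, every position can draw on all earlier positions.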

  1. Language modeling. 

Hyena’s scaling was tested on autoregressive language modeling. Evaluated by perplexity on the benchmark datasets WikiText103 and The Pile, Hyena is reported to be the first attention-free convolutional architecture to match GPT quality, with a 20% reduction in total FLOPs.
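For readers unfamiliar with the metric, perplexity is the exponential of the average negative log-probability a model assigns to the correct next token, so lower is better. The sketch below uses made-up probabilities purely for illustration; they are not results from the paper.

```python
import math

def perplexity(token_probs):
    """Perplexity = exp of the average negative log-probability per token."""
    nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(nll)

# A model that is always unsure between four equally likely tokens.
uniform = perplexity([0.25, 0.25, 0.25, 0.25])
# A model that assigns high probability to each correct token.
confident = perplexity([0.9, 0.8, 0.95, 0.85])
print(uniform, confident)
```

A model that spreads its probability evenly over four choices has a perplexity of 4, while a more confident and correct model scores closer to 1.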

Perplexity on WikiText103 (same tokenizer). ∗ are results from (Dao et al., 2022c). Deeper and thinner models (Hyena-slim) achieve lower perplexity

Perplexity on The Pile for models trained until a total number of tokens e.g., 5 billion (different runs for each token total). All models use the same tokenizer (GPT2). FLOP count is for the 15 billion token run

  2. Large-scale image classification

The paper demonstrates the potential of Hyena as a general deep-learning operator for image classification. The authors replace the attention layers in the Vision Transformer (ViT) with the Hyena operator as a drop-in substitute and match ViT's performance.

On CIFAR-2D, they test a 2D version of Hyena long convolution filters in a standard convolutional architecture, which improves on the 2D long convolutional model S4ND (Nguyen et al., 2022) in accuracy with an 8% speedup and 25% fewer parameters.

The promising results at the sub-billion parameter scale suggest that attention may not be all we need and that simpler subquadratic designs such as Hyena, informed by simple guiding principles and evaluation on mechanistic interpretability benchmarks, form the basis for efficient large models.

With the waves this architecture is creating in the community, it will be interesting to see whether Hyena will have the last laugh.


Check out the Paper and Github link.



Data Scientist currently working for S&P Global Market intelligence. Worked as data scientist for AI product startups. Reader and a learner at heart.


Credit: Source link


©2020- TradePoint.io - All rights reserved!

Tradepoint.io, being just a publishing and technology platform, is not a registered broker-dealer or investment adviser. So we do not provide investment advice. Rather, brokerage services are provided to clients of Tradepoint.io by independent SEC-registered broker-dealers and members of FINRA/SIPC. Every form of investing carries some risk and past performance is not a guarantee of future results. “Tradepoint.io“, “Instant Investing” and “My Trading Tools” are registered trademarks of Apperbuild, LLC.

This website is operated by Apperbuild, LLC. We have no link to any brokerage firm and we do not provide investment advice. Every information and resource we provide is solely for the education of our readers. © 2020 Apperbuild, LLC. All rights reserved.
