
Researchers find that retraining only small parts of AI models can cut costs and prevent forgetting

October 13, 2025

Fine-tuning is one effective approach to making a large language model (LLM) fit for purpose and grounded in data, but enterprises often find that fine-tuned models lose some of their abilities. After fine-tuning, some models “forget” how to perform tasks they had already learned.

Research from the University of Illinois Urbana-Champaign proposes a new method for retraining models that avoids “catastrophic forgetting,” in which the model loses some of its prior knowledge. The paper focuses on two specific large multimodal models (LMMs) that generate responses from images: LLaVA and Qwen 2.5-VL.

The approach encourages enterprises to retrain only narrow parts of an LLM to avoid retraining the entire model and incurring a significant increase in compute costs. The team claims that catastrophic forgetting isn’t true memory loss, but rather a side effect of bias drift. 

“Training a new LMM can cost millions of dollars, weeks of time, and emit hundreds of tons of CO2, so finding ways to more efficiently and effectively update existing models is a pressing concern,” the team wrote in the paper. “Guided by this result, we explore tuning recipes that preserve learning while limiting output shift.”

The researchers focused on the multi-layer perceptron (MLP) layers, the feed-forward blocks inside each transformer layer that act as the model's internal decision-making component.

Catastrophic forgetting 

The researchers wanted first to verify the existence and the cause of catastrophic forgetting in models. 

To do this, they created a set of target tasks for the models to complete. The models were then fine-tuned on those tasks and evaluated to determine whether the fine-tuning led to substantial forgetting. But as the process went on, the researchers found that the models were recovering some of their abilities.

“We also noticed a surprising result, that the model performance would drop significantly in held out benchmarks after training on the counting task, it would mostly recover on PathVQA, another specialized task that is not well represented in the benchmarks,” they said. “Meanwhile, while performing the forgetting mitigation experiments, we also tried separately tuning only the self-attention projection (SA Proj) or MLP layers, motivated by the finding that tuning only the LLM was generally better than tuning the full model. This led to another very surprising result – that tuning only self-attention projection layers led to very good learning of the target tasks with no drop in performance in held out tasks, even after training all five target tasks in a sequence.”

The researchers said they believe that “what looks like forgetting or interference after fine-tuning on a narrow target task is actually bias in the output distribution due to the task distribution shift.”

Narrow retraining

That finding turned out to be the key to the experiment. The researchers noted that tuning the MLP increases the likelihood of “outputting numeric tokens and a highly correlated drop in held out task accuracy.” This showed that a model's apparent forgetting is a temporary shift in its outputs rather than a permanent loss of knowledge.
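One way to picture the numeric-token bias described above is to measure how much probability a model places on numeric tokens before and after tuning. The following is a minimal sketch under stated assumptions, not the paper's measurement procedure: the four-token vocabulary, the logit values, and the function names are all made up for illustration.

```python
import math

def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def numeric_token_mass(logits, vocab):
    """Total probability assigned to tokens made up only of digits."""
    probs = softmax(logits)
    return sum(p for p, tok in zip(probs, vocab) if tok.strip().isdigit())

# Hypothetical toy vocabulary and next-token logits (assumptions, not real data).
VOCAB = ["cat", "dog", "3", "7"]
BASE_LOGITS = [2.0, 2.0, 0.0, 0.0]    # before tuning: words favoured
TUNED_LOGITS = [2.0, 2.0, 3.0, 3.0]   # after tuning on counting: digits favoured

base_mass = numeric_token_mass(BASE_LOGITS, VOCAB)
tuned_mass = numeric_token_mass(TUNED_LOGITS, VOCAB)
print(f"numeric mass before: {base_mass:.3f}, after: {tuned_mass:.3f}")
```

In this toy setup the word logits are unchanged, yet the words become less likely because the digit logits rose. That is the sense in which a shift in the output distribution can look like forgetting.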

“To avoid biasing the output distribution, we tune the MLP up/gating projections while keeping the down projection frozen, and find that it achieves similar learning to full MLP tuning with little forgetting,” the researchers said. 
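The recipe in the quote above can be sketched as a selection rule over parameter names. This is an illustration under assumptions rather than the researchers' code: the names `up_proj`, `gate_proj`, and `down_proj` follow a naming style common in open LLaMA-like checkpoints, and the parameter list and the helper `plan_trainable` are hypothetical.

```python
# Hypothetical parameter names for one transformer layer (an assumption;
# real checkpoints may use different names).
PARAM_NAMES = [
    "layers.0.self_attn.q_proj.weight",
    "layers.0.self_attn.o_proj.weight",
    "layers.0.mlp.gate_proj.weight",
    "layers.0.mlp.up_proj.weight",
    "layers.0.mlp.down_proj.weight",
]

# Train only the MLP up and gating projections; everything else,
# including the MLP down projection, stays frozen.
TRAINABLE_MARKERS = ("mlp.up_proj", "mlp.gate_proj")

def plan_trainable(param_names, markers=TRAINABLE_MARKERS):
    """Map each parameter name to True (train) or False (freeze)."""
    return {name: any(m in name for m in markers) for name in param_names}

plan = plan_trainable(PARAM_NAMES)
for name, trainable in plan.items():
    print(f"{'train ' if trainable else 'freeze'}  {name}")
```

In a PyTorch training script, a plan like this would typically be applied by looping over `model.named_parameters()` and setting each parameter's `requires_grad` flag accordingly.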

This allows for a more straightforward and more reproducible method for fine-tuning a model. 

By focusing on a narrow segment of the model, rather than a wholesale retraining, enterprises can cut compute costs. It also allows better control of output drift. 

However, the research focuses on only two models, both of which deal with vision and language. The researchers noted that, due to limited resources, they were unable to try the experiment with other models.

Even so, their findings may extend to other LLMs, including those built for different modalities.
