TradePoint.io

This AI Paper from Anthropic Introduces Attribution Graphs: A New Interpretability Method to Trace Internal Reasoning in Claude 3.5 Haiku

April 6, 2025

While the outputs of large language models (LLMs) appear coherent and useful, the underlying mechanisms guiding these behaviors remain largely unknown. As these models are increasingly deployed in sensitive and high-stakes environments, it has become crucial to understand what they do and how they do it.

The main challenge lies in uncovering the internal steps that lead a model to a specific response. The computations are spread across many layers and billions of parameters, making it difficult to isolate the processes involved. Without a clear understanding of these steps, trusting or debugging a model’s behavior becomes harder, especially in tasks requiring reasoning, planning, or factual reliability. Researchers are thus focused on reverse-engineering these models to identify how information flows and how decisions are made internally.

Existing interpretability methods like attention maps and feature attribution offer partial views into model behavior. While these tools help highlight which input tokens contribute to outputs, they often fail to trace the full chain of reasoning or identify intermediate steps. Moreover, these tools usually focus on surface-level behaviors and do not provide consistent insight into deeper computational structures. This has created the need for more structured, fine-grained methods to trace logic through internal representations over multiple steps.
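To make the limitation concrete, here is a minimal sketch of input-level attribution by occlusion: remove one token at a time and record how much a score drops. The scorer and its weight table are hypothetical stand-ins for a real model, chosen only so the example runs on its own.

```python
# Hypothetical toy scorer: a bag-of-words weight table, not a real model.
WEIGHTS = {"capital": 0.5, "state": 1.0, "Dallas": 2.0}


def score(tokens):
    """Sum the (made-up) weight of every token in the input."""
    return sum(WEIGHTS.get(token, 0.0) for token in tokens)


def occlusion_attribution(tokens):
    """Attribute the score to each token by deleting it and measuring the drop."""
    base = score(tokens)
    return {
        token: base - score(tokens[:i] + tokens[i + 1:])
        for i, token in enumerate(tokens)
    }


tokens = ["capital", "of", "state", "containing", "Dallas"]
attributions = occlusion_attribution(tokens)
for token, value in attributions.items():
    print(f"{token:>10}: {value:+.2f}")
```

The result ranks input tokens by importance, but it says nothing about what the model computed in between, which is the gap the article describes.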

To address this, researchers from Anthropic introduced a new technique called attribution graphs. These graphs allow researchers to trace the internal flow of information between features within a model during a single forward pass. By doing so, they attempt to identify intermediate concepts or reasoning steps that are not visible from the model’s outputs alone. The attribution graphs generate hypotheses about the computational pathways a model follows, which are then tested using perturbation experiments. This approach marks a significant step toward revealing the “wiring diagram” of large models, much like how neuroscientists map brain activity.
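The idea of tracing contributions between features and then checking them by perturbation can be sketched on a toy model. The example below is not Anthropic's implementation: it assumes a tiny, purely linear two-layer network with hand-picked weights, and every name and number in it is hypothetical. In a linear model, the contribution of one node to the next is simply its activation times the connecting weight, so the graph edges can be computed exactly.

```python
# Toy illustration only: a tiny linear "model" with hand-picked weights.
# All names and numbers are hypothetical, not taken from the paper.


def forward(x, w_in, w_out, ablate=None):
    """Run the toy model; optionally zero out one hidden feature."""
    hidden = [
        sum(x[i] * w_in[i][j] for i in range(len(x)))
        for j in range(len(w_out))
    ]
    if ablate is not None:
        hidden[ablate] = 0.0  # perturbation: knock out one feature
    output = sum(h * w for h, w in zip(hidden, w_out))
    return hidden, output


def attribution_graph(x, w_in, w_out):
    """Return edge weights: source activation times connecting weight."""
    hidden, _ = forward(x, w_in, w_out)
    edges = {}
    for i in range(len(x)):
        for j in range(len(hidden)):
            edges[(f"in{i}", f"feat{j}")] = x[i] * w_in[i][j]
    for j, h in enumerate(hidden):
        edges[(f"feat{j}", "out")] = h * w_out[j]
    return edges


x = [1.0, 2.0]
w_in = [[0.5, -1.0], [0.25, 1.0]]  # w_in[i][j]: input i -> feature j
w_out = [2.0, 3.0]                 # w_out[j]: feature j -> output

edges = attribution_graph(x, w_in, w_out)
_, baseline = forward(x, w_in, w_out)
_, ablated = forward(x, w_in, w_out, ablate=1)

# The graph predicts how much the output falls when feat1 is removed;
# the perturbation checks that prediction.
predicted_drop = edges[("feat1", "out")]
observed_drop = baseline - ablated
print(predicted_drop, observed_drop)
```

In this linear toy the predicted and observed drops agree exactly. In a real language model nonlinearities break that guarantee, which is why the graphs are treated as hypotheses to be tested rather than as ground truth.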

The researchers applied attribution graphs to Claude 3.5 Haiku, a lightweight language model that Anthropic announced in October 2024. The method begins by identifying interpretable features activated by a specific input. These features are then traced to determine their influence on the final output. For example, when prompted to write a rhyming poem, the model selects a set of candidate rhyming words before writing a line, a form of planning. In another example, the model identifies “Texas” as an intermediate step to answer the question, “What’s the capital of the state containing Dallas?” which it correctly resolves as “Austin.” The graphs reveal not only what the model outputs but also how it internally represents and transitions between ideas.
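The Dallas example suggests a simple way to test whether an intermediate step is really used: overwrite it and see whether the answer changes accordingly. The sketch below mimics that logic with two hypothetical lookup tables standing in for learned features. It is an analogy for the experiment's reasoning, not a model of how Claude 3.5 Haiku stores facts, and the function name and `override_state` parameter are invented for illustration.

```python
# Hypothetical lookup tables standing in for learned features.
CITY_TO_STATE = {"Dallas": "Texas", "Oakland": "California"}
STATE_TO_CAPITAL = {"Texas": "Austin", "California": "Sacramento"}


def capital_of_state_containing(city, override_state=None):
    """Answer the two-hop question and record each intermediate step.

    override_state simulates an intervention that replaces the
    intermediate "state" representation before the second hop.
    """
    trace = {"city": city}
    state = CITY_TO_STATE[city]              # hop 1: city -> state
    if override_state is not None:
        state = override_state               # intervention on the middle step
    trace["state"] = state
    trace["answer"] = STATE_TO_CAPITAL[state]  # hop 2: state -> capital
    return trace


baseline = capital_of_state_containing("Dallas")
intervened = capital_of_state_containing("Dallas", override_state="California")
print(baseline)
print(intervened)
```

If swapping the intermediate state changes the final answer in the expected way, that supports the claim that the answer is computed through that step rather than recalled directly from the city.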

Applying attribution graphs uncovered several advanced behaviors within Claude 3.5 Haiku. In poetry tasks, the model plans rhyming words before composing each line, showing anticipatory reasoning. In multi-hop questions, the model forms internal intermediate representations, such as associating Dallas with Texas before determining Austin as the answer. For multilingual inputs, it uses both language-specific and abstract, language-independent circuits, with the latter more prominent in Claude 3.5 Haiku than in smaller models. Further, in medical reasoning tasks the model generates candidate diagnoses internally and uses them to inform follow-up questions. These findings suggest that the model can perform abstract planning, internal goal-setting, and stepwise logical deduction without explicit instruction.

This research presents attribution graphs as a valuable interpretability tool that reveals the hidden layers of reasoning in language models. By applying this method, the team from Anthropic has shown that models like Claude 3.5 Haiku don’t merely mimic human responses; they compute through layered, structured steps. This opens the door to deeper audits of model behavior, allowing more transparent and responsible deployment of advanced AI systems.



Nikhil is an intern consultant at Marktechpost. He is pursuing an integrated dual degree in Materials at the Indian Institute of Technology, Kharagpur. Nikhil is an AI/ML enthusiast who is always researching applications in fields like biomaterials and biomedical science. With a strong background in Material Science, he is exploring new advancements and creating opportunities to contribute.

©2020- TradePoint.io - All rights reserved!

Tradepoint.io, being just a publishing and technology platform, is not a registered broker-dealer or investment adviser. So we do not provide investment advice. Rather, brokerage services are provided to clients of Tradepoint.io by independent SEC-registered broker-dealers and members of FINRA/SIPC. Every form of investing carries some risk and past performance is not a guarantee of future results. “Tradepoint.io“, “Instant Investing” and “My Trading Tools” are registered trademarks of Apperbuild, LLC.

This website is operated by Apperbuild, LLC. We have no link to any brokerage firm and we do not provide investment advice. Every information and resource we provide is solely for the education of our readers. © 2020 Apperbuild, LLC. All rights reserved.
