How Transformer-Based LLMs Extract Knowledge From Their Parameters

In recent years, transformer-based large language models (LLMs) have drawn wide attention, in part for their ability to capture and store factual knowledge. However, how these models extract factual associations during inference remains relatively underexplored. A recent study by researchers from Google DeepMind, Tel Aviv University, and Google Research examined the internal mechanisms by which transformer-based LLMs store and extract factual associations.

The study proposed an information flow approach to investigate how the model predicts the correct attribute and how internal representations evolve across layers to produce that output. Specifically, the researchers focused on decoder-only LLMs and identified critical computation points at the relation and subject positions. They did this with an “attention knockout” strategy. At specific layers, they blocked the last position from attending to other positions, such as the subject or relation tokens, and then observed how this affected the prediction during inference.
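
To make the knockout idea concrete, here is a minimal sketch in Python. It runs on a toy model and is not the study's code. It assumes a single attention head with no learned weights, so the queries, keys, and values are simply the hidden states, and it uses hand-picked 2-dimensional token vectors. The helper names (attention_layer, run, blocked, knockout_layers) are hypothetical. Blocking works by setting the attention score of each blocked (target, source) pair to minus infinity before the softmax, which gives that edge zero weight. The mask is applied only at the chosen layers.

```python
import math

NEG_INF = float("-inf")


def softmax(scores):
    """Numerically stable softmax; entries equal to -inf get zero weight."""
    m = max(scores)
    if m == NEG_INF:
        raise ValueError("every source position is blocked")
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_layer(hidden, blocked=frozenset()):
    """One toy causal self-attention step where queries = keys = values = hidden.

    `blocked` holds (target, source) pairs: position `target` may not attend
    to position `source`. This is the knockout edit.
    """
    n, d = len(hidden), len(hidden[0])
    out = []
    for i in range(n):
        scores = []
        for j in range(n):
            if j > i or (i, j) in blocked:  # causal mask plus knockout mask
                scores.append(NEG_INF)
            else:
                dot = sum(a * b for a, b in zip(hidden[i], hidden[j]))
                scores.append(dot / math.sqrt(d))
        weights = softmax(scores)
        mixed = [sum(w * hidden[j][k] for j, w in enumerate(weights)) for k in range(d)]
        out.append([h + m for h, m in zip(hidden[i], mixed)])  # residual connection
    return out


def run(hidden, num_layers, knockout_layers=(), blocked=frozenset()):
    """Stack toy layers, applying the knockout mask only at `knockout_layers`."""
    for layer in range(num_layers):
        mask = blocked if layer in knockout_layers else frozenset()
        hidden = attention_layer(hidden, mask)
    return hidden


if __name__ == "__main__":
    # Toy prompt: positions 0-1 stand in for subject tokens, position 2 is the last token.
    tokens = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
    last = len(tokens) - 1
    subject_edges = frozenset((last, j) for j in (0, 1))

    clean = run(tokens, num_layers=4)
    knocked = run(tokens, num_layers=4, knockout_layers={0, 1}, blocked=subject_edges)
    print("clean last-token state:      ", clean[last])
    print("knocked-out last-token state:", knocked[last])
```

The sketch compares the clean and knocked-out states of the last position as a toy stand-in for measuring the effect of a knockout. With a real model, one would more likely compare the probability assigned to the correct attribute with and without the blocked edges.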

To further pinpoint where attribute extraction occurs, the researchers analyzed the information propagating at these critical points and the process that builds up the preceding representations. They did this by projecting representations to the vocabulary and by applying additional interventions to the model’s multi-head self-attention (MHSA) and multi-layer perceptron (MLP) sublayers and their projections.
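
The vocabulary projection can be sketched as follows, assuming a logit-lens style readout. A hidden state or sublayer output is optionally normalized and then multiplied by the unembedding matrix, and the top-scoring tokens show what that vector promotes. The tiny vocabulary, unembedding matrix, and sublayer outputs are made up for illustration. The layer norm omits the learned gain and bias, and the function names are hypothetical.

```python
import math


def layer_norm(x, eps=1e-5):
    """Parameter-free layer norm (no learned gain or bias, a simplification)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def project_to_vocab(vector, unembed, vocab, top_k=3, normalize=True):
    """Rank vocabulary items by their logit under the unembedding matrix.

    `unembed` has one row per vocabulary item; logit_i = <row_i, vector>.
    """
    h = layer_norm(vector) if normalize else vector
    logits = [sum(a * b for a, b in zip(row, h)) for row in unembed]
    ranked = sorted(range(len(vocab)), key=lambda i: logits[i], reverse=True)
    return [vocab[i] for i in ranked[:top_k]]


def promotes_attribute(vector, attribute, unembed, vocab, top_k=3):
    """True if `attribute` is among the top-k tokens the vector promotes."""
    return attribute in project_to_vocab(vector, unembed, vocab, top_k)


if __name__ == "__main__":
    # Hypothetical 3-dimensional model with a 4-token vocabulary.
    vocab = ["Paris", "London", "the", "French"]
    unembed = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [0.5, 0.5, 0.0]]
    mhsa_update = [2.0, 0.0, -1.0]  # made-up output of an attention sublayer
    mlp_update = [0.0, 0.2, 1.5]    # made-up output of an MLP sublayer
    for name, update in [("MHSA", mhsa_update), ("MLP", mlp_update)]:
        print(name, project_to_vocab(update, unembed, vocab, top_k=2),
              promotes_attribute(update, "Paris", unembed, vocab, top_k=1))
```

This kind of readout lets each sublayer's contribution be inspected on its own. That makes it a natural companion to interventions that zero out or block specific MHSA or MLP updates.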

The researchers identified an internal mechanism for attribute extraction that combines a subject enrichment process with an attribute extraction operation. Specifically, information about the subject is enriched in the last subject token across the model’s early layers, while the relation is passed to the last token. Finally, the last token uses the relation to extract the corresponding attribute from the subject representation via attention head parameters.
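
The three steps can be illustrated with a deliberately tiny, hand-wired toy. It is not the paper's model, and the vocabulary, enriched states, heads, attention "sink" position, and beta value are all hypothetical. The enrichment step appears only as its result, the ENRICHED table. The relation is passed in as an argument, standing in for the relation information that has reached the last token. Each head's relation-dependent attention score stands in for a real query-key computation.

```python
import math

# Hypothetical 4-token vocabulary; feature k of a state "promotes" VOCAB[k].
VOCAB = ["Paris", "Rome", "French", "Italian"]

# Stand-ins for enriched last-subject-token states after the early layers:
# each subject already carries features for several of its attributes.
ENRICHED = {"France": [1.0, 0.0, 1.0, 0.0], "Italy": [0.0, 1.0, 0.0, 1.0]}

# Hypothetical heads: each is keyed to one relation, and its output-value
# mapping reads only one "slot" of the subject state.
HEADS = {
    "capital of": [1.0, 1.0, 0.0, 0.0],   # reads the city features
    "language of": [0.0, 0.0, 1.0, 1.0],  # reads the language features
}


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def last_token_update(subject, relation, beta=10.0):
    """Sum of head outputs written to the last token's residual stream."""
    state = ENRICHED[subject]
    update = [0.0] * len(VOCAB)
    for head_relation, slot in HEADS.items():
        match = 1.0 if head_relation == relation else 0.0
        # The last token attends over [sink, last subject token]. The sink's
        # value is a zero vector, so only the subject term contributes.
        _, w_subject = softmax([beta / 2, beta * match])
        for k in range(len(VOCAB)):
            update[k] += w_subject * state[k] * slot[k]
    return update


def predict(subject, relation):
    update = last_token_update(subject, relation)
    return VOCAB[max(range(len(VOCAB)), key=lambda k: update[k])]


if __name__ == "__main__":
    for subject in ENRICHED:
        for relation in HEADS:
            print(f"{relation} {subject} -> {predict(subject, relation)}")
```

In this toy, the same enriched subject state yields different attributes depending on which relation reaches the last token, because only the matching head attends strongly to the subject. Real models are unlikely to have such clean slots.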

The findings offer insights into how factual associations are stored and extracted internally in LLMs. The researchers believe these findings could open new research directions for knowledge localization and model editing. For example, the study’s approach could be used to identify the internal mechanisms by which LLMs acquire and store biased information and to develop methods for mitigating such biases.

Overall, this study highlights the importance of examining the internal mechanisms by which transformer-based LLMs store and extract factual associations. By understanding these mechanisms, researchers can develop more effective methods for improving model performance and reducing biases. The study’s approach could also be applied to other areas of natural language processing, such as sentiment analysis and machine translation, to better understand how these models operate internally.

Check out the Paper.

Niharika is a technical consulting intern at Marktechpost. She is a third-year undergraduate pursuing her B.Tech at the Indian Institute of Technology (IIT) Kharagpur. She is highly enthusiastic about machine learning, data science, and AI, and is an avid reader of the latest developments in these fields.