Model merging refers to the process of combining multiple distinct models, each designed to perform separate tasks or solve different problems, into a single unified model without requiring additional training. It is related to, but distinct from, techniques such as ensemble learning, model blending, and model stacking, which combine the predictions of several models rather than their parameters. The aim of merging is a more versatile and comprehensive Machine Learning model capable of handling various tasks simultaneously.
In the context of LLMs, model merging can involve combining LLMs with different initializations, different architectures, or models trained on different tasks. The primary goal is to leverage the strengths of each individual model and create a multi-task LLM that can address a broader range of tasks. This approach can improve performance and efficiency by allowing the combined model to benefit from the knowledge and capabilities of each constituent model.
Why merge ML models?
Combining Machine Learning models offers several benefits. Averaging or voting among diverse models reduces prediction variance and bias, while drawing on patterns and features learned from different data sources and architectures can improve accuracy and adaptability. Merging also makes predictions more robust and reliable by reducing reliance on any single dataset or algorithm.
The result can be better performance, improved efficiency, and broader applicability, making model merging a valuable strategy for leveraging the strengths of different AI models without extensive additional training.
Strategies for combining LLMs
One common approach is to combine models by averaging their weights or parameters. This can result in a fused model that benefits from the knowledge and expertise embedded in each original model. Model merging may also involve the integration of features from each model. This is particularly useful when the models have learned task-specific features that are valuable for the overall performance of the merged model.
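As a rough illustration, weight averaging can be sketched in a few lines of PyTorch. The snippet below is a minimal sketch rather than a production recipe: it assumes the checkpoints share an identical architecture (same parameter names and shapes), and the MyModel class and checkpoint paths in the usage comment are hypothetical.

```python
import torch

def average_state_dicts(state_dicts, weights=None):
    """Average the parameters of several models with identical architectures.

    state_dicts: list of state_dict objects sharing the same keys and shapes.
    weights: optional per-model mixing coefficients (default: uniform average).
    """
    if weights is None:
        weights = [1.0 / len(state_dicts)] * len(state_dicts)
    merged = {}
    for key in state_dicts[0]:
        merged[key] = sum(w * sd[key].float() for w, sd in zip(weights, state_dicts))
    return merged

# Hypothetical usage: both checkpoints must come from the same architecture.
# model_a = MyModel(); model_a.load_state_dict(torch.load("task_a.pt"))
# model_b = MyModel(); model_b.load_state_dict(torch.load("task_b.pt"))
# fused = MyModel()
# fused.load_state_dict(average_state_dicts([model_a.state_dict(), model_b.state_dict()]))
```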
Some model merging techniques allow for merging models up to a specified layer, creating a multi-head model. This approach can be beneficial when different models specialize in different aspects of a task.
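The sketch below illustrates that idea under simple assumptions: two models share the same trunk architecture, the trunk parameters are averaged, and each task keeps its own output head. The class and argument names are illustrative, not taken from any of the papers discussed here.

```python
import torch
import torch.nn as nn

class PartiallyMergedModel(nn.Module):
    """Toy multi-head model: a shared (merged) trunk followed by one head per task.

    The trunk parameters become the average of the two source trunks; the heads
    are kept as-is, so each task retains its specialized output layer.
    """
    def __init__(self, trunk_a: nn.Sequential, trunk_b: nn.Sequential,
                 head_a: nn.Module, head_b: nn.Module):
        super().__init__()
        # Merge the trunks by averaging corresponding parameters in place.
        self.trunk = trunk_a
        with torch.no_grad():
            for p_merged, p_b in zip(self.trunk.parameters(), trunk_b.parameters()):
                p_merged.copy_(0.5 * (p_merged + p_b))
        self.heads = nn.ModuleDict({"task_a": head_a, "task_b": head_b})

    def forward(self, x, task: str):
        # Route the shared representation through the requested task head.
        return self.heads[task](self.trunk(x))
```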
One line of research on model fusion starts from the observation that pretrained models are widely used as a starting point for natural language processing tasks but are expensive to create. The authors propose fusing multiple existing fine-tuned models into a single model by averaging their weights. The fused model consistently outperforms the original pretrained model and is often superior to intertraining, where a base model is first fine-tuned on another task. The fusion process is less dependent on the target task and remains effective even with weight decay, providing a more cost-effective and resource-efficient way to obtain better model initializations in NLP.
Transfer learning, which involves further fine-tuning pre-trained models for downstream tasks, offers improved performance, faster convergence, and sample efficiency. However, separately fine-tuned task-specific models often cannot be combined effectively. Model merging methods have emerged to address this, but they frequently neglect interference between parameters from different models, causing performance drops. In response, the authors propose TIES-MERGING, which resolves interference by trimming (resetting) parameters that changed only marginally during fine-tuning, resolving sign conflicts between models, and merging only the parameters whose signs agree with the elected direction. TIES-MERGING outperforms existing methods across diverse settings, emphasizing the importance of addressing interference in model merging for enhanced performance and versatility.
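A highly simplified sketch of this trim / elect-sign / disjoint-merge recipe is shown below, operating on flattened parameter tensors. It follows the spirit of TIES-MERGING but omits many details of the published method, and the function name and default values are assumptions.

```python
import torch

def ties_merge(base, finetuned_list, density=0.2, lam=1.0):
    """Simplified sketch of TIES-style merging on flat parameter tensors.

    base:            parameters of the shared pretrained model (1-D tensor).
    finetuned_list:  list of 1-D tensors, one fine-tuned model per task.
    density:         fraction of largest-magnitude task-vector entries to keep.
    lam:             scaling applied to the merged task vector.
    """
    task_vectors = [ft - base for ft in finetuned_list]

    # 1) Trim: keep only the top-`density` fraction of entries by magnitude per task.
    trimmed = []
    for tv in task_vectors:
        k = max(1, int(density * tv.numel()))
        threshold = tv.abs().topk(k).values.min()
        trimmed.append(torch.where(tv.abs() >= threshold, tv, torch.zeros_like(tv)))

    # 2) Elect sign: per entry, keep the sign with the larger total mass across tasks.
    stacked = torch.stack(trimmed)
    elected_sign = torch.sign(stacked.sum(dim=0))

    # 3) Disjoint merge: average only the entries that agree with the elected sign.
    agree = ((torch.sign(stacked) == elected_sign) & (stacked != 0)).to(stacked.dtype)
    counts = agree.sum(dim=0).clamp(min=1)
    merged_tv = (stacked * agree).sum(dim=0) / counts

    return base + lam * merged_tv
```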
Another study addresses the challenge of merging distinct models with different initializations, each trained for a separate task, into a single multi-task model without additional training. While previous model merging methods work for models trained on the same task, they fall short when combining models trained for different tasks. To overcome this limitation, the authors introduce “ZipIt,” a general method for merging arbitrary models that share the same architecture. ZipIt incorporates two key strategies: first, it allows features to be merged within each model to account for features that are not shared across models, and second, it supports partial merging up to a specified layer, creating a multi-head model. These innovations yield a 20-60% improvement over previous methods, enabling the effective merging of models trained on disparate tasks.
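To convey the intuition behind feature-level merging, the toy snippet below aligns the output features of one layer across two models by activation correlation and then averages the aligned weights. This is a permutation-style alignment for illustration only; it is not the ZipIt algorithm, which also merges redundant features within each model, and all names and shapes here are hypothetical.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def align_and_average(weight_a, weight_b, acts_a, acts_b):
    """Toy illustration of correlation-based feature matching before averaging.

    weight_a, weight_b: (out_features, in_features) weight matrices of the same
                        layer in two models with identical architectures.
    acts_a, acts_b:     (num_samples, out_features) activations of that layer on
                        the same probe inputs, used to measure feature similarity.
    """
    # Correlate every output feature of model A with every output feature of model B.
    a = (acts_a - acts_a.mean(0)) / (acts_a.std(0) + 1e-8)
    b = (acts_b - acts_b.mean(0)) / (acts_b.std(0) + 1e-8)
    corr = a.T @ b / len(a)

    # Match features one-to-one so that total correlation is maximized.
    rows, cols = linear_sum_assignment(-corr)

    # Reorder model B's features to line up with model A, then average the weights.
    permuted_b = weight_b[cols]
    return 0.5 * (weight_a + permuted_b)
```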