Large language models (LLMs) have attracted enormous attention in recent months. These models mimic humans by answering questions relevantly, generating precise content, translating languages, summarizing long passages of text, and completing code samples. LLMs are developing quickly, with regular releases of powerful models that perform well on code generation tasks. Researchers have explored several techniques, including supervised fine-tuning, instruction tuning, and reinforcement learning, to improve the ability of pre-trained Code LLMs to generate code.
In a recent study, a team of researchers from Huawei Cloud Co., Ltd., the Chinese Academy of Sciences, and Peking University introduced a framework called RRTF (Rank Responses to align Test&Teacher Feedback), which effectively and efficiently enhances pre-trained large language models for code generation. RRTF is designed to improve the performance of Code LLMs on code generation tasks. It borrows alignment techniques from natural language LLMs and uses the ranking of responses as feedback rather than absolute reward values.
The approach is inspired by Reinforcement Learning from Human Feedback (RLHF), the alignment technique used for models such as InstructGPT and ChatGPT. RRTF applies natural language LLM alignment techniques to Code LLMs, but offers a simpler and more efficient training approach by using ranked responses as feedback instead of absolute reward values. Using the RRTF framework, the team has also introduced the PanGu-Coder2 model, which achieves a 62.20% pass@1 rate on the OpenAI HumanEval benchmark.
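To make the idea of ranking feedback more concrete, here is a minimal sketch of a pairwise ranking loss in Python. It is not the loss used by the RRTF authors, whose exact formulation is not given in this article; it is a generic hinge-style loss chosen for illustration. The function name `pairwise_ranking_loss` and both inputs are hypothetical: `scores` stands for whatever score the model assigns to each candidate response (for example, a length-normalized log-probability), and `ranks` stands for the feedback rank of each response, where higher means better.

```python
from itertools import combinations
from typing import Sequence


def pairwise_ranking_loss(scores: Sequence[float], ranks: Sequence[float]) -> float:
    """Hinge-style ranking loss over all pairs of candidate responses.

    scores: the model's score for each response (hypothetical input).
    ranks:  feedback rank for each response; higher means better.
    """
    if len(scores) != len(ranks):
        raise ValueError("scores and ranks must have the same length")

    loss = 0.0
    for i, j in combinations(range(len(scores)), 2):
        if ranks[i] == ranks[j]:
            continue  # tied feedback gives no ordering signal
        better, worse = (i, j) if ranks[i] > ranks[j] else (j, i)
        # Penalise only when the worse response is scored above the better one.
        loss += max(0.0, scores[worse] - scores[better])
    return loss


# Model scores already agree with the feedback ordering: no penalty.
aligned = pairwise_ranking_loss([-0.5, -1.0, -2.0], [3, 2, 1])

# The model prefers the response that the feedback ranks lower: penalty.
misaligned = pairwise_ranking_loss([-2.0, -1.0], [2, 1])
```

Only the order of the values in `ranks` affects the result, not their magnitude, which is the property that distinguishes ranking feedback from an absolute reward value.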
By applying the approach to StarCoder 15B, the team has surpassed the earlier PanGu-Coder and achieved the best reported performance among published Code LLMs, demonstrating the usefulness of RRTF. Comprehensive analyses on three benchmarks (HumanEval, CoderEval, and LeetCode) indicate that Code LLMs may be able to outperform natural language models of the same or larger size on code generation tasks. The study also emphasizes the value of high-quality data in improving a model's ability to follow instructions and write code.
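Top-1 pass rates on HumanEval are commonly reported as pass@k with k = 1. The sketch below shows the standard unbiased pass@k estimator for a single problem, assuming `n` generated samples of which `c` pass all unit tests. The function name `pass_at_k` and the sample counts are illustrative and are not taken from the study; a benchmark score is then the mean of this value over all problems.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate the probability that at least one of k samples is correct.

    n: number of samples generated for the problem
    c: number of those samples that pass all unit tests
    k: number of samples the user is allowed to draw (1 <= k <= n)
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    if n - c < k:
        # Fewer than k failing samples: any draw of k must include a pass.
        return 1.0
    # 1 minus the probability that all k drawn samples are failures.
    return 1.0 - comb(n - c, k) / comb(n, k)


# With k = 1 the estimator reduces to the fraction of passing samples, c / n.
example = pass_at_k(n=10, c=3, k=1)
```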
The team has summarized its contributions as follows:
- The RRTF optimization paradigm has been introduced. It is a model-neutral, straightforward, and data-efficient approach.
- The PanGu-Coder2 model has also been introduced. PanGu-Coder2 outperforms its original model by about 30%, a significant performance gain that is shown on the HumanEval, CoderEval, and LeetCode benchmarks.
- PanGu-Coder2 outperforms all previously released Code LLMs in code generation, achieving new state-of-the-art results.
- The team has discussed their ideas and practical knowledge on building good training data for code generation.
- The PanGu-Coder2 model has been trained using the RRTF framework, and the team has offered helpful insights into this process.
- In addition to improving code generation performance, the team has described the optimization methods PanGu-Coder2 uses to ensure fast inference. These findings support realistic deployment scenarios, because efficient inference is essential for real-world applications.
Tanya Malhotra is a final year undergrad from the University of Petroleum & Energy Studies, Dehradun, pursuing BTech in Computer Science Engineering with a specialization in Artificial Intelligence and Machine Learning.
She is a Data Science enthusiast with good analytical and critical thinking, along with an ardent interest in acquiring new skills, leading groups, and managing work in an organized manner.