Prefix Tuning Using Residual Reparameterization

Bibliographic Details
Main Authors: Youngjun Jung, Hyunsun Hwang, Changki Lee
Format: Article
Language: English
Published: IEEE 2025-01-01
Series: IEEE Access
Subjects:
Online Access: https://ieeexplore.ieee.org/document/10938609/
Description
Summary: Fine-tuning large language models for specific tasks requires updating and storing all parameters, leading to significant computational and storage costs. To address these challenges, parameter-efficient learning methods such as prefix tuning have gained attention. However, prefix tuning can be sensitive to the prefix length, which is thought to occur because each task requires different prefix tokens. In this paper, we propose improving the robustness and performance of prefix tuning through residual reparameterization. We add residual connections to the prefix module, giving the model more flexibility. Additionally, we propose a gate mechanism that assigns weights to prefix tokens, allowing the model to focus on the more important ones. Our experiments on the GLUE benchmark and the E2E dataset demonstrate that our methods improve and stabilize performance across various prefix lengths. The residual connections enable faster convergence during training, while the gate mechanism helps balance prefix tokens and find better-optimized parameters. Our approach is particularly effective when residual connections are combined with the gate mechanism, outperforming original prefix tuning, especially at longer prefix lengths, while remaining parameter-efficient. We also provide an analysis of gate-weight trends during training, offering insights into how the model uses prefix tokens for different prefix lengths.
ISSN: 2169-3536
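
The summary describes two mechanisms: a residual connection around the prefix reparameterization and a per-token gate over the prefix. The following is a minimal PyTorch sketch of how such a prefix module might be structured, assuming the standard MLP reparameterization from prefix tuning, a scalar sigmoid gate per prefix token, and illustrative dimensions; the class name and hyperparameters are assumptions, not the authors' reference implementation.

```python
# Minimal sketch: prefix module with residual reparameterization and a per-token gate.
# Assumed design, not the paper's exact code.
import torch
import torch.nn as nn


class ResidualGatedPrefix(nn.Module):
    """Produces gated prefix vectors to prepend to a frozen language model."""

    def __init__(self, prefix_len: int, hidden_dim: int, bottleneck_dim: int = 512):
        super().__init__()
        # Trainable prefix embeddings, one row per prefix token.
        self.prefix_embed = nn.Parameter(torch.randn(prefix_len, hidden_dim))
        # Standard prefix-tuning reparameterization MLP.
        self.mlp = nn.Sequential(
            nn.Linear(hidden_dim, bottleneck_dim),
            nn.Tanh(),
            nn.Linear(bottleneck_dim, hidden_dim),
        )
        # One gate logit per prefix token (assumed form of the gate mechanism).
        self.gate_logits = nn.Parameter(torch.zeros(prefix_len))

    def forward(self) -> torch.Tensor:
        # Residual reparameterization: keep the raw embedding and add the MLP output.
        prefix = self.prefix_embed + self.mlp(self.prefix_embed)
        # Gate mechanism: weight each prefix token by a learned importance in (0, 1).
        gates = torch.sigmoid(self.gate_logits).unsqueeze(-1)  # (prefix_len, 1)
        return gates * prefix  # (prefix_len, hidden_dim)


# Usage example with assumed sizes (20 prefix tokens, 768-dim hidden states).
prefix_module = ResidualGatedPrefix(prefix_len=20, hidden_dim=768)
print(prefix_module().shape)  # torch.Size([20, 768])
```

In this sketch, only the prefix module's parameters would be trained while the backbone model stays frozen, which is what keeps the approach parameter-efficient; the residual path preserves the raw prefix embeddings even when the MLP output is small, and the gates let less useful prefix tokens be down-weighted as the prefix length grows.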