Prefix Tuning Using Residual Reparameterization
| Main Authors: | , , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | IEEE, 2025-01-01 |
| Series: | IEEE Access |
| Subjects: | |
| Online Access: | https://ieeexplore.ieee.org/document/10938609/ |
| Summary: | Fine-tuning large language models for specific tasks requires updating and storing all parameters, leading to significant computational and storage costs. To address these challenges, parameter-efficient learning methods such as prefix tuning have gained attention. However, prefix tuning can suffer from sensitivity to prefix length, which is believed to stem from different tasks requiring different prefix tokens. In this paper, we propose improving the robustness and performance of prefix tuning through residual reparameterization. We add residual connections to the prefix module, providing more flexibility to the model. Additionally, we propose a gate mechanism that assigns weights to prefix tokens, allowing the model to focus on the more important ones. Our experiments on the GLUE benchmark and E2E dataset demonstrate that our methods lead to improved and stabilized performance across various prefix lengths. The residual connections enable faster convergence during training, while the gate mechanism helps balance prefix tokens and find more optimized parameters. Our approach is particularly effective when residual connections are combined with the gate mechanism, outperforming original prefix tuning, especially at longer prefix lengths, while remaining parameter-efficient. We also provide an analysis of gate weight trends during training, offering insights into how the model uses prefix tokens for different prefix lengths. |
| ISSN: | 2169-3536 |
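
The abstract describes two architectural changes to prefix tuning: a residual connection around the prefix reparameterization network and a gate that weights individual prefix tokens. The PyTorch sketch below is only an illustration of how such a module could be wired, not the authors' implementation; the class name `ResidualGatedPrefix`, the linear residual projection, and the per-token sigmoid gate are assumptions made for the example.

```python
import torch
import torch.nn as nn


class ResidualGatedPrefix(nn.Module):
    """Illustrative prefix module with a residual reparameterization path
    and a learnable per-token gate (assumed forms, not the paper's code)."""

    def __init__(self, prefix_len=20, hidden_dim=768, num_layers=12, num_heads=12):
        super().__init__()
        self.prefix_len = prefix_len
        self.hidden_dim = hidden_dim
        self.num_layers = num_layers
        self.num_heads = num_heads
        # Learnable prefix embeddings, reparameterized by an MLP as in standard prefix tuning.
        self.prefix_embed = nn.Parameter(torch.randn(prefix_len, hidden_dim))
        self.mlp = nn.Sequential(
            nn.Linear(hidden_dim, hidden_dim),
            nn.Tanh(),
            nn.Linear(hidden_dim, num_layers * 2 * hidden_dim),
        )
        # Assumed residual path: a linear projection that lets the prefix
        # embeddings bypass the MLP (dimensions differ, so a projection is needed).
        self.residual_proj = nn.Linear(hidden_dim, num_layers * 2 * hidden_dim)
        # Assumed gate: one scalar per prefix token, squashed with a sigmoid.
        self.gate = nn.Parameter(torch.zeros(prefix_len))

    def forward(self, batch_size):
        # Residual reparameterization: MLP output plus the skip path.
        past = self.mlp(self.prefix_embed) + self.residual_proj(self.prefix_embed)
        # Weight each prefix token so training can emphasize the more useful ones.
        past = past * torch.sigmoid(self.gate).unsqueeze(-1)
        # Reshape into per-layer key/value states:
        # (num_layers, 2, batch, num_heads, prefix_len, head_dim)
        head_dim = self.hidden_dim // self.num_heads
        past = past.view(self.prefix_len, self.num_layers, 2, self.num_heads, head_dim)
        past = past.permute(1, 2, 3, 0, 4).unsqueeze(2)
        return past.expand(-1, -1, batch_size, -1, -1, -1)


# Example: build prefix key/value states for a batch of 4 sequences; these would
# be prepended (e.g. as past_key_values) while the backbone model stays frozen.
prefix = ResidualGatedPrefix(prefix_len=10)
past_key_values = prefix(batch_size=4)
print(past_key_values.shape)  # torch.Size([12, 2, 4, 12, 10, 64])
```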