Partial Transfer Learning from Patch Transformer to Variate-Based Linear Forecasting Model
Transformer-based time series forecasting models use patch tokens for temporal patterns and variate tokens to learn covariates’ dependencies. While patch tokens inherently facilitate self-supervised learning, variate tokens are more suitable for linear forecasters as they help to mitigate distribution drift. However, the use of variate tokens prohibits masked model pretraining, as masking an entire series is absurd. To close this gap, we propose LSPatch-T (Long–Short Patch Transfer), a framework that transfers knowledge from short-length patch tokens into full-length variate tokens. A key implementation is that we selectively transfer a portion of the Transformer encoder to ensure the linear design of the downstream model. Additionally, we introduce a robust frequency loss to maintain consistency across different temporal ranges. The experimental results show that our approach outperforms Transformer-based baselines (Transformer, Informer, Crossformer, Autoformer, PatchTST, iTransformer) on three public datasets (ETT, Exchange, Weather), which is a promising step forward in generalizing time series forecasting models.
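The abstract's core idea of selectively transferring only part of a pretrained encoder can be illustrated with a minimal sketch. This is not the authors' code: the state-dict key names and the choice of which prefixes to keep are hypothetical, chosen only to show the mechanic of copying a subset of pretrained parameters into a downstream model while leaving the rest untouched.

```python
# Minimal sketch of partial transfer (hypothetical key names):
# copy only pretrained entries whose keys match a kept prefix into
# the downstream model's state; all other downstream weights remain.

def partial_transfer(pretrained, downstream, keep_prefixes):
    """Return a new state dict: downstream weights, overwritten by
    pretrained entries whose keys start with one of keep_prefixes."""
    merged = dict(downstream)
    for key, weights in pretrained.items():
        if key in merged and any(key.startswith(p) for p in keep_prefixes):
            merged[key] = weights
    return merged

# Example: transfer only the encoder's feed-forward (linear) block,
# skipping attention, so the downstream model stays linear in design.
pretrained = {"encoder.ffn.w": [1.0], "encoder.attn.w": [2.0]}
downstream = {"encoder.ffn.w": [0.0], "head.w": [0.0]}
state = partial_transfer(pretrained, downstream, keep_prefixes=["encoder.ffn"])
```

Here `state["encoder.ffn.w"]` comes from the pretrained model, `head.w` keeps its downstream initialization, and the attention weights are never copied, mirroring the paper's stated goal of preserving the linear design of the downstream forecaster.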
| Main Authors: | Le Hoang Anh, Dang Thanh Vu, Seungmin Oh, Gwang-Hyun Yu, Nguyen Bui Ngoc Han, Hyoung-Gook Kim, Jin-Sul Kim, Jin-Young Kim |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | MDPI AG, 2024-12-01 |
| Series: | Energies |
| Subjects: | multivariate time series forecasting; transfer learning; frequency analysis |
| Online Access: | https://www.mdpi.com/1996-1073/17/24/6452 |
| ISSN: | 1996-1073 |
|---|---|
| DOI: | 10.3390/en17246452 |
| Published in: | Energies, vol. 17, issue 24, article 6452 (2024-12-01) |
| Collection: | DOAJ |
| Institution: | Kabale University |

Author affiliations:

- Le Hoang Anh, Seungmin Oh, Gwang-Hyun Yu, Jin-Sul Kim, Jin-Young Kim: Department of Intelligent Electronics and Computer Engineering, Chonnam National University, Gwangju 61186, Republic of Korea
- Dang Thanh Vu: Research Center, AISeed Inc., Gwangju 61186, Republic of Korea
- Nguyen Bui Ngoc Han, Hyoung-Gook Kim: Department of Electronic Convergence Engineering, Kwangwoon University, Seoul 01897, Republic of Korea