Partial Transfer Learning from Patch Transformer to Variate-Based Linear Forecasting Model

Transformer-based time series forecasting models use patch tokens to capture temporal patterns and variate tokens to learn dependencies across covariates. While patch tokens inherently facilitate self-supervised learning, variate tokens are better suited to linear forecasters because they help mitigate distribution drift. However, variate tokens preclude masked pretraining, since masking an entire series is not meaningful. To close this gap, we propose LSPatch-T (Long–Short Patch Transfer), a framework that transfers knowledge from short-length patch tokens into full-length variate tokens. A key design choice is that we transfer only a portion of the Transformer encoder, preserving the linear design of the downstream model. Additionally, we introduce a robust frequency loss to maintain consistency across different temporal ranges. Experimental results show that our approach outperforms Transformer-based baselines (Transformer, Informer, Crossformer, Autoformer, PatchTST, iTransformer) on three public datasets (ETT, Exchange, Weather), a promising step toward generalizing time series forecasting models.
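The abstract does not define the robust frequency loss. As a rough illustration of a frequency-domain consistency term, one can compare the magnitude spectra of prediction and target; the sketch below is an assumption for illustration only (the function name and the L1-on-rFFT-magnitudes formulation are not from the paper):

```python
import numpy as np

def frequency_loss(pred: np.ndarray, target: np.ndarray) -> float:
    """Illustrative frequency-domain loss: mean absolute difference
    between the rFFT magnitude spectra of prediction and target,
    computed along the time axis (last axis).

    This is a generic stand-in, not the paper's exact robust
    frequency loss.
    """
    pred_spec = np.abs(np.fft.rfft(pred, axis=-1))
    target_spec = np.abs(np.fft.rfft(target, axis=-1))
    return float(np.mean(np.abs(pred_spec - target_spec)))
```

A term like this penalizes mismatches in periodic structure (e.g., daily or weekly cycles in energy data) even when pointwise time-domain errors are small.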


Bibliographic Details
Main Authors: Le Hoang Anh, Dang Thanh Vu, Seungmin Oh, Gwang-Hyun Yu, Nguyen Bui Ngoc Han, Hyoung-Gook Kim, Jin-Sul Kim, Jin-Young Kim
Format: Article
Language: English
Published: MDPI AG, 2024-12-01
Series: Energies, Vol. 17, No. 24, Article 6452
ISSN: 1996-1073
DOI: 10.3390/en17246452
Subjects: multivariate time series forecasting; transfer learning; frequency analysis
Online Access: https://www.mdpi.com/1996-1073/17/24/6452
Author affiliations:
Le Hoang Anh, Seungmin Oh, Gwang-Hyun Yu, Jin-Sul Kim, Jin-Young Kim: Department of Intelligent Electronics and Computer Engineering, Chonnam National University, Gwangju 61186, Republic of Korea
Dang Thanh Vu: Research Center, AISeed Inc., Gwangju 61186, Republic of Korea
Nguyen Bui Ngoc Han, Hyoung-Gook Kim: Department of Electronic Convergence Engineering, Kwangwoon University, Seoul 01897, Republic of Korea