Partial Transfer Learning from Patch Transformer to Variate-Based Linear Forecasting Model
Transformer-based time series forecasting models use patch tokens for temporal patterns and variate tokens to learn covariates’ dependencies. While patch tokens inherently facilitate self-supervised learning, variate tokens are more suitable for linear forecasters as they help to mitigate distribution drift. However, the use of variate tokens prohibits masked model pretraining, as masking an entire series is absurd. To close this gap, we propose LSPatch-T (Long–Short Patch Transfer), a framework that transfers knowledge from short-length patch tokens into full-length variate tokens. A key implementation is that we selectively transfer a portion of the Transformer encoder to ensure the linear design of the downstream model. Additionally, we introduce a robust frequency loss to maintain consistency across different temporal ranges. The experimental results show that our approach outperforms Transformer-based baselines (Transformer, Informer, Crossformer, Autoformer, PatchTST, iTransformer) on three public datasets (ETT, Exchange, Weather), which is a promising step forward in generalizing time series forecasting models.
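The abstract's core idea of selectively transferring only part of a pretrained encoder can be illustrated with a minimal sketch. This is not the authors' code: the state-dict key names and the choice of which prefixes to keep are hypothetical, chosen only to show the mechanic of copying a subset of pretrained parameters into a downstream model while leaving the rest untouched.

```python
# Minimal sketch of partial transfer (hypothetical key names):
# copy only pretrained entries whose keys match a kept prefix into
# the downstream model's state; all other downstream weights remain.

def partial_transfer(pretrained, downstream, keep_prefixes):
    """Return a new state dict: downstream weights, overwritten by
    pretrained entries whose keys start with one of keep_prefixes."""
    merged = dict(downstream)
    for key, weights in pretrained.items():
        if key in merged and any(key.startswith(p) for p in keep_prefixes):
            merged[key] = weights
    return merged

# Example: transfer only the encoder's feed-forward (linear) block,
# skipping attention, so the downstream model stays linear in design.
pretrained = {"encoder.ffn.w": [1.0], "encoder.attn.w": [2.0]}
downstream = {"encoder.ffn.w": [0.0], "head.w": [0.0]}
state = partial_transfer(pretrained, downstream, keep_prefixes=["encoder.ffn"])
```

Here `state["encoder.ffn.w"]` comes from the pretrained model, `head.w` keeps its downstream initialization, and the attention weights are never copied, mirroring the paper's stated goal of preserving the linear design of the downstream forecaster.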
| Main Authors: | Le Hoang Anh, Dang Thanh Vu, Seungmin Oh, Gwang-Hyun Yu, Nguyen Bui Ngoc Han, Hyoung-Gook Kim, Jin-Sul Kim, Jin-Young Kim |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | MDPI AG, 2024-12-01 |
| Series: | Energies |
| Subjects: | multivariate time series forecasting; transfer learning; frequency analysis |
| Online Access: | https://www.mdpi.com/1996-1073/17/24/6452 |
| ISSN: | 1996-1073 |
|---|---|
| DOI: | 10.3390/en17246452 |
| Published in: | Energies, vol. 17, issue 24, article 6452 (2024-12-01) |
| Collection: | DOAJ |
| Institution: | Kabale University |

Author affiliations:

- Le Hoang Anh, Seungmin Oh, Gwang-Hyun Yu, Jin-Sul Kim, Jin-Young Kim: Department of Intelligent Electronics and Computer Engineering, Chonnam National University, Gwangju 61186, Republic of Korea
- Dang Thanh Vu: Research Center, AISeed Inc., Gwangju 61186, Republic of Korea
- Nguyen Bui Ngoc Han, Hyoung-Gook Kim: Department of Electronic Convergence Engineering, Kwangwoon University, Seoul 01897, Republic of Korea