Architectural insights into and training methodology optimization of Pangu-Weather

Data-driven medium-range weather forecasts have recently outperformed classical numerical weather prediction models, with Pangu-Weather (PGW) being the first breakthrough model to achieve this. The Transformer-based PGW introduced novel architectural components, including the three-dimensional attention mechanism (3D Transformer) in the Transformer blocks. Additionally, it features an Earth-specific positional bias term, which accounts for weather states being related to the absolute position on Earth. However, the effectiveness of the different architectural components is not yet well understood. Here, we reproduce the 24 h forecast model of PGW based on subsampled 6-hourly data. We then present an ablation study of PGW to better understand its sensitivity to the model architecture and training procedure. We find that using a two-dimensional attention mechanism (2D Transformer) yields a model that is more robust during training, converges faster, and produces better forecasts than with the 3D Transformer. The 2D Transformer reduces the overall computational requirements by 20 %–30 %. Further, the Earth-specific positional bias term can be replaced with a relative bias, reducing the model size by nearly 40 %. A sensitivity study comparing the convergence of the PGW model and the 2D-Transformer model shows large batch effects; however, the 2D-Transformer model is more robust to such effects. Lastly, we propose a new training procedure that increases the speed of convergence of the 2D-Transformer model by 30 % without any further hyperparameter tuning.
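The two ablations described in the abstract are easy to picture in code. Below is a minimal, hypothetical PyTorch sketch (not the authors' implementation; the class name, toy shapes, single attention head, and 1D relative indexing are illustrative assumptions) contrasting an Earth-specific bias, where every absolute window position gets its own learned attention-bias table, with a relative bias indexed by token offset and shared across all windows. The printed parameter counts illustrate why the swap shrinks the model; analogously, attending within 2D (lat-lon) rather than 3D (level x lat x lon) windows reduces the window token count L and hence the quadratic attention cost.

```python
import torch
import torch.nn as nn


class WindowAttention(nn.Module):
    """Single-head attention over fixed-size token windows (toy sketch).

    bias_mode="earth": a separate learned (L x L) bias table for every
    absolute window position, so bias parameters grow with the number of
    windows (in the spirit of PGW's Earth-specific positional bias).
    bias_mode="relative": one small table indexed by the offset i - j,
    shared by all windows, so bias parameters are independent of window count.
    """

    def __init__(self, dim, window_len, num_windows, bias_mode="relative"):
        super().__init__()
        self.scale = dim ** -0.5
        self.qkv = nn.Linear(dim, 3 * dim)
        self.proj = nn.Linear(dim, dim)
        self.bias_mode = bias_mode
        if bias_mode == "earth":
            # one (L x L) bias matrix per absolute window location
            self.bias_table = nn.Parameter(
                torch.zeros(num_windows, window_len, window_len)
            )
        else:
            # one entry per possible token offset in [-(L-1), L-1]
            self.bias_table = nn.Parameter(torch.zeros(2 * window_len - 1))
            idx = torch.arange(window_len)
            self.register_buffer(
                "rel_index", idx[:, None] - idx[None, :] + window_len - 1
            )

    def forward(self, x):
        # x: (num_windows, window_len, dim)
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        attn = (q @ k.transpose(-2, -1)) * self.scale
        if self.bias_mode == "earth":
            attn = attn + self.bias_table                   # per-window bias
        else:
            attn = attn + self.bias_table[self.rel_index]   # shared bias
        return self.proj(attn.softmax(dim=-1) @ v)


if __name__ == "__main__":
    x = torch.randn(8, 16, 32)  # 8 windows, 16 tokens each, 32 channels
    for mode in ("earth", "relative"):
        m = WindowAttention(32, window_len=16, num_windows=8, bias_mode=mode)
        n = sum(p.numel() for p in m.parameters() if p.requires_grad)
        print(f"{mode:>8} bias: output {tuple(m(x).shape)}, {n} trainable parameters")
```

At the real model's scale the number of windows covering the globe is large, which is consistent with the abstract's finding that replacing the Earth-specific bias with a relative one cuts the model size by nearly 40 %.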

Bibliographic Details
Main Authors: D. To, J. Quinting, G. A. Hoshyaripour, M. Götz, A. Streit, C. Debus
Author Affiliations: Scientific Computing Center (SCC), Karlsruhe Institute of Technology (KIT), Karlsruhe, Germany (D. To, M. Götz, A. Streit, C. Debus); Institute of Meteorology and Climate Research, KIT, Karlsruhe, Germany (J. Quinting, G. A. Hoshyaripour); Helmholtz AI, Karlsruhe, Germany (M. Götz)
Format: Article
Language: English
Published: Copernicus Publications, 2024-12-01
Series: Geoscientific Model Development, vol. 17, pp. 8873–8884
ISSN: 1991-959X, 1991-9603
DOI: 10.5194/gmd-17-8873-2024
Online Access: https://gmd.copernicus.org/articles/17/8873/2024/gmd-17-8873-2024.pdf