Mix-Spectrum for Generalization in Visual Reinforcement Learning


Bibliographic Details
Main Authors: Jeong Woon Lee, Hyoseok Hwang
Format: Article
Language: English
Published: IEEE 2025-01-01
Series: IEEE Access
Subjects: Deep reinforcement learning; data augmentation; fast Fourier transforms
Online Access: https://ieeexplore.ieee.org/document/10833629/
author Jeong Woon Lee
Hyoseok Hwang
collection DOAJ
description Visual Reinforcement Learning (RL) trains agent policies from images and shows potential for real-world applications. However, the limited diversity of the training environment often leads to overfitting, and agents underperform in unseen environments. To address this issue, image augmentation is used in visual RL to increase data diversity, but its effectiveness is limited because augmentation can alter the semantic information of the image. Therefore, we introduce Mix-Spectrum, a straightforward yet highly effective frequency-based augmentation method that maintains the semantic consistency of the data and enhances the agent’s focus on semantic information. The proposed method combines two existing methods: mixing the amplitudes of original and reference images, and Random Convolution. Through this combination of established methods, our approach not only retains the advantages of each method but also introduces a novel characteristic that enhances performance. Furthermore, the proposed method can be integrated with any visual RL algorithm, whether off-policy or on-policy. In extensive experiments on the DMControl Generalization Benchmark (DMControl-GB) and Procgen, our method outperforms existing frequency-based, normalization-based, and image augmentation methods in zero-shot generalization. On DMControl-GB, our method improved by 35.5% over the baseline and 15.2% over the second-best method. On Procgen, it achieved improvements of 15.2% and 10.1%, respectively.
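The two ingredients named in the description, amplitude mixing in the Fourier domain and Random Convolution, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The function names (`mix_amplitude`, `random_convolution`, `mix_spectrum_sketch`), the mixing weight `lam`, its upper bound `lam_max`, the kernel size, and the choice to use the randomly convolved image as the reference are all assumptions made for illustration; the record does not state how the paper combines the two steps.

```python
import numpy as np


def mix_amplitude(image, reference, lam):
    """Blend the Fourier amplitude of `image` with that of `reference`.

    The phase of `image` is kept unchanged. `lam` in [0, 1] is a
    hypothetical mixing weight: 0 returns the original image.
    Both inputs are float arrays of shape (H, W, C).
    """
    f_img = np.fft.fft2(image, axes=(0, 1))
    f_ref = np.fft.fft2(reference, axes=(0, 1))
    amplitude = (1.0 - lam) * np.abs(f_img) + lam * np.abs(f_ref)
    phase = np.angle(f_img)
    mixed = amplitude * np.exp(1j * phase)
    # For real inputs the mixed spectrum stays Hermitian, so the
    # imaginary part of the inverse transform is only rounding noise.
    return np.real(np.fft.ifft2(mixed, axes=(0, 1)))


def random_convolution(image, rng, kernel_size=3):
    """Filter an (H, W, C) image with one randomly sampled kernel."""
    h, w, c = image.shape
    k = kernel_size
    # Assumed initialisation: zero-mean Gaussian weights.
    weights = rng.normal(0.0, 1.0 / (k * np.sqrt(c)), size=(k, k, c, c))
    pad = k // 2
    padded = np.pad(image, ((pad, pad), (pad, pad), (0, 0)), mode="edge")
    out = np.zeros_like(image, dtype=float)
    for dy in range(k):
        for dx in range(k):
            patch = padded[dy:dy + h, dx:dx + w, :]
            # (H, W, C) @ (C, C): mixes channels at every pixel.
            out += patch @ weights[dy, dx]
    return out


def mix_spectrum_sketch(image, rng, lam_max=0.5):
    """Assumed pipeline: random-conv reference, then amplitude mixing."""
    reference = random_convolution(image, rng)
    lam = rng.uniform(0.0, lam_max)
    return mix_amplitude(image, reference, lam)
```

In this sketch, `mix_spectrum_sketch(obs, np.random.default_rng(0))` would be applied to a float observation `obs` of shape (H, W, C) before it is passed to the encoder. Because only the amplitude is altered while the phase is preserved, the spatial layout of the scene is largely kept, which is the usual motivation for amplitude-based augmentation.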
format Article
id doaj-art-bd7ecf6e952e4511bd90a32c1a1198d1
institution Kabale University
issn 2169-3536
language English
publishDate 2025-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj-art-bd7ecf6e952e4511bd90a32c1a1198d1; 2025-01-15T00:02:58Z; eng; IEEE; IEEE Access; 2169-3536; 2025-01-01; 13; 7939; 7950; 10.1109/ACCESS.2025.3526959; 10833629
Mix-Spectrum for Generalization in Visual Reinforcement Learning
Jeong Woon Lee (0) https://orcid.org/0009-0006-5862-0124
Hyoseok Hwang (1) https://orcid.org/0000-0003-3241-8455
(0) Department of Software Convergence, Kyung Hee University, Yongin, Gyeonggi, Republic of Korea
(1) Department of Software Convergence, Kyung Hee University, Yongin, Gyeonggi, Republic of Korea
https://ieeexplore.ieee.org/document/10833629/
Deep reinforcement learning; data augmentation; fast Fourier transforms
title Mix-Spectrum for Generalization in Visual Reinforcement Learning
topic Deep reinforcement learning
data augmentation
fast Fourier transforms
url https://ieeexplore.ieee.org/document/10833629/