Continual deep reinforcement learning with task-agnostic policy distillation

Abstract Central to the development of universal learning systems is the ability to solve multiple tasks without retraining from scratch when new data arrives. This is crucial because each task requires significant training time. Addressing the problem of continual learning necessitates various meth...

Bibliographic Details
Main Authors:	Muhammad Burhan Hafez, Kerim Erekmen
Format:	Article
Language:	English
Published:	Nature Portfolio 2024-12-01
Series:	Scientific Reports
Subjects:	Continual learning; Reinforcement learning; Self-supervised learning; Task-agnostic learning
Online Access:	https://doi.org/10.1038/s41598-024-80774-8

_version_	1841559460579901440
author	Muhammad Burhan Hafez, Kerim Erekmen
author_facet	Muhammad Burhan Hafez, Kerim Erekmen
author_sort	Muhammad Burhan Hafez
collection	DOAJ
description	Abstract Central to the development of universal learning systems is the ability to solve multiple tasks without retraining from scratch when new data arrives. This is crucial because each task requires significant training time. Addressing the problem of continual learning necessitates various methods due to the complexity of the problem space. This problem space includes: (1) addressing catastrophic forgetting to retain previously learned tasks, (2) demonstrating positive forward transfer for faster learning, (3) ensuring scalability across numerous tasks, and (4) facilitating learning without requiring task labels, even in the absence of clear task boundaries. In this paper, the Task-Agnostic Policy Distillation (TAPD) framework is introduced. This framework alleviates problems (1)–(4) by incorporating a task-agnostic phase, where an agent explores its environment without any external goal and maximizes only its intrinsic motivation. The knowledge gained during this phase is later distilled for further exploration. Therefore, the agent acts in a self-supervised manner by systematically seeking novel states. By utilizing task-agnostic distilled knowledge, the agent can solve downstream tasks more efficiently, leading to improved sample efficiency. Our code is available at the repository: https://github.com/wabbajack1/TAPD .
format	Article
id	doaj-art-b15608c57ef54d09bfac8b602c6f32e5
institution	Kabale University
issn	2045-2322
language	English
publishDate	2024-12-01
publisher	Nature Portfolio
record_format	Article
series	Scientific Reports
spelling	doaj-art-b15608c57ef54d09bfac8b602c6f32e5 | 2025-01-05T12:28:28Z | eng | Nature Portfolio | Scientific Reports | 2045-2322 | 2024-12-01 | 14 1 1 17 | 10.1038/s41598-024-80774-8 | Continual deep reinforcement learning with task-agnostic policy distillation | Muhammad Burhan Hafez (0), Kerim Erekmen (1) | (0) School of Electronics and Computer Science, University of Southampton; (1) Department of Informatics, University of Hamburg | Abstract Central to the development of universal learning systems is the ability to solve multiple tasks without retraining from scratch when new data arrives. This is crucial because each task requires significant training time. Addressing the problem of continual learning necessitates various methods due to the complexity of the problem space. This problem space includes: (1) addressing catastrophic forgetting to retain previously learned tasks, (2) demonstrating positive forward transfer for faster learning, (3) ensuring scalability across numerous tasks, and (4) facilitating learning without requiring task labels, even in the absence of clear task boundaries. In this paper, the Task-Agnostic Policy Distillation (TAPD) framework is introduced. This framework alleviates problems (1)–(4) by incorporating a task-agnostic phase, where an agent explores its environment without any external goal and maximizes only its intrinsic motivation. The knowledge gained during this phase is later distilled for further exploration. Therefore, the agent acts in a self-supervised manner by systematically seeking novel states. By utilizing task-agnostic distilled knowledge, the agent can solve downstream tasks more efficiently, leading to improved sample efficiency. Our code is available at the repository: https://github.com/wabbajack1/TAPD . | https://doi.org/10.1038/s41598-024-80774-8 | Continual learning; Reinforcement learning; Self-supervised learning; Task-agnostic learning
spellingShingle	Muhammad Burhan Hafez; Kerim Erekmen; Continual deep reinforcement learning with task-agnostic policy distillation; Scientific Reports; Continual learning; Reinforcement learning; Self-supervised learning; Task-agnostic learning
title	Continual deep reinforcement learning with task-agnostic policy distillation
title_full	Continual deep reinforcement learning with task-agnostic policy distillation
title_fullStr	Continual deep reinforcement learning with task-agnostic policy distillation
title_full_unstemmed	Continual deep reinforcement learning with task-agnostic policy distillation
title_short	Continual deep reinforcement learning with task-agnostic policy distillation
title_sort	continual deep reinforcement learning with task agnostic policy distillation
topic	Continual learning; Reinforcement learning; Self-supervised learning; Task-agnostic learning
url	https://doi.org/10.1038/s41598-024-80774-8
work_keys_str_mv	AT muhammadburhanhafez continualdeepreinforcementlearningwithtaskagnosticpolicydistillation AT kerimerekmen continualdeepreinforcementlearningwithtaskagnosticpolicydistillation

Similar Items