Continual deep reinforcement learning with task-agnostic policy distillation

Abstract Central to the development of universal learning systems is the ability to solve multiple tasks without retraining from scratch when new data arrives. This is crucial because each task requires significant training time. Addressing the problem of continual learning necessitates various meth...

Bibliographic Details
Main Authors: Muhammad Burhan Hafez, Kerim Erekmen
Format: Article
Language: English
Published: Nature Portfolio 2024-12-01
Series: Scientific Reports
Subjects: Continual learning; Reinforcement learning; Self-supervised learning; Task-agnostic learning
Online Access: https://doi.org/10.1038/s41598-024-80774-8
author Muhammad Burhan Hafez
Kerim Erekmen
collection DOAJ
description Central to the development of universal learning systems is the ability to solve multiple tasks without retraining from scratch when new data arrives. This is crucial because each task requires significant training time. Addressing the problem of continual learning necessitates various methods due to the complexity of the problem space. This problem space includes: (1) addressing catastrophic forgetting to retain previously learned tasks, (2) demonstrating positive forward transfer for faster learning, (3) ensuring scalability across numerous tasks, and (4) facilitating learning without requiring task labels, even in the absence of clear task boundaries. In this paper, the Task-Agnostic Policy Distillation (TAPD) framework is introduced. This framework alleviates problems (1)–(4) by incorporating a task-agnostic phase, where an agent explores its environment without any external goal and maximizes only its intrinsic motivation. The knowledge gained during this phase is later distilled for further exploration. Therefore, the agent acts in a self-supervised manner by systematically seeking novel states. By utilizing task-agnostic distilled knowledge, the agent can solve downstream tasks more efficiently, leading to improved sample efficiency. Our code is available at the repository: https://github.com/wabbajack1/TAPD.
format Article
id doaj-art-b15608c57ef54d09bfac8b602c6f32e5
institution Kabale University
issn 2045-2322
language English
publishDate 2024-12-01
publisher Nature Portfolio
record_format Article
series Scientific Reports
spelling doaj-art-b15608c57ef54d09bfac8b602c6f32e5 | 2025-01-05T12:28:28Z | eng | Nature Portfolio | Scientific Reports | 2045-2322 | 2024-12-01 | 14 | 1 | 1 | 17 | 10.1038/s41598-024-80774-8 | Continual deep reinforcement learning with task-agnostic policy distillation | Muhammad Burhan Hafez (0), Kerim Erekmen (1) | (0) School of Electronics and Computer Science, University of Southampton | (1) Department of Informatics, University of Hamburg | https://doi.org/10.1038/s41598-024-80774-8 | Continual learning | Reinforcement learning | Self-supervised learning | Task-agnostic learning
title Continual deep reinforcement learning with task-agnostic policy distillation
topic Continual learning
Reinforcement learning
Self-supervised learning
Task-agnostic learning
url https://doi.org/10.1038/s41598-024-80774-8