Stable Inverse Reinforcement Learning: Policies From Control Lyapunov Landscapes

Learning from expert demonstrations to flexibly program an autonomous system with complex behaviors or to predict an agent's behavior is a powerful tool, especially in collaborative control settings. A common method to solve this problem is inverse reinforcement learning (IRL), where the...

Bibliographic Details
Main Authors:	Samuel Tesfazgi, Leonhard Sprandl, Armin Lederer, Sandra Hirche
Format:	Article
Language:	English
Published:	IEEE 2024-01-01
Series:	IEEE Open Journal of Control Systems
Subjects:	Control Lyapunov function; convex optimization; inverse optimality; inverse reinforcement learning; learning from demonstrations; sum of squares
Online Access:	https://ieeexplore.ieee.org/document/10643266/

author	Samuel Tesfazgi; Leonhard Sprandl; Armin Lederer; Sandra Hirche
collection	DOAJ
description	Learning from expert demonstrations to flexibly program an autonomous system with complex behaviors or to predict an agent's behavior is a powerful tool, especially in collaborative control settings. A common method to solve this problem is inverse reinforcement learning (IRL), where the observed agent, e.g., a human demonstrator, is assumed to behave according to the optimization of an intrinsic cost function that reflects its intent and informs its control actions. While the framework is expressive, the inferred control policies generally lack convergence guarantees, which are critical for safe deployment in real-world settings. We therefore propose a novel, stability-certified IRL approach by reformulating the cost function inference problem as the problem of learning control Lyapunov functions (CLFs) from demonstration data. By additionally exploiting closed-form expressions for the associated control policies, we are able to efficiently search the space of CLFs by observing the attractor landscape of the induced dynamics. For the construction of the inverse optimal CLFs, we use a sum-of-squares (SOS) formulation and pose a convex optimization problem. We present a theoretical analysis of the optimality properties provided by the CLF and evaluate our approach using both simulated and real-world, human-generated data.
format	Article
id	doaj-art-2e02bac571874fc288201e50c2e9df78
institution	Kabale University
issn	2694-085X
language	English
publishDate	2024-01-01
publisher	IEEE
record_format	Article
series	IEEE Open Journal of Control Systems
spelling	Record: doaj-art-2e02bac571874fc288201e50c2e9df78; record timestamp 2025-01-09T00:03:07Z; language eng
	Citation: IEEE Open Journal of Control Systems (IEEE, ISSN 2694-085X), 2024-01-01, vol. 3, pp. 358-374; DOI 10.1109/OJCSYS.2024.3447464; IEEE document 10643266
	Author 0: Samuel Tesfazgi (https://orcid.org/0009-0000-7298-6073), Chair of Information-oriented Control, TUM School of Computation, Information and Technology, Technical University of Munich, Munich, Germany
	Author 1: Leonhard Sprandl (https://orcid.org/0009-0007-8147-1363), Chair of Information-oriented Control, TUM School of Computation, Information and Technology, Technical University of Munich, Munich, Germany
	Author 2: Armin Lederer (https://orcid.org/0000-0001-6263-5608), Learning and Adaptive Systems Group, Department of Computer Science, ETH Zurich, Zurich, Switzerland
	Author 3: Sandra Hirche (https://orcid.org/0000-0001-7819-5926), Chair of Information-oriented Control, TUM School of Computation, Information and Technology, Technical University of Munich, Munich, Germany
title	Stable Inverse Reinforcement Learning: Policies From Control Lyapunov Landscapes
topic	Control Lyapunov function; convex optimization; inverse optimality; inverse reinforcement learning; learning from demonstrations; sum of squares
url	https://ieeexplore.ieee.org/document/10643266/

Similar Items