Florent Delgrange
Reinforcement Learning
Formal Verification of Efficiently Distilled RL Policies with Many-sided Guarantees @ BNAIC/BeNeLearn 2022
Nov 7, 2022 — Nov 9, 2022
Mechelen, Belgium
Florent Delgrange, Ann Nowé, Guillermo A. Pérez
Distillation of RL Policies with Formal Guarantees via Variational Abstraction of Markov Decision Processes
We consider the challenge of policy simplification and verification in the context of policies learned through reinforcement learning (RL) in continuous environments. In well-behaved settings, RL algorithms have convergence guarantees in the limit. While these guarantees are valuable, they are insufficient for safety-critical applications. Furthermore, they are lost when applying advanced techniques such as deep-RL. To recover guarantees when applying advanced RL algorithms to more complex environments with (i) reachability, (ii) safety-constrained reachability, or (iii) discounted-reward objectives, we build upon the DeepMDP framework introduced by Gelada et al. to derive new bisimulation bounds between the unknown environment and a learned discrete latent model of it. Our bisimulation bounds enable the application of formal methods for Markov decision processes. Finally, we show how one can use a policy obtained via state-of-the-art RL to efficiently train a variational autoencoder that yields a discrete latent model with provably approximately correct bisimulation guarantees. Additionally, we obtain a distilled version of the policy for the latent model.
Florent Delgrange, Ann Nowé, Guillermo A. Pérez
A Framework for Flexibly Guiding Learning Agents
Mahmoud Elbarbari, Florent Delgrange, Ivo Vervlimmeren, Kyriakos Efthymiadis, Bram Vanderborght, Ann Nowé
VAE-MDPs
Source code for replicating the experiments presented in the paper
Distillation of RL Policies with Formal Guarantees via Variational Abstraction of Markov Decision Processes
Safe Reinforcement Learning
Oct 18, 2018 12:00 AM
UMONS -- Université de Mons, Belgium