Florent Delgrange
Florent Delgrange
Home
Posts
Publications
Projects
CV
Contact
Light
Dark
Automatic
Reactive Synthesis
Composing Reinforcement Learning Policies, with Formal Guarantees
We propose a novel framework to controller design in environments with a two-level structure: a known high-level graph (“map”) in which each vertex is populated by a Markov decision process, called a “room”. The framework “separates concerns” by using different design techniques for low- and high-level tasks. We apply reactive synthesis for high-level tasks: given a specification as a logical formula over the high-level graph and a collection of low-level policies obtained together with “concise” latent structures, we construct a “planner” that selects which low-level policy to apply in each room. We develop a reinforcement learning procedure to train low-level policies on latent structures, which unlike previous approaches, circumvents a model distillation step. We pair the policy with probably approximately correct guarantees on its performance and on the abstraction quality, and lift these guarantees to the high-level task. These formal guarantees are the main advantage of the framework. Other advantages include scalability (rooms are large and their dynamics are unknown) and reusability of low-level policies. We demonstrate feasibility in challenging case studies where an agent navigates environments with moving obstacles and visual inputs.
Florent Delgrange
,
Guy Avni
,
Anna Lukina
,
Christian Schilling
,
Ann Nowé
,
Guillermo A. Pérez
Composing RL policies, with formal guarantees
Source code for replicating the experiments presented in our paper
Composing Reinforcement Learning Policies, with Formal Guarantees
Florent Delgrange
,
Guy Avny
,
Anna Lukina
,
Christian Schilling
,
Guillermo A. Pérez
,
Ann Nowé
Composing Reinforcement Learning Policies, with Formal Guarantees
Synthesizing controllers in large domains from verified world models and reinforcement learning policy composition.
Florent Delgrange
Last updated on May 13, 2025
14 min read
Reactive Synthesis
,
Reinforcement Learning
,
Representation Learning
Life is Random, Time is Not: Markov Decision Processes with Window Objectives
Thomas Brihaye
,
Florent Delgrange
,
Youssouf Oualhadj
,
Mickael Randour
Life Is Random, Time Is Not: Markov Decision Processes with Window Objectives
Thomas Brihaye
,
Florent Delgrange
,
Youssouf Oualhadj
,
Mickael Randour
Cite
×