Reactive Synthesis

VenueAAMAS Composing Reinforcement Learning Policies, with Formal Guarantees

We propose a novel framework to controller design in environments with a two-level structure: a known high-level graph (“map”) in which each vertex is populated by a Markov decision process, called a “room”. The framework “separates concerns” by using different design techniques for low- and high-level tasks. We apply reactive synthesis for high-level tasks: given a specification as a logical formula over the high-level graph and a collection of low-level policies obtained together with “concise” latent structures, we construct a “planner” that selects which low-level policy to apply in each room. We develop a reinforcement learning procedure to train low-level policies on latent structures, which unlike previous approaches, circumvents a model distillation step. We pair the policy with probably approximately correct guarantees on its performance and on the abstraction quality, and lift these guarantees to the high-level task. These formal guarantees are the main advantage of the framework. Other advantages include scalability (rooms are large and their dynamics are unknown) and reusability of low-level policies. We demonstrate feasibility in challenging case studies where an agent navigates environments with moving obstacles and visual inputs.

Florent Delgrange, Guy Avni, Anna Lukina, Christian Schilling, Ann Nowé, Guillermo A. Pérez

Composing Reinforcement Learning Policies, with Formal Guarantees

Composing RL policies, with formal guarantees

Source code for replicating the experiments presented in our paper Composing Reinforcement Learning Policies, with Formal Guarantees

Florent Delgrange, Guy Avny, Anna Lukina, Christian Schilling, Guillermo A. Pérez, Ann Nowé

Composing RL policies, with formal guarantees

VenueLMCS Life is Random, Time is Not: Markov Decision Processes with Window Objectives

Thomas Brihaye, Florent Delgrange, Youssouf Oualhadj, Mickael Randour

VenueCONCUR Life Is Random, Time Is Not: Markov Decision Processes with Window Objectives

Thomas Brihaye, Florent Delgrange, Youssouf Oualhadj, Mickael Randour