I am a PhD student in the AI lab of the Vrije Universiteit Brussel (VUB), under the supervision of Ann Nowé (AI Lab, VUB) and Guillermo A. Pérez (University of Antwerp). My research interests lie in the fields of artificial intelligence and formal verification. More specifically, my PhD focuses on the formal verification of single- and multi-agent policies obtained through reinforcement learning. The end goal of my research is to provide end-users with reliable AI mechanisms.
PhD in Computer Science
Vrije Universiteit Brussel (VUB), Belgium
Master in Computer Science, 2018
University of Mons (UMONS), Belgium
Bachelor in Computer Science, 2016
UMONS, Belgium
Although deep reinforcement learning (DRL) has many success stories, the large-scale deployment of policies learned through these advanced techniques in safety-critical scenarios is hindered by their lack of formal guarantees. Variational Markov Decision Processes (VAE-MDPs) are discrete latent space models that provide a reliable framework for distilling formally verifiable controllers from any RL policy. While the related guarantees address relevant practical aspects such as the satisfaction of performance and safety properties, the VAE approach suffers from several learning flaws (posterior collapse, slow learning speed, poor dynamics estimates), primarily due to the absence of abstraction and representation guarantees to support latent optimization. We introduce the Wasserstein auto-encoded MDP (WAE-MDP), a latent space model that fixes those issues by minimizing a penalized form of the optimal transport between the behaviors of the agent executing the original policy and the distilled policy, for which the formal guarantees apply. Our approach yields bisimulation guarantees while learning the distilled policy, allowing concrete optimization of the abstraction and representation model quality. Our experiments show that, besides distilling policies up to 10 times faster, the latent model quality is indeed better in general. Moreover, we present experiments from a simple time-to-failure verification algorithm on the latent space. The fact that our approach enables such simple verification techniques highlights its applicability.
Cite Code Project DOI URL Extended Abstract Technical Report