Florent Delgrange
Markov Decision Processes
WAE-MDPs
Source code for replicating the experiments presented in the paper
Wasserstein Auto-encoded MDPs — Formal Verification of Efficiently Distilled RL Policies with Many-sided Guarantees
VAE-MDPs
Source code for replicating the experiments presented in the paper
Distillation of RL Policies with Formal Guarantees via Variational Abstraction of Markov Decision Processes
Life is Random, Time is Not: Markov Decision Processes with Window Objectives
Thomas Brihaye, Florent Delgrange, Youssouf Oualhadj, Mickael Randour
Simple Strategies in Multi-Objective MDPs
We consider the verification of multiple expected reward objectives at once on Markov decision processes (MDPs). This enables a trade-off analysis among multiple objectives by obtaining the Pareto front. We focus on strategies that are easy to employ and implement. That is, strategies that are pure (no randomization) and have bounded memory. We show that checking whether a point is achievable by a pure stationary strategy is NP-complete, even for two objectives, and we provide an MILP encoding to solve the corresponding problem. The bounded memory case can be reduced to the stationary one by a product construction. Experimental results using Storm and Gurobi show the feasibility of our algorithms.
Florent Delgrange, Joost-Pieter Katoen, Tim Quatmann, Mickael Randour
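To make the achievability question from the abstract above concrete, here is a minimal sketch that enumerates every pure stationary strategy of a tiny acyclic MDP and checks whether one of them reaches a given point for two expected total reward objectives. It is a brute-force illustration, not the MILP encoding from the paper, and it is exponential in the number of states. The toy MDP, its state and action names, and the helper functions are all hypothetical.

```python
from fractions import Fraction
from itertools import product

# Hypothetical toy MDP (not taken from the paper).
# state -> action -> list of (probability, next_state, (reward_1, reward_2)).
# "end" is a terminal state with no actions; the MDP is acyclic, so expected
# total rewards can be computed by plain recursion.
MDP = {
    "s0": {
        "a": [(Fraction(1, 2), "s1", (1, 0)), (Fraction(1, 2), "end", (0, 0))],
        "b": [(Fraction(1), "s1", (0, 1))],
    },
    "s1": {
        "c": [(Fraction(1), "end", (2, 0))],
        "d": [(Fraction(1), "end", (0, 2))],
    },
}


def expected_rewards(strategy, state="s0"):
    """Expected total reward per objective under a pure stationary strategy."""
    if state not in MDP:  # terminal state: nothing more to collect
        return (Fraction(0), Fraction(0))
    total = [Fraction(0), Fraction(0)]
    for prob, nxt, rew in MDP[state][strategy[state]]:
        future = expected_rewards(strategy, nxt)
        for i in range(2):
            total[i] += prob * (rew[i] + future[i])
    return tuple(total)


def pure_stationary_strategies():
    """Yield every mapping that picks exactly one action per state."""
    states = sorted(MDP)
    for choice in product(*(sorted(MDP[s]) for s in states)):
        yield dict(zip(states, choice))


def achievable(point):
    """Return a pure stationary strategy whose expected rewards are at least
    `point` in every objective, or None if no such strategy exists."""
    for strategy in pure_stationary_strategies():
        values = expected_rewards(strategy)
        if all(v >= p for v, p in zip(values, point)):
            return strategy
    return None


if __name__ == "__main__":
    for strategy in pure_stationary_strategies():
        print(strategy, expected_rewards(strategy))
    print(achievable((2, 1)))
    print(achievable((1, 2)))
```

In this toy model the point (1, 2) is reachable by randomizing uniformly between actions c and d in s1 after playing b in s0, but no pure stationary strategy achieves it, which is the kind of gap the restriction to simple strategies introduces.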
Safe Reinforcement Learning
Oct 18, 2018 12:00 AM
UMONS -- Université de Mons, Belgium