Composing RL policies, with formal guarantees

Implementation of the techniques presented in our paper “Composing Reinforcement Learning Policies, with Formal Guarantees”. The project also includes WAE-DQN, an RL algorithm that learns a discrete, verifiable world model along with its policy, and two new “two-level” environments.

The two environments come with low- and high-level variants. In the low-level variant, the agent is placed in a single room of the two-level environment and must reach the exit safely while avoiding moving obstacles. In the high-level variant, the agent must navigate safely through the rooms that make up the environment to reach a target location.
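To make the two-level structure concrete, here is a minimal sketch that assumes the environment can be viewed as a graph of rooms connected by exits. The room names, the ROOMS layout, and the high_level_plan helper are hypothetical and are not part of this repository. The search is a plain breadth-first search swapped in for illustration: it ignores obstacles and safety entirely, and is not the composition method of the paper.

```python
from collections import deque

# Hypothetical layout: each room maps an exit direction to the room behind it.
ROOMS = {
    "A": {"right": "B", "down": "C"},
    "B": {"left": "A", "down": "D"},
    "C": {"up": "A", "right": "D"},
    "D": {"up": "B", "left": "C"},
}


def high_level_plan(rooms, start, target):
    """Return a list of (room, exit_direction) steps from start to target.

    Each step stands for one low-level task: "in this room, reach this exit".
    Returns None if the target room is unreachable.
    """
    queue = deque([start])
    parent = {start: None}  # room -> (previous room, exit taken) or None
    while queue:
        room = queue.popleft()
        if room == target:
            break
        for direction, neighbour in rooms[room].items():
            if neighbour not in parent:
                parent[neighbour] = (room, direction)
                queue.append(neighbour)
    if target not in parent:
        return None
    # Walk back from the target to rebuild the sequence of low-level tasks.
    plan = []
    node = target
    while parent[node] is not None:
        previous_room, direction = parent[node]
        plan.append((previous_room, direction))
        node = previous_room
    plan.reverse()
    return plan


if __name__ == "__main__":
    print(high_level_plan(ROOMS, "A", "D"))
```

Under this reading, each step of the returned plan would be handed to a low-level policy trained for the corresponding room and exit, while the high-level variant is concerned only with which exits to take and in what order.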

Related