Composing RL policies, with formal guarantees

Implementation of the techniques presented in our paper “Composing Reinforcement Learning Policies, with Formal Guarantees”. The project also includes WAE-DQN, an RL algorithm that learns a discrete, verifiable world model along with its policy, and two new “two-level” environments:
- A large, parameterizable grid world with moving obstacles.
- An 8-room ViZDoom scenario with enemies spawning randomly on the map at regular intervals.
Both environments come in low-level and high-level variants. In the low-level variant, the agent is placed in a single room of the two-level environment and must reach the exit safely while avoiding moving obstacles. In the high-level variant, the agent must navigate safely through the rooms that make up the environment to reach a target location.
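
To make the low-level/high-level split concrete, here is a minimal toy sketch in Python. It is not the repository's API: every name in it (`RoomEnv`, `RoomsEnv`, `greedy_policy`, `clip`, the status strings) is hypothetical, and the grid, obstacle motion, and policy are simplifying assumptions chosen only to illustrate the idea that one high-level move between rooms corresponds to running a low-level policy inside a room.

```python
import random

# Hypothetical names throughout: a toy model, not the repository's API.
MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def clip(pos, size):
    """Keep a (row, col) position inside a size x size grid."""
    return tuple(min(max(c, 0), size - 1) for c in pos)


class RoomEnv:
    """Low-level variant (toy): reach the exit of one room, avoiding a moving obstacle."""

    def __init__(self, size=5, seed=0):
        self.size = size
        self.rng = random.Random(seed)
        self.exit_cell = (size - 1, size - 1)
        self.agent = (0, 0)
        self.obstacle = (size // 2, size // 2)

    def step(self, action):
        dr, dc = MOVES[action]
        self.agent = clip((self.agent[0] + dr, self.agent[1] + dc), self.size)
        # Assumption: the obstacle takes one uniformly random step per time step.
        odr, odc = self.rng.choice(list(MOVES.values()))
        self.obstacle = clip((self.obstacle[0] + odr, self.obstacle[1] + odc), self.size)
        if self.agent == self.obstacle:
            return "collision"
        if self.agent == self.exit_cell:
            return "exit"
        return "running"


def greedy_policy(env):
    """Placeholder low-level policy: go down to the exit row, then right."""
    return "down" if env.agent[0] < env.exit_cell[0] else "right"


class RoomsEnv:
    """High-level variant (toy): move between rooms of a graph to reach a target room."""

    def __init__(self, neighbors, start, target, policy=greedy_policy, max_steps=50):
        self.neighbors, self.room, self.target = neighbors, start, target
        self.policy, self.max_steps = policy, max_steps
        self.crossings = 0

    def step(self, next_room):
        if next_room not in self.neighbors[self.room]:
            raise ValueError(f"{next_room!r} is not adjacent to {self.room!r}")
        # Each high-level move runs the low-level policy in a fresh room.
        room_env = RoomEnv(seed=self.crossings)
        self.crossings += 1
        status = "running"
        for _ in range(self.max_steps):
            status = room_env.step(self.policy(room_env))
            if status != "running":
                break
        if status == "running":
            status = "timeout"
        if status == "exit":
            self.room = next_room  # the room was crossed safely
        return self.room, status, self.room == self.target


# Usage: three rooms in a line, A - B - C, with C as the target.
neighbors = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
env = RoomsEnv(neighbors, start="A", target="C")
room, status, done = env.step("B")
print(room, status, done)
```

In this sketch the high-level agent only advances when the low-level run ends with `"exit"`; a collision or timeout leaves it in the current room. How the actual environments handle such failures is not specified here.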