Florent Delgrange
Posts
Deep SPI: Safe Policy Improvement via World Models
A long-form explainer of Deep SPI: why ordinary on-policy auxiliary losses break after policy updates, how world models and neighborhood-constrained updates fix this, and what the resulting algorithm buys on stochastic ALE-57.
Florent Delgrange
Last updated on May 2, 2026
10 min read
Reinforcement Learning, Safe Policy Improvement, World Models
Composing Reinforcement Learning Policies, with Formal Guarantees
Synthesizing controllers in large domains from verified world models and reinforcement learning policy composition.
Florent Delgrange
Last updated on May 22, 2025
13 min read