Posts

Deep SPI: Safe Policy Improvement via World Models

A long-form explainer of Deep SPI: why ordinary on-policy auxiliary losses break after policy updates, how world models and neighborhood-constrained updates fix this, and what the resulting algorithm buys on stochastic ALE-57.

Florent Delgrange

Last updated on May 6, 2026 · 10 min read


Composing Reinforcement Learning Policies, with Formal Guarantees

Synthesizing controllers in large domains from verified world models and reinforcement learning policy composition.

Florent Delgrange

Last updated on May 22, 2025 · 13 min read
