CTDE2: Continuous Training Discrete Execution

Nicolas Rowies, Florent Delgrange, Ann Nowé, Diederik M. Roijers

Abstract

Multi-agent reinforcement learning is challenging, especially when combined with continuous action spaces. To tackle this challenge, we introduce the Continuous Training with Discrete Execution (CTDE2) paradigm. The key insight is that, at execution time, agents typically do not require the full infinite action space but can rely on the smaller set of actions strictly required by the optimal policy. Furthermore, in practice, agents often need only a compact learned set of discrete actions (i.e., a codebook). As such, we can learn which codebook is needed, making both policy execution and learning much faster. In this paper, we propose the MQ-LAN algorithm, which operationalizes CTDE2 by replacing exhaustive sampling of the action space with a learnable codebook of discrete action embeddings. The candidate actions can dynamically migrate across the continuous value landscape via gradient ascent. Empirical results confirm that treating the discretization as a learnable parameter allows MQ-LAN to achieve strong computational efficiency and superior returns compared to rigid discrete baselines, Actor-Critic methods, and Q-functionals. Based on these results, we believe that CTDE2 is a highly scalable and efficient pathway for tackling continuous multi-agent RL.
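The learnable-codebook idea in the abstract can be illustrated with a minimal sketch. This is not the authors' MQ-LAN implementation: the toy critic `q_value`, its gradient `q_grad`, the two example states, the learning rate, and the winner-only update rule are all assumptions chosen for illustration. The sketch shows a small set of candidate actions moving by gradient ascent on a continuous value landscape during training, and execution reducing to an argmax over the learned discrete set.

```python
import numpy as np

def q_value(state, actions):
    """Hypothetical stand-in for a learned critic: peaks at action == state."""
    return -(actions - state) ** 2

def q_grad(state, actions):
    """Analytic gradient of the toy critic with respect to the action."""
    return -2.0 * (actions - state)

def train_codebook(codebook, states, steps=200, lr=0.1):
    """Move candidate actions uphill on the value landscape (assumed update rule)."""
    codebook = codebook.copy()
    for _ in range(steps):
        for s in states:
            # Only the currently best candidate for this state is updated.
            k = int(np.argmax(q_value(s, codebook)))
            codebook[k] += lr * q_grad(s, codebook[k])
    return codebook

def act(codebook, state):
    """Discrete execution: pick the best action among the codebook entries."""
    return codebook[int(np.argmax(q_value(state, codebook)))]

states = [-0.7, 0.4]                    # two example states (assumption)
initial = np.linspace(-1.0, 1.0, 4)     # rigid uniform discretization
learned = train_codebook(initial, states)

for s in states:
    print(s, act(initial, s), act(learned, s))
```

In this toy setting, only the two codebook entries that the greedy policy actually uses move toward the per-state optima, while the unused entries stay at their initial positions. A fixed uniform grid of the same size cannot represent either optimum exactly.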

Type

Conference paper

Venue

ALA

Publication

Proceedings of the Adaptive and Learning Agents Workshop (ALA 2026), Paphos, Cyprus, May 25–26, 2026

Multi-Agent Reinforcement Learning, Continuous Control, Reinforcement Learning
