Breaking Entropy Bounds: Accelerating RL Training via MTP with Rejection Sampling
Abstract
Reinforcement learning (RL) has become a key component of modern large language models, yet the rollout stage remains the main bottleneck in RL training pipelines. Although Multi-Token Prediction (MTP) offers a natural way to accelerate rollouts through speculative decoding, many studies have ob…