Verifiable Environments Are LEGO Bricks: Recursive Composition for Reasoning Generalization
Abstract
Reinforcement Learning (RL) with verifiable environments has emerged as a powerful approach for enhancing the reasoning capabilities of Large Language Models (LLMs). While prior research demonstrates that scaling environment quantity improves RL performance, existing manual or individual constructio…