Paper Deep Dive · Hugging Face · Published: 2026-06-01 · HF ↑10

Small RL Controller, Large Language Model: RL-Guided Adaptive Sampling for Test-Time Scaling

Authors: Runpeng Dai, Tong Zheng, Rui Liu, Chengsong Huang, Hongtu Zhu

Abstract

Test-time scaling improves the reasoning performance of large language models but incurs substantial cost in both total computation and latency. Existing adaptive sampling methods partially mitigate this issue by dynamically deciding when to stop sampling, yet they typically rely on heuristic rules …

#llm #rl

Articles in the same category

Claw-SWE-Bench: A Benchmark for Evaluating OpenClaw-style Agent Harnesses on Coding Tasks

On-Policy Self-Evolution via Failure Trajectories for Agentic Safety Alignment

World-R1: Aligning 3D Constraints in Text-to-Video Generation via Reinforcement Learning