World Model Self-Distillation: Training World Models to Solve General Tasks
Abstract
Pretrained video generators are promising visual world models that exhibit emergent task-solving abilities; however, their reliance on detailed textual descriptions limits their direct use for planning and decision-making. Existing approaches either outsource this reasoning to language or vision-lan…