Paper Hugging Face Published: 2026-06-07 HF ↑37

Latent Spatial Memory for Video World Models

Authors: Weijie Wang, Haoyu Zhao, Yifan Yang, Feng Chen, Zeyu Zhang, and 5 others

Abstract

Video world models that maintain 3D spatial consistency across generated frames typically rely on explicit point cloud memory constructed in RGB space. This design is both computationally expensive, requiring repeated rendering and VAE encoding, and inherently lossy, as the round trip through pixel …

#diffusion #coding
