VideoMDM: Towards 3D Human Motion Generation From 2D Supervision
Abstract
We introduce VideoMDM, a diffusion-based framework that trains 3D human motion priors directly from accurate 2D poses extracted from monocular videos, without any 3D ground truth. A pretrained 2D-to-3D lifter provides approximate 3D pose sequences that serve as a noisy teacher: these are diffused, d…