ReImagine: Rethinking Controllable High-Quality Human Video Generation via Image-First Synthesis

2026-04-21 · Computer Vision and Pattern Recognition
AI summary

The authors address the challenge of generating human videos that are both visually high-quality and realistic in motion by first focusing on synthesizing high-quality human images. These images then serve as an appearance prior for video generation, keeping the subject's appearance consistent across different poses and camera viewpoints. Their pipeline combines existing image and video models with SMPL-X-based motion guidance. They also release a new dataset and an auxiliary model for compositional human image synthesis. All code and data are publicly available.

human video generation, appearance modeling, pose control, viewpoint control, SMPL-X, temporal consistency, image generation, video diffusion model, multi-view data, compositional image synthesis
Authors
Zhengwentai Sun, Keru Zheng, Chenghong Li, Hongjie Liao, Xihe Yang, Heyuan Li, Yihao Zhi, Shuliang Ning, Shuguang Cui, Xiaoguang Han
Abstract
Human video generation remains challenging due to the difficulty of jointly modeling human appearance, motion, and camera viewpoint under limited multi-view data. Existing methods often address these factors separately, resulting in limited controllability or reduced visual quality. We revisit this problem from an image-first perspective, where high-quality human appearance is learned via image generation and used as a prior for video synthesis, decoupling appearance modeling from temporal consistency. We propose a pose- and viewpoint-controllable pipeline that combines a pretrained image backbone with SMPL-X-based motion guidance, together with a training-free temporal refinement stage based on a pretrained video diffusion model. Our method produces high-quality, temporally consistent videos under diverse poses and viewpoints. We also release a canonical human dataset and an auxiliary model for compositional human image synthesis. Code and data are publicly available at https://github.com/Taited/ReImagine.