[1] Structure From Tracking: Distilling Structure-Preserving Motion for Video Generation
Yang Fei, George Stoica, Jingyuan Liu, Qifeng Chen†, Ranjay Krishna, Xiaojuan Wang*, Benlin Liu*†
Under Review
[2] Large Motion Video Autoencoding with Cross-modal Video VAE
Yazhou Xing*, Yang Fei*, Yingqing He*†, Jingye Chen, Jiaxin Xie, Xiaowei Chi, Qifeng Chen†
International Conference on Computer Vision (ICCV), 2025
Learning a robust video Variational Autoencoder (VAE) is critical for efficient video generation and compression. This paper introduces a novel video autoencoder that achieves high-fidelity video encoding by combining temporal-aware spatial compression, lightweight motion compression, and textual guidance from text-to-video datasets. The model also supports joint training on images and videos, enhancing versatility and reconstruction quality.
* Equal contribution
† Corresponding authors