[1] Large Motion Video Autoencoding with Cross-modal Video VAE
Yazhou Xing*, Yang Fei*, Yingqing He*†, Jingye Chen, Jiaxin Xie, Xiaowei Chi, Qifeng Chen†
arXiv preprint, 2024
Learning a robust video Variational Autoencoder (VAE) is critical for efficient video generation and compression. This paper introduces a video autoencoder that achieves high-fidelity encoding of large-motion video by combining temporal-aware spatial compression, lightweight motion compression, and guidance from the text paired with videos in text-to-video datasets. The model is trained jointly on images and videos, so a single model can autoencode both, and joint training also improves reconstruction quality.
* Joint first authors
† Corresponding authors