Publications
Structure From Tracking: Distilling Structure-Preserving Motion for Video Generation
Under Review
TL;DR: We introduce an algorithm to distill structure-preserving motion priors from an autoregressive video tracking model (SAM2) into a bidirectional video diffusion model.
Large Motion Video Autoencoding with Cross-modal Video VAE
International Conference on Computer Vision (ICCV), 2025
TL;DR: We propose a video autoencoder that achieves high-fidelity video encoding by combining temporal-aware spatial compression, lightweight temporal compression, and textual guidance.