SFGMesh

Optimizing vertex deformations from video for mesh animation is limited by its reliance on rendering-based reconstruction losses. Existing approaches improve the mesh representation, the supervision signal, or the animation paradigm, but their supervision remains confined to the 2D domain. Such 2D supervision is limited: it provides no signal for occluded regions and only indirect cues for visible ones. Consequently, these methods often suffer from severe shape and motion artifacts. To address this limitation, we propose Shape Flow Guidance (SFG), a sequence of 3D shapes derived from the video that serves as explicit 3D supervision for mesh animation. Specifically, SFG is obtained by intervening in the sampling process of a pretrained mesh generator in a training-free manner. We further design a skeletal animation model that separates local deformation from global transformation, so that SFG guides the complex local motion while rendering-based losses are reserved for the simple global motion. Extensive experiments confirm that our method significantly outperforms prior work qualitatively, quantitatively, and in processing speed.
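
To make the separation of local deformation from global transformation concrete, the following is a minimal illustrative sketch, not the implementation described here. It assumes a translation-only simplification of linear blend skinning for the local stage, a single rotation about the z-axis plus a translation for the global stage, and a per-vertex squared-distance loss with known correspondence as a stand-in for 3D shape supervision. All function names (`blend_skinning`, `apply_global`, `shape_loss`) are hypothetical.

```python
import math


def blend_skinning(rest_vertices, weights, bone_offsets):
    """Local stage (assumed form): move each vertex by a weighted sum of
    per-bone translations, staying in the canonical (object) frame."""
    deformed = []
    for v, w in zip(rest_vertices, weights):
        offset = [sum(wi * b[k] for wi, b in zip(w, bone_offsets)) for k in range(3)]
        deformed.append(tuple(v[k] + offset[k] for k in range(3)))
    return deformed


def apply_global(vertices, angle, translation):
    """Global stage (assumed form): one rigid transform shared by all
    vertices, here a rotation about z followed by a translation."""
    c, s = math.cos(angle), math.sin(angle)
    out = []
    for x, y, z in vertices:
        rotated = (c * x - s * y, s * x + c * y, z)
        out.append(tuple(r + t for r, t in zip(rotated, translation)))
    return out


def shape_loss(local_vertices, guidance_vertices):
    """Hypothetical 3D supervision: mean squared distance to a guidance
    shape, evaluated in the canonical frame (correspondence assumed)."""
    total = 0.0
    for v, g in zip(local_vertices, guidance_vertices):
        total += sum((a - b) ** 2 for a, b in zip(v, g))
    return total / len(local_vertices)


# Two vertices, two bones; the second bone lifts the second vertex along y.
rest = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
weights = [(1.0, 0.0), (0.0, 1.0)]
bone_offsets = [(0.0, 0.0, 0.0), (0.0, 1.0, 0.0)]

local = blend_skinning(rest, weights, bone_offsets)       # canonical frame
world = apply_global(local, math.pi / 2, (0.0, 0.0, 2.0))  # camera/world frame

# The 3D loss depends only on the local stage; the global parameters
# would be fitted separately, for example with a rendering-based loss.
guidance = [(0.0, 0.0, 0.0), (1.0, 1.0, 0.0)]
loss = shape_loss(local, guidance)
```

In this sketch the 3D loss never sees the global rotation or translation, which mirrors the idea of letting shape guidance drive the local motion while a separate image-space objective handles the rigid global motion.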