BAGS: Building Animatable Gaussian Splatting from a Monocular Video with Diffusion Priors

arXiv 2024
Tingyang Zhang*1, Qingzhe Gao*2,1, Weiyu Li3, Libin Liu1, Baoquan Chen1
*: Joint first authors with equal contributions to this work
1Peking University, 2Shandong University, 3The Hong Kong University of Science and Technology

Abstract

Animatable 3D reconstruction has significant applications across various fields, yet it has primarily relied on manual creation by artists. Recently, some studies have successfully constructed animatable 3D models from monocular videos. However, these approaches require sufficient view coverage of the object within the input video and typically incur significant time and computational costs for training and rendering, which restricts their practical applications. In this work, we propose a method to build animatable 3D Gaussian Splatting from a monocular video with diffusion priors. The 3D Gaussian representation significantly accelerates training and rendering, and the diffusion priors allow the method to learn 3D models from limited viewpoints. We also present a rigid regularization to enhance the utilization of the priors. We perform an extensive evaluation across various real-world videos, demonstrating superior performance compared to current state-of-the-art methods.
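The abstract mentions a rigid regularization but does not spell out its form. A common way to encourage rigidity is an as-rigid-as-possible style penalty that keeps distances between neighboring Gaussian centers stable under deformation; the sketch below illustrates that idea and is a hypothetical stand-in, not the paper's exact loss.

```python
import numpy as np

def rigidity_loss(canonical_xyz, deformed_xyz, k=4):
    """ARAP-style rigidity sketch (hypothetical; the paper's exact loss
    may differ): penalize changes in the distances between each Gaussian
    center and its k nearest neighbors in canonical space."""
    # (N, N) pairwise distances between canonical Gaussian centers
    diff = canonical_xyz[:, None, :] - canonical_xyz[None, :, :]
    d_can = np.linalg.norm(diff, axis=-1)
    np.fill_diagonal(d_can, np.inf)              # exclude self-distance
    nbr = np.argsort(d_can, axis=1)[:, :k]       # (N, k) nearest neighbors

    rows = np.arange(canonical_xyz.shape[0])[:, None]
    d_can_nbr = d_can[rows, nbr]                 # canonical neighbor distances
    # same neighbor pairs measured after deformation
    d_def_nbr = np.linalg.norm(
        deformed_xyz[:, None, :] - deformed_xyz[nbr], axis=-1)
    return float(((d_def_nbr - d_can_nbr) ** 2).mean())
```

A global rotation plus translation preserves all pairwise distances, so it incurs zero penalty, while a non-rigid stretch is penalized; this keeps the regularizer from fighting legitimate whole-body motion.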
BAGS Project Teaser Image

Given a single casual video, our method constructs an animatable 3D Gaussian Splatting model with diffusion priors. This not only compensates for unseen view information but also enables fast training and real-time rendering.

BAGS Pipeline Image

Pipeline: We construct a canonical space using Gaussian Splatting. In the absence of a template parametric model, we develop a neural bone representation to animate the canonical space to match the input video. Additionally, we utilize a diffusion model to supply unseen-view information and apply a rigid constraint to facilitate training. After training, the model can be manually manipulated to render novel poses.
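To make the neural-bone animation step concrete, one minimal sketch is linear blend skinning over per-bone rigid transforms, with skinning weights derived from distances to bone centers. All names and the softmax weighting below are illustrative assumptions; the authors' actual bone parameterization and weight function may differ.

```python
import numpy as np

def skin_gaussians(xyz, bone_centers, bone_R, bone_t, temperature=0.1):
    """Hypothetical neural-bone skinning sketch (not the authors' exact
    formulation): skinning weights come from a softmax over negative
    squared distances to bone centers, and each Gaussian center is
    moved by a linear blend of per-bone rigid transforms."""
    # (N, B) squared distances from each Gaussian center to each bone
    d2 = ((xyz[:, None, :] - bone_centers[None, :, :]) ** 2).sum(-1)
    logits = -d2 / temperature
    w = np.exp(logits - logits.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)            # (N, B) skinning weights

    # per-bone rigidly transformed positions: (N, B, 3)
    moved = np.einsum('bij,nj->nbi', bone_R, xyz) + bone_t[None]
    return (w[:, :, None] * moved).sum(axis=1)   # linear blend skinning
```

Because the weights sum to one per Gaussian, a single bone (or identical transforms on all bones) reduces this to one rigid transform of the whole canonical model, which is a useful sanity check when wiring up such a deformation module.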

Video

Comparisons

Input
Banmo
Ours

BibTeX

@misc{zhang2024bags,
  title={BAGS: Building Animatable Gaussian Splatting from a Monocular Video with Diffusion Priors},
  author={Tingyang Zhang and Qingzhe Gao and Weiyu Li and Libin Liu and Baoquan Chen},
  year={2024},
  eprint={2403.11427},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}