(CVPR 2023) DiffPose: Toward More Reliable 3D Pose Estimation (Paper Notes)

3D human pose estimation aims to predict the 3D coordinates of human joints from images or videos. The mainstream approach works in two stages: a 2D pose detector first estimates the 2D pose, and the 2D pose is then lifted to 3D.

Inspired by the strong capability of diffusion models to generate realistic samples even from a starting point with high uncertainty (e.g., random noise), here we aim to tackle 3D pose estimation with diffusion models.

DiffPose models the 3D pose estimation procedure as a reverse diffusion process, where we progressively transform a 3D pose distribution with high uncertainty and indeterminacy towards a 3D pose with low uncertainty.

Problem

We start the reverse diffusion process from an estimated 2D pose which has high uncertainty in 3D space, instead of starting from random noise.

Method Overview

Firstly, we initialize the indeterminate 3D pose distribution $H_K$ from the extracted 2D heatmaps and the data distribution of the training set.

Secondly, during forward diffusion, we add noise to the ground-truth 3D pose distribution $H_0$ to generate indeterminate 3D pose distributions that eventually (after $K$ steps) resemble $H_K$. The noise is modeled by a Gaussian Mixture Model (GMM) that characterizes the uncertainty distribution $H_K$.

Thirdly, the reverse diffusion process is conditioned on context information.

Method

Initializing 3D Pose Distribution

To help the diffusion model handle the uncertainty and indeterminacy of each input 2D pose in 3D space, we initialize a corresponding 3D pose distribution $H_K$ that captures the uncertainty of the 3D pose.

We use the corresponding heatmaps from the off-the-shelf 2D pose detector as the $x$ and $y$ distributions. We compute the $z$ (depth) distribution from the occurrence frequencies of $z$ values in the training data.
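A minimal Python sketch (standard library only) of sampling one joint from this initial distribution. Everything concrete here is an assumption, not the paper's implementation. The heatmap is a small grid of non-negative scores indexed `heatmap[y][x]`, and heatmap cells are used directly as x/y coordinates with no mapping to camera space. Depth values are grouped into bins of a hypothetical width `bin_width`, and x/y are sampled independently of z. The helper names `depth_histogram` and `sample_initial_joint` are hypothetical.

```python
import random
from collections import Counter


def depth_histogram(train_z, bin_width=0.1):
    """Occurrence frequency of each depth bin among training-set z values.

    Returns {bin_center: probability}; bin_width is a hypothetical choice.
    """
    counts = Counter(round(z / bin_width) for z in train_z)
    total = sum(counts.values())
    return {b * bin_width: c / total for b, c in counts.items()}


def sample_initial_joint(heatmap, z_hist, rng):
    """Draw one (x, y, z) sample for a single joint from the initial distribution.

    x, y: a heatmap cell, chosen with probability proportional to its score.
    z: a depth bin, chosen by its training-set frequency (assumed independent of x, y).
    """
    cells = [(x, y) for y, row in enumerate(heatmap) for x in range(len(row))]
    scores = [heatmap[y][x] for x, y in cells]
    x, y = rng.choices(cells, weights=scores, k=1)[0]
    depths = list(z_hist)
    z = rng.choices(depths, weights=[z_hist[d] for d in depths], k=1)[0]
    return x, y, z


if __name__ == "__main__":
    heatmap = [[0.0, 0.1, 0.0],
               [0.1, 0.6, 0.1],
               [0.0, 0.1, 0.0]]
    z_hist = depth_histogram([4.02, 4.05, 4.11, 4.33, 4.38])
    print(sample_initial_joint(heatmap, z_hist, random.Random(0)))
```

Repeating this per joint and per draw yields a set of samples from the initial distribution. The GMM fitting step operates on samples like these.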

Forward Pose Diffusion

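For DiffPose, we do not want to diffuse our 3D pose towards standard Gaussian noise, as conventional diffusion models do. Instead, the diffusion should end at the initialized uncertainty distribution $H_K$.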

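💡 If we diffused directly from the GT point toward a single point sampled from the initialization, the final result would be a Gaussian centered at that point, and the variance information of the distribution would be lost. Therefore, a GMM is used here to model $H_K$, so that the diffusion targets the whole distribution.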

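We propose to use a Gaussian Mixture Model (GMM) to model the uncertainty distribution $H_K$. Specifically, we set the number of Gaussian components in the GMM to $N$ and use the Expectation-Maximization (EM) algorithm to optimize the GMM parameters $\phi_{GMM} = \{\pi_n, \mu_n, \Sigma_n\}_{n=1}^{N}$ so that they fit the target distribution $H_K$, i.e., (locally) maximize its likelihood:

$$\phi^{*}_{GMM} = \arg\max_{\phi_{GMM}} \; \mathbb{E}_{h \sim H_K}\Big[\log \sum_{n=1}^{N} \pi_n \, \mathcal{N}(h \mid \mu_n, \Sigma_n)\Big]$$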
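The EM fit can be sketched in a few lines. This hypothetical `fit_gmm_em` works on 1-D samples for readability. The actual uncertainty distribution covers the 3D coordinates of all joints, so a real fit would use multivariate Gaussians, and these notes do not state which covariance structure the paper uses. Initializing the means at evenly spaced quantiles is one common heuristic among several. In practice, `sklearn.mixture.GaussianMixture(n_components=N)` also fits a GMM by EM.

```python
import math


def normal_pdf(x, mu, var):
    """Density of a 1-D Gaussian N(mu, var) at x."""
    return math.exp(-((x - mu) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def fit_gmm_em(samples, n_components, n_iters=100, min_var=1e-6):
    """Fit a 1-D GMM to samples with Expectation-Maximization.

    Returns (weights, means, variances), one entry per component.
    """
    n = len(samples)
    ordered = sorted(samples)
    # Deterministic initialization: means at evenly spaced quantiles, shared variance.
    means = [ordered[int((k + 0.5) * n / n_components)] for k in range(n_components)]
    mean_all = sum(samples) / n
    var_all = max(sum((s - mean_all) ** 2 for s in samples) / n, min_var)
    variances = [var_all] * n_components
    weights = [1.0 / n_components] * n_components

    for _ in range(n_iters):
        # E-step: responsibility of each component for each sample.
        resp = []
        for x in samples:
            p = [weights[k] * normal_pdf(x, means[k], variances[k]) for k in range(n_components)]
            total = sum(p) or 1e-300  # guard against underflow to exactly zero
            resp.append([pk / total for pk in p])
        # M-step: re-estimate weights, means and variances from the responsibilities.
        for k in range(n_components):
            nk = sum(r[k] for r in resp) + 1e-12
            weights[k] = nk / n
            means[k] = sum(r[k] * x for r, x in zip(resp, samples)) / nk
            variances[k] = max(
                sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, samples)) / nk,
                min_var,
            )
    return weights, means, variances
```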

Next, we run the forward diffusion process on the ground-truth pose distribution $H_0$ so that after $K$ steps, the generated noisy distribution becomes equivalent to the fitted GMM distribution $\phi^{*}_{GMM}$. We henceforth denote this distribution as $\hat{H}_K$ because it is a GMM-based representation of $H_K$.
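The following sketch shows one way to make the forward process end at the fitted GMM instead of at $\mathcal{N}(0, I)$, under stated assumptions. It reuses the DDPM closed-form step, but shifts the mean to a GMM component and draws the noise from that component: $h_k = \mu_n + \sqrt{\bar\alpha_k}\,(h_0 - \mu_n) + \sqrt{1 - \bar\alpha_k}\,\epsilon$, where $\epsilon \sim \mathcal{N}(0, \Sigma_n)$ and component $n$ is chosen with probability $\pi_n$. At $k = 0$ this returns $h_0$. At $k = K$, where $\bar\alpha_K \approx 0$, it returns a draw from component $n$. Over the random component choice, $h_K$ therefore follows the fitted GMM, including its variance. The cosine schedule, the diagonal covariances and the name `gmm_forward_diffuse` are illustrative choices, not taken from the paper.

```python
import math
import random


def cosine_alpha_bar(step, K, s=0.008):
    """Cumulative signal fraction alpha_bar of a cosine schedule (an assumed choice)."""
    def f(t):
        return math.cos((t / K + s) / (1 + s) * math.pi / 2) ** 2
    return f(step) / f(0)


def gmm_forward_diffuse(h0, gmm, step, K, rng):
    """Sample h_step from the ground-truth pose h0 so that h_K follows the GMM.

    h0:  flat list of pose coordinates.
    gmm: list of (weight, mean_vector, diagonal_variance_vector) components.
    """
    weights = [w for w, _, _ in gmm]
    _, mu, var = rng.choices(gmm, weights=weights, k=1)[0]  # pick one component
    a = cosine_alpha_bar(step, K)
    b = math.sqrt(max(0.0, 1.0 - a))
    # Interpolate from h0 toward the component mean; noise uses the component variance.
    return [
        m + math.sqrt(a) * (x - m) + b * rng.gauss(0.0, math.sqrt(v))
        for x, m, v in zip(h0, mu, var)
    ]
```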

Overall Training and Testing Process

Training

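Initialize $\hat{H}_K$, the GMM-based representation of $H_K$;

Use $H_0$ and $\hat{H}_K$ to generate supervisory signals via the forward process;

Run $K$ steps of the reverse process starting from $\hat{H}_K$ and optimize the diffusion model with the generated signals.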
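Here is a hypothetical sketch of these three training steps for a single ground-truth pose. The notes do not specify the exact supervision, so the sketch makes two assumptions. First, the forward process produces one consistent target path $h_0, \dots, h_K$ by sharing a single GMM component and a single noise draw across all steps. Second, the loss is the mean squared error between each predicted $h_{k-1}$ and its target, computed while the model is unrolled from $h_K$. The names `model`, `make_trajectory` and `unrolled_loss` are hypothetical. The gradient update itself is left to whatever autodiff framework is used.

```python
import math
import random


def cosine_alpha_bar(step, K, s=0.008):
    """Cumulative signal fraction of a cosine noise schedule (an assumed choice)."""
    def f(t):
        return math.cos((t / K + s) / (1 + s) * math.pi / 2) ** 2
    return f(step) / f(0)


def make_trajectory(h0, gmm, K, rng):
    """Forward-process supervisory signals [h_0, h_1, ..., h_K] for one ground-truth pose.

    One GMM component and one noise draw are shared by all steps (an assumption),
    so the targets form a single path from h0 to a sample of that component.
    gmm: list of (weight, mean_vector, diagonal_variance_vector) components.
    """
    weights = [w for w, _, _ in gmm]
    _, mu, var = rng.choices(gmm, weights=weights, k=1)[0]
    eps = [rng.gauss(0.0, math.sqrt(v)) for v in var]
    traj = []
    for step in range(K + 1):
        a = cosine_alpha_bar(step, K)
        b = math.sqrt(max(0.0, 1.0 - a))
        traj.append([m + math.sqrt(a) * (x - m) + b * e for x, m, e in zip(h0, mu, eps)])
    return traj


def unrolled_loss(model, traj, context):
    """Run K reverse steps from h_K with a hypothetical model and sum per-step MSE.

    model(h, step, context) returns its prediction of h_{step-1}. In real training
    this loss would be back-propagated through the model; here only its value is computed.
    """
    K = len(traj) - 1
    h = traj[K]
    loss = 0.0
    for step in range(K, 0, -1):
        h = model(h, step, context)  # prediction fed back into the next step
        target = traj[step - 1]
        loss += sum((p - t) ** 2 for p, t in zip(h, target)) / len(target)
    return loss
```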

Test

Initialize $\hat{H}_K$;

Run $K$ steps of the reverse process starting from $\hat{H}_K$ to obtain the final 3D pose prediction.
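Inference then reduces to a short loop. In this sketch, `model(h, step, context)` stands in for the trained, context-conditioned reverse step. It is hypothetical, as are the other names. The GMM is given as diagonal-covariance `(weight, mean, variance)` tuples, and a single hypothesis is drawn. These notes do not cover whether the paper averages several.

```python
import math
import random


def sample_gmm(gmm, rng):
    """Draw one pose vector from a diagonal-covariance GMM of (weight, mean, variance) tuples."""
    weights = [w for w, _, _ in gmm]
    _, mu, var = rng.choices(gmm, weights=weights, k=1)[0]
    return [rng.gauss(m, math.sqrt(v)) for m, v in zip(mu, var)]


def diffpose_infer(model, gmm, context, K, rng):
    """Start from h_K drawn from the fitted GMM and apply K reverse steps."""
    h = sample_gmm(gmm, rng)
    for step in range(K, 0, -1):
        h = model(h, step, context)  # h_step -> h_{step-1}
    return h
```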

Loss