(CVPR 2023) DiffPose: Toward More Reliable 3D Pose Estimation (Paper Reading Notes)

date
Dec 10, 2022
Last edited time
Jun 3, 2023 12:09 PM
tags
DL
DDPM
summary
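1. The forward diffusion process ends not at Gaussian noise but at a coarse estimate distribution. 2. A GMM is used to re-model the initial indeterminate pose distribution $H_K$, so that the distributions of the intermediate diffusion results carry variance information. Note (2023.06.03): just noticed that this paper was accepted to CVPR; the quality is indeed good.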

Abstract

3D human pose estimation aims to predict the 3D coordinates of human joints from images or videos. The mainstream approach is to conduct 3D pose estimation in two stages: the 2D pose is first obtained with a 2D pose detector, and then 2D-to-3D lifting is performed.
Inspired by the strong capability of diffusion models to generate realistic samples even from a starting point with high uncertainty (e.g., random noise), here we aim to tackle 3D pose estimation with diffusion models.
DiffPose models the 3D pose estimation procedure as a reverse diffusion process, where we progressively transform a 3D pose distribution with high uncertainty and indeterminacy towards a 3D pose with low uncertainty.
  • Problem
    • We start the reverse diffusion process from an estimated 2D pose which has high uncertainty in 3D space, instead of starting from random noise.
  • Method
    • Firstly, we initialize the indeterminate 3D pose distribution based on extracted heatmaps and also the data distribution in the training set.
      Secondly, during forward diffusion, to generate indeterminate 3D pose distributions that eventually (after $K$ steps) resemble the initialized indeterminate distribution $H_K$, we add noise to the ground truth 3D pose distribution $H_0$, where the noise is modeled by a Gaussian Mixture Model (GMM) that characterizes the uncertainty distribution $H_K$.
      Thirdly, the reverse diffusion process is conditioned on context information.

Method

Initializing 3D Pose Distribution

To aid our diffusion model in handling the uncertainty and indeterminacy of each input 2D pose in 3D space, we would like to initialize a corresponding 3D pose distribution $H_K$ that captures the uncertainty of the 3D pose.
We use the corresponding heatmaps from the off-the-shelf 2D pose detector as the $x$ and $y$ distributions. We compute the $z$ distribution by calculating the occurrence frequencies of $z$ values in the training data.
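The initialization described above can be sketched as sampling each joint's $(x, y)$ from its normalized heatmap and its $z$ from a histogram of training depths. This is a minimal illustration, not the authors' implementation: the function name, the array layouts, and the use of per-joint histogram bins for $z$ are all assumptions made here.

```python
import numpy as np

def sample_initial_poses(heatmaps, z_hist, z_bin_centers, n_samples, rng):
    """Draw 3D poses from an initial distribution (hypothetical helper).

    heatmaps:      (J, H, W) non-negative 2D detector heatmaps, one per joint
    z_hist:        (J, B) occurrence counts of z values in the training data
    z_bin_centers: (B,) z value represented by each histogram bin
    returns:       (n_samples, J, 3) array of (x, y, z) per joint
    """
    J, H, W = heatmaps.shape
    poses = np.empty((n_samples, J, 3))
    for j in range(J):
        # x, y: treat the normalized heatmap as a categorical distribution over pixels
        p_xy = heatmaps[j].ravel() / heatmaps[j].sum()
        idx = rng.choice(H * W, size=n_samples, p=p_xy)
        ys, xs = np.unravel_index(idx, (H, W))
        # z: sample bins in proportion to their occurrence frequency
        p_z = z_hist[j] / z_hist[j].sum()
        zs = z_bin_centers[rng.choice(len(p_z), size=n_samples, p=p_z)]
        poses[:, j, 0] = xs
        poses[:, j, 1] = ys
        poses[:, j, 2] = zs
    return poses
```

Calling this with `rng = np.random.default_rng(0)` yields a set of candidate poses whose spread reflects both the detector's 2D uncertainty and the depth ambiguity.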

Forward Pose Diffusion

For DiffPose, we do not want to diffuse our 3D pose towards a standard Gaussian noise.
💡
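If we diffused directly from a ground-truth point to a single sampled initialization point, the end result of diffusion would be a Gaussian centered at that point, and the variance information of the $H_K$ distribution would be lost. A GMM is therefore used here to model $H_K$, so that the diffusion moves toward the $H_K$ distribution itself.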
We propose to use a Gaussian Mixture Model (GMM) to model the uncertainty distribution $H_K$. Specifically, we set the number of Gaussian components in the GMM to $M$ and use the Expectation-Maximization (EM) algorithm to optimize the GMM parameters $\phi_{GMM}$ to fit the target distribution.
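Fitting a GMM by EM amounts to maximizing the likelihood of the samples under the mixture. As a sketch in assumed notation (the symbols are chosen here for illustration): let $h_K^1, \dots, h_K^N$ be $N$ poses sampled from the initialized distribution, and let $\pi_m$, $\mu_m$, $\Sigma_m$ be the weight, mean, and covariance of the $m$-th of $M$ components. The standard objective then reads:

```latex
\max_{\phi_{GMM}} \; \prod_{i=1}^{N} \sum_{m=1}^{M} \pi_m \, \mathcal{N}\!\left(h_K^i \mid \mu_m, \Sigma_m\right),
\qquad
\phi_{GMM} = \{\pi_m, \mu_m, \Sigma_m\}_{m=1}^{M},
\quad \sum_{m=1}^{M} \pi_m = 1 .
```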
Next, we want to run the forward diffusion process on the ground truth pose distribution $H_0$ such that after $K$ steps, the generated noisy distribution becomes equivalent to the fitted GMM distribution $\phi_{GMM}$, which we henceforth denote as $\hat{H}_K$ because it is a GMM-based representation of $H_K$.
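One way to realize a forward process with this property is a DDPM-style interpolation in which the usual zero-mean, unit-variance noise is replaced by a draw from one GMM component. The sketch below is an illustration under stated assumptions, not the paper's exact formulation: the function name, the cumulative schedule `alpha_bar` (running from 1 at step 0 to 0 at step $K$), and the diagonal per-component standard deviations are all hypothetical choices.

```python
import numpy as np

def forward_diffuse_to_gmm(h0, k, alpha_bar, weights, means, stds, rng):
    """Sample a noisy pose at step k that approaches a GMM sample as alpha_bar[k] -> 0.

    h0:        (D,) flattened ground-truth pose
    alpha_bar: (K+1,) decreasing schedule, alpha_bar[0] = 1 and alpha_bar[K] = 0 (assumed)
    weights:   (M,) mixture weights summing to 1
    means:     (M, D) component means
    stds:      (M, D) component standard deviations (diagonal covariance, assumed)
    """
    m = rng.choice(len(weights), p=weights)          # pick one mixture component
    eps = rng.standard_normal(h0.shape) * stds[m]    # noise with that component's spread
    a = alpha_bar[k]
    # Interpolate between the clean pose (a = 1) and a sample from component m (a = 0)
    return means[m] + np.sqrt(a) * (h0 - means[m]) + np.sqrt(1.0 - a) * eps
```

At `k = 0` the function returns `h0` unchanged, and at the final step the output is distributed according to the mixture, so the endpoint keeps the variance of the target distribution instead of collapsing to a single Gaussian.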

Overall Training and Testing Process

  • Training
      1. initialize $H_K$;
      2. use $H_0$ and $H_K$ to generate supervisory signals via the forward process;
      3. run $K$ steps of the reverse process starting from $H_K$ and optimize with our generated signals.
  • Test
      1. initialize $H_K$;
      2. run $K$ steps of the reverse process starting from $H_K$ to obtain the final prediction.
  • Loss
 
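The test-time procedure above reduces to a simple loop over reverse steps. In the sketch below, `denoise_step` is a hypothetical stand-in for the learned, context-conditioned reverse step (its name and signature are assumptions, not taken from the paper), and `toy_step` exists only so that the loop can be run end to end.

```python
import numpy as np

def run_reverse_process(h_K, denoise_step, K):
    """Apply K reverse steps (k = K, K-1, ..., 1) starting from samples of the initial distribution."""
    h = np.asarray(h_K, dtype=float)
    for k in range(K, 0, -1):
        h = denoise_step(h, k)  # hypothetical learned step mapping h_k to h_{k-1}
    return h

# Toy stand-in for the learned step: move every sample halfway toward a fixed target pose.
target = np.zeros((17, 3))

def toy_step(h, k):
    return h + 0.5 * (target - h)

rng = np.random.default_rng(0)
h_K = rng.normal(size=(5, 17, 3))   # 5 pose samples, 17 joints each (illustrative sizes)
h_0 = run_reverse_process(h_K, toy_step, K=10)
```

During training, the same loop would be run with each intermediate output compared against the supervisory signal generated by the forward process for that step.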

Experiments

[Figures: experimental results; images not preserved]

© Lazurite 2021 - 2024