DDPM and Segmentation

Date: Nov 27, 2022

Last edited: Mar 27, 2023 08:41 AM

Status: Published

Slug: DDPM_and_Segmentation

SegDiff: Image Segmentation with Diffusion Probabilistic Models

Diffusion Probabilistic Methods are employed for state-of-the-art image generation. In this work, we present a method for extending such models for performing image segmentation. The method learns end-to-end, without relying on a pre-trained backbone. The information in the input image and in the current estimation of the segmentation map is merged by summing the output of two encoders.

https://arxiv.org/abs/2112.00390

GitHub - tomeramit/SegDiff

https://github.com/tomeramit/SegDiff

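The noise prediction network is modeled as $\epsilon_\theta(x_t, I, t) = D(E(F(x_t) + G(I), t), t)$, where the decoder $D$ is conventional and the encoder is split into three networks: $E$, $F$, and $G$. $G$ encodes the input image $I$, and $F$ encodes the segmentation map of the current step, $x_t$. The outputs of $F$ and $G$ have the same spatial dimensions and number of channels, so they are summed as $F(x_t) + G(I)$, and the sum is passed to the rest of the U-Net encoder, $E$.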
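The summation-based conditioning can be sketched in a few lines of NumPy. This is only a shape-level illustration, assuming a 1x1 convolution for the mask branch F and a single linear layer with a nonlinearity as a stand-in for the RRDB image encoder G. The channel count, image size, and weights are hypothetical and are not taken from the SegDiff code.

```python
import numpy as np

rng = np.random.default_rng(0)
C, H, W = 8, 16, 16  # hypothetical channel count and spatial size


def F(x_t, weight):
    """Mask branch: a 1x1 convolution mapping (1, H, W) to (C, H, W)."""
    # A 1x1 convolution is a per-pixel linear map over channels.
    return np.einsum("co,ohw->chw", weight, x_t)


def G(image, weight):
    """Stand-in for the RRDB image encoder: (3, H, W) to (C, H, W)."""
    return np.tanh(np.einsum("co,ohw->chw", weight, image))


x_t = rng.standard_normal((1, H, W))  # noisy segmentation map at step t
image = rng.random((3, H, W))         # RGB input image I
w_f = rng.standard_normal((C, 1))     # illustrative weights for F
w_g = rng.standard_normal((C, 3))     # illustrative weights for G

f_out = F(x_t, w_f)
g_out = G(image, w_g)
assert f_out.shape == g_out.shape     # required for the element-wise sum
fused = f_out + g_out                 # this signal would be passed on to E, then D
print(fused.shape)
```

Because the two branches are summed rather than concatenated, the downstream encoder sees the same number of channels regardless of how the image is encoded.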

Implementation Details

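The input image encoder $G$ is built from Residual in Residual Dense Blocks (RRDBs), which combine multi-level residual connections and use no batch normalization layers.

$F$ is a 2D convolutional layer with a single-channel input and a $C$-channel output.

$E$ and $D$ are the encoder and decoder of a standard U-Net.

We use 100 diffusion steps to reduce inference time.

The number of generated instances is chosen per dataset to increase mIoU.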
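The effect of generating several instances can be illustrated with a toy example. The sketch below assumes that the generated soft masks are averaged and then thresholded. The `sample_mask` function is a hypothetical stand-in for one full reverse-diffusion run (it just adds noise to the target), so only the aggregation step is meaningful here.

```python
import numpy as np


def sample_mask(rng, clean_mask, noise_scale=0.6):
    """Hypothetical stand-in for one reverse-diffusion run; returns a soft mask in [0, 1]."""
    noisy = clean_mask + noise_scale * rng.standard_normal(clean_mask.shape)
    return np.clip(noisy, 0.0, 1.0)


def iou(pred, target):
    """Intersection over union of two boolean masks."""
    inter = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    return inter / union if union > 0 else 1.0


rng = np.random.default_rng(0)
target = np.zeros((32, 32), dtype=bool)
target[8:24, 8:24] = True

n_instances = 9  # illustrative value; the source tunes this per dataset
samples = np.stack(
    [sample_mask(rng, target.astype(float)) for _ in range(n_instances)]
)

single = samples[0] > 0.5               # one generated instance
averaged = samples.mean(axis=0) > 0.5   # mean of all instances, then threshold

print("IoU, single instance:", iou(single, target))
print("IoU, averaged:", iou(averaged, target))
```

Averaging suppresses the per-sample noise, which is why more instances tend to help, at the cost of proportionally more sampling time.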

Experiments

Ablation Study

Diffusion Step

Number of generated Instances

Number of RRDB blocks

Variant of fusion

💡

The first variant concatenates $[F(x_t); G(I)]$ along the channel dimension. The second variant employs FCHarDNet-70 V2 instead of RRDBs. The third variant concatenates $I$ channel-wise to $x_t$, without using an encoder. The last alternative is to propagate $F(x_t)$ through the U-Net module and add it to $G(I)$ after the first, third, and fifth downsample blocks (variants four–six), instead of performing $F(x_t) + G(I)$.

Result:

The summation we introduce as a conditioning approach outperforms concatenation (variant one) on Vaihingen by a large margin, while on Cityscapes "Bus", the difference is small. The RRDB blocks are preferable to the FCHarDNet architecture in both datasets (variant two). Removing the encoder affects the metrics significantly (variant three), slightly more so on Vaihingen. The change in the signal's integration position of variant four leads to a negligible difference on Vaihingen and even outperforms our full method on Cityscapes "Bus". Variants five and six lead to a decrease in performance as the distance from the first layer increases.

MedSegDiff: Medical Image Segmentation with Diffusion Probabilistic Model

Diffusion probabilistic models (DPMs) have recently become one of the hottest topics in computer vision. Their image generation applications, such as Imagen, Latent Diffusion Models, and Stable Diffusion, have shown impressive generation capabilities, which has aroused extensive discussion in the community. Many recent studies have also found them useful in other vision tasks, such as image deblurring, super-resolution, and anomaly detection.

https://arxiv.org/abs/2211.00611

GitHub - lucidrains/med-seg-diff-pytorch: Implementation of MedSegDiff in Pytorch - SOTA medical segmentation using DDPM and filtering of features in fourier space

https://github.com/lucidrains/med-seg-diff-pytorch

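We note that in medical image segmentation, lesions and organs are often ambiguous and hard to distinguish from the background. In such cases, an adaptive calibration process is key to obtaining fine-grained results.

To achieve adaptive regional attention, at each step we integrate the current-step segmentation into the image condition encoder. Concretely, the current-step segmentation mask is fused with the image prior at the feature level in a multi-scale manner. In this way, the corrupted current-step mask helps to dynamically enhance the conditional features, which improves reconstruction accuracy. To eliminate the high-frequency noise that the corrupted mask introduces in this process, we further propose the Feature Frequency Parser (FF-Parser), which filters features in Fourier space.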

💡

In order to achieve the segmentation, we condition the step estimation function on the raw image prior, which can be represented as:

$$\epsilon_\theta(x_t, I, t) = D((E_t^I + E_t^x, t), t)$$

where $E_t^I$ is the conditional feature embedding, in our case the raw image embedding, and $E_t^x$ is the segmentation map feature embedding of the current step. The two components are added and sent to a UNet decoder $D$ for the reconstruction.

Method

Dynamic Conditional Encoding

💡

In the raw image encoder, we enhance its intermediate features with the current-step encoding features. Each scale of the conditional feature map $m_I^k$ is fused with the $x_t$ encoding features $m_x^k$ of the same shape, where $k$ is the layer index. The fusion is implemented by an attention-like mechanism $A$:

$$A(m_I^k, m_x^k) = (LN(m_I^k) \otimes LN(m_x^k)) \otimes m_I^k$$

where $\otimes$ denotes element-wise multiplication and $LN$ denotes layer normalization.
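A minimal NumPy sketch of this fusion follows, assuming the mechanism multiplies the layer-normalized image features by the layer-normalized current-step features and then by the image features again. The layer normalization here has no learned affine parameters and normalizes over the whole (C, H, W) feature map; both choices are assumptions made for illustration, and the names are hypothetical.

```python
import numpy as np


def layer_norm(x, eps=1e-5):
    """Normalize a (C, H, W) feature map over all of its elements (no learned affine)."""
    return (x - x.mean()) / np.sqrt(x.var() + eps)


def attentive_fusion(m_i, m_x):
    """Attention-like fusion: (LN(m_i) * LN(m_x)) * m_i, all products element-wise."""
    assert m_i.shape == m_x.shape, "features must come from the same scale"
    attention = layer_norm(m_i) * layer_norm(m_x)
    return attention * m_i


rng = np.random.default_rng(0)
m_i = rng.standard_normal((8, 16, 16))  # image-branch features at one scale
m_x = rng.standard_normal((8, 16, 16))  # current-step mask features, same scale
fused = attentive_fusion(m_i, m_x)
print(fused.shape)
```

The output keeps the shape of the image features, so it can replace them at the same scale of the encoder.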

FF-Parser

💡

The function of FF-Parser is to constrain the noise-related components in the features. Our main idea is to learn a parameterized attentive (weight) map that is applied to the Fourier-space features.

Unlike spatial attention, it globally adjusts the components of specific frequencies. Thus it can be learned to constrain the high-frequency components for adaptive integration.
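The idea of weighting features in Fourier space can be sketched as a 2D FFT, an element-wise product with a weight map, and an inverse FFT. In the sketch below the weight map is a hand-built low-pass mask rather than a learned parameter, purely so the effect is visible; the cutoff and the test signal are hypothetical.

```python
import numpy as np


def ff_parser(m, weight_map):
    """Filter a (C, H, W) feature map in Fourier space.

    weight_map has the same shape as the 2D FFT of m; in a trained model it
    would be a learned parameter.
    """
    M = np.fft.fft2(m, axes=(-2, -1))        # to the frequency domain
    M_filtered = weight_map * M              # element-wise reweighting
    return np.fft.ifft2(M_filtered, axes=(-2, -1)).real  # back to the spatial domain


rng = np.random.default_rng(0)
C, H, W = 4, 16, 16
rows = np.arange(H).reshape(1, H, 1)
smooth = np.sin(2 * np.pi * rows / H) * np.ones((C, H, W))  # low-frequency content
m = smooth + 0.5 * rng.standard_normal((C, H, W))           # plus broadband noise

# Hand-built low-pass mask standing in for the learned weight map.
fy = np.fft.fftfreq(H).reshape(H, 1)
fx = np.fft.fftfreq(W).reshape(1, W)
low_pass = (np.sqrt(fy ** 2 + fx ** 2) < 0.2).astype(float)
weight_map = np.broadcast_to(low_pass, (C, H, W))

filtered = ff_parser(m, weight_map)
err_before = np.mean((m - smooth) ** 2)
err_after = np.mean((filtered - smooth) ** 2)
print("noise energy before:", err_before, "after:", err_after)
```

Each entry of the weight map scales one frequency across the whole feature map, which is what makes the adjustment global rather than local.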

Experiment

Ablation Study

Diffusion Models for Implicit Image Segmentation Ensembles

Diffusion models have shown impressive performance for generative modelling of images. In this paper, we present a novel semantic segmentation method based on diffusion models. By modifying the training and sampling scheme, we show that diffusion models can perform lesion segmentation of medical images.

https://arxiv.org/abs/2112.03145

GitHub - JuliaWolleb/Diffusion-based-Segmentation: This is the official Pytorch implementation of the paper "Diffusion Models for Implicit Image Segmentation Ensembles".

We provide the official Pytorch implementation of the paper Diffusion Models for Implicit Image Segmentation Ensembles by Julia Wolleb, Robin Sandkühler, Florentin Bieder, Philippe Valmaggia, and Philippe C. Cattin. The implementation of Denoising Diffusion Probabilistic Models presented in the paper is based on openai/improved-diffusion.

https://github.com/JuliaWolleb/Diffusion-based-Segmentation

💡

Let $b$ be the given brain MR image of dimension $(c, h, w)$, where $c$ denotes the number of channels, and $(h, w)$ denote the image height and image width. The ground truth segmentation of the tumor for the input image $b$ is denoted as $x_b$, and is of dimension $(1, h, w)$. We train a DDPM for the generation of segmentation masks. We induce the anatomical information present in $b$ by adding it as an image prior to $x_{b,t}$. We do this by concatenating $b$ and $x_{b,t}$, and define $X_t := b \oplus x_{b,t}$. Consequently, $X_t$ has dimension $(c+1, h, w)$.
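The channel-wise concatenation can be checked at the shape level with NumPy. The sketch below assumes that only the segmentation mask is noised, using the standard DDPM forward process, while the image channels are left untouched. The linear beta schedule, the number of steps, and the tensor sizes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
c, h, w = 4, 32, 32                       # illustrative sizes
b = rng.random((c, h, w))                 # brain MR image, c channels
x_b = (rng.random((1, h, w)) > 0.5).astype(float)  # ground-truth mask, one channel

# Standard DDPM forward process, applied to the mask only:
# x_{b,t} = sqrt(alpha_bar_t) * x_b + sqrt(1 - alpha_bar_t) * eps
T = 1000
betas = np.linspace(1e-4, 0.02, T)        # linear schedule (an assumption)
alpha_bar = np.cumprod(1.0 - betas)
t = 500
eps = rng.standard_normal(x_b.shape)
x_bt = np.sqrt(alpha_bar[t]) * x_b + np.sqrt(1.0 - alpha_bar[t]) * eps

# Concatenate the clean image and the noisy mask along the channel axis.
X_t = np.concatenate([b, x_bt], axis=0)
print(X_t.shape)
```

The network then takes the concatenated array as input, so the anatomical information is available at every denoising step.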

Experiments

Ablation Study

Number of ensemble samples

💡

Because the sampling process is stochastic, we implicitly generate an ensemble of segmentation masks without having to train a new model. This ensemble can then be used to boost the segmentation performance.
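One way to use such an ensemble is to compute pixel-wise mean and variance maps over repeated sampling runs: the thresholded mean gives the final mask and the variance highlights uncertain pixels. In the sketch below, `sample_mask` is a hypothetical stand-in for the trained sampler (a disc with a jittered radius), and the ensemble size of 5 is illustrative.

```python
import numpy as np


def sample_mask(rng, shape):
    """Hypothetical stand-in for one stochastic sampling run of a trained model."""
    yy, xx = np.mgrid[: shape[0], : shape[1]]
    # A disc whose radius jitters between runs, mimicking sample-to-sample variation.
    radius = 8 + rng.normal(0.0, 1.0)
    return ((yy - 16) ** 2 + (xx - 16) ** 2 <= radius ** 2).astype(float)


rng = np.random.default_rng(0)
n_samples = 5  # illustrative ensemble size
ensemble = np.stack([sample_mask(rng, (32, 32)) for _ in range(n_samples)])

mean_map = ensemble.mean(axis=0)  # fraction of runs that mark each pixel
var_map = ensemble.var(axis=0)    # high where the runs disagree
final_mask = mean_map > 0.5       # majority vote over the ensemble

print("pixels in final mask:", int(final_mask.sum()))
print("max pixel-wise variance:", float(var_map.max()))
```

In this toy setting the variance is zero deep inside and far outside the disc and can only be non-zero near its boundary, where the runs may disagree.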