DDPM and Segmentation
date
Nov 27, 2022
Last edited time
Mar 27, 2023 08:41 AM
status
Published
slug
DDPM_and_Segmentation
tags
DL
DDPM
summary
type
Post
Field
Plat
SegDiff: Image Segmentation with Diffusion Probabilistic Models
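The noise-prediction network $\epsilon_\theta$ is modeled as $\epsilon_\theta(x_t, I, t) = D(E(F(x_t) + G(I), t), t)$, where $I$ is the input image and $x_t$ is the noisy segmentation map at step $t$. The decoder $D$ is conventional, while the encoder is split into three networks: $E$, $F$, and $G$. The outputs of $F$ and $G$ have the same spatial dimensions and number of channels. The two signals are summed, $F(x_t) + G(I)$, and the result is passed to the U-Net encoder $E$.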
Implement Detail
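- The input image encoder $G$ is built from Residual in Residual Dense Blocks (RRDBs), which combine multi-level residual connections and use no batch normalization layers.
- $F$ is a 2D convolutional layer with a single-channel input and a $C$-channel output.
- $E$ and $D$ are the encoder and decoder of a standard U-Net.
- 100 diffusion steps are used to reduce inference time.
- The number of generated instances is chosen per dataset to increase mIoU.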
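The summation-based conditioning described above can be sketched in a few lines. This is a shape-level illustration only: both $F$ and $G$ are replaced by hypothetical 1x1 convolutions with random weights (the real $G$ is an RRDB stack, and the kernel size of $F$ is not given in these notes), and the names `conv1x1` and `fuse_condition` are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1x1(x, weight):
    """Apply a 1x1 convolution: x is (C_in, H, W), weight is (C_out, C_in)."""
    return np.einsum("oc,chw->ohw", weight, x)

def fuse_condition(x_t, image, w_f, w_g):
    """Return F(x_t) + G(I), the tensor that would be handed to the U-Net encoder E."""
    f_out = conv1x1(x_t, w_f)      # F stand-in: 1 channel -> C channels
    g_out = conv1x1(image, w_g)    # G stand-in: 3 channels -> C channels
    # Summation only works because both branches produce the same shape.
    assert f_out.shape == g_out.shape
    return f_out + g_out

C, H, W = 8, 16, 16                       # illustrative sizes, not from the paper
x_t = rng.standard_normal((1, H, W))      # noisy single-channel segmentation map
image = rng.standard_normal((3, H, W))    # RGB input image
w_f = rng.standard_normal((C, 1))
w_g = rng.standard_normal((C, 3))

fused = fuse_condition(x_t, image, w_f, w_g)
print(fused.shape)  # (8, 16, 16)
```

Because the two branches are summed rather than concatenated, the channel count entering $E$ stays at $C$ regardless of how many channels the image has.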
Experiments
Ablation Study
- Diffusion Step
- Number of generated Instances
- Number of RRDB blocks
- Variant of fusion
The first variant concatenates $[F(x_t); G(I)]$ along the channel dimension.
The second variant employs FCHarDNet-70 V2 instead of RRDBs.
The third variant concatenates $I$ channel-wise to $x_t$, without using an encoder.
The last alternative is to propagate $F(x_t)$ through the U-Net module and add it to $G(I)$ after the first, third, and fifth downsample blocks (variants four to six), instead of computing $F(x_t) + G(I)$ at the input.
Results:
The summation we introduce as a conditioning approach outperforms concatenation (variant one) on Vaihingen by a large margin, while on Cityscapes "Bus", the difference is small.
The RRDB blocks are preferable to the FCHarDNet architecture on both datasets (variant two).
Removing the encoder affects the metrics significantly (variant three), slightly more so on Vaihingen.
Changing the position at which the signal is integrated (variant four) leads to a negligible difference on Vaihingen and even outperforms the full method on Cityscapes "Bus". Variants five and six lead to a decrease in performance as the distance from the first layer increases.
MedSegDiff: Medical Image Segmentation with Diffusion Probabilistic Model
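In medical image segmentation, lesions and organs are often ambiguous and hard to distinguish from the background. In such cases, an adaptive calibration process is key to obtaining fine-grained results.
For adaptive regional attention, the current-step segmentation is integrated into the Image Condition Encoder at every step. Concretely, the current-step segmentation mask is fused with the image prior at the feature level in a multi-scale manner. In this way, the corrupted current-step mask helps to dynamically enhance the conditional features, which improves reconstruction accuracy. To remove the high-frequency noise that the corrupted mask introduces in this process, a Feature Frequency Parser (FF-Parser) is further proposed to filter the features in the frequency domain.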
To perform segmentation, the step estimation function is conditioned on the raw image prior, which can be written as:
$$\epsilon_\theta(x_t, I, t) = D((E_t^I + E_t^x, t), t)$$
where $E_t^I$ is the conditional feature embedding (here, the raw image embedding) and $E_t^x$ is the segmentation map feature embedding of the current step. The two components are added and sent to a U-Net decoder $D$ for reconstruction.
Method
Dynamic Conditional Encoding
In the raw image encoder, the intermediate features are enhanced with the current-step encoding features. Each scale of the conditional feature map $m_I^k$ is fused with the $x_t$ encoding features $m_x^k$ of the same shape, where $k$ is the layer index. The fusion is implemented by an attention-like mechanism $\mathcal{A}$:
$$\mathcal{A}(m_I^k, m_x^k) = (LN(m_I^k) \otimes LN(m_x^k)) \otimes m_I^k$$
where $\otimes$ denotes element-wise multiplication and $LN$ denotes layer normalization.
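As a rough illustration of this fusion, the sketch below assumes it takes the form (LN(m_I) ⊗ LN(m_x)) ⊗ m_I at a single scale. The layer normalization here is a simplification (statistics over the whole feature map, no learned scale or shift), and the function names are hypothetical.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    """Normalize a (C, H, W) feature map over all of its elements (no learned affine)."""
    return (x - x.mean()) / np.sqrt(x.var() + eps)

def attentive_fusion(m_i, m_x):
    """Compute (LN(m_I) * LN(m_x)) * m_I using element-wise products."""
    if m_i.shape != m_x.shape:
        raise ValueError("image and segmentation features must share a shape")
    attention = layer_norm(m_i) * layer_norm(m_x)  # affinity between the two branches
    return attention * m_i                          # re-weight the image features

rng = np.random.default_rng(0)
m_i = rng.standard_normal((4, 8, 8))  # image-branch features at one scale
m_x = rng.standard_normal((4, 8, 8))  # current-step segmentation features
enhanced = attentive_fusion(m_i, m_x)
print(enhanced.shape)  # (4, 8, 8)
```

Since both inputs are normalized before they are multiplied, rescaling the segmentation features by a positive constant leaves the result essentially unchanged; only their spatial pattern influences the attention map.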
FF-Parser
The role of the FF-Parser is to constrain the noise-related components in the features. The main idea is to learn a parameterized attentive (weight) map that is applied to the features in Fourier space.
Unlike spatial attention, it globally adjusts the components at specific frequencies. It can therefore learn to suppress the high-frequency components for adaptive integration.
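The frequency-domain filtering idea can be sketched as a 2D FFT, an element-wise product with a weight map, and an inverse FFT. In the paper the weight map is learned; the sketch below substitutes a fixed low-pass map purely for illustration, and `ff_parser` and `low_pass_map` are hypothetical names.

```python
import numpy as np

def ff_parser(m, attn):
    """Filter a (C, H, W) feature map with a per-frequency weight map of the same shape."""
    spectrum = np.fft.fft2(m, axes=(-2, -1))            # features -> Fourier space
    filtered = attn * spectrum                          # re-weight each frequency
    return np.fft.ifft2(filtered, axes=(-2, -1)).real   # back to the spatial domain

def low_pass_map(shape, cutoff):
    """Fixed stand-in for the learned map: 1 up to `cutoff` cycles/pixel, 0 above."""
    c, h, w = shape
    fy = np.abs(np.fft.fftfreq(h))[:, None]
    fx = np.abs(np.fft.fftfreq(w))[None, :]
    mask = ((fy <= cutoff) & (fx <= cutoff)).astype(float)
    return np.broadcast_to(mask, (c, h, w))

# A checkerboard is the highest-frequency pattern an 8x8 grid can hold.
checker = (np.indices((8, 8)).sum(axis=0) % 2 * 2.0 - 1.0)[None]  # shape (1, 8, 8)
smooth = np.ones((1, 8, 8))                                       # zero-frequency only

attn = low_pass_map(checker.shape, cutoff=0.25)
print(np.abs(ff_parser(checker, attn)).max())   # checkerboard is filtered out
print(np.abs(ff_parser(smooth, attn) - smooth).max())  # constant map passes through
```

A single weight in Fourier space affects every spatial position at once, which is what makes this a global adjustment rather than a local one.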
Experiment
Ablation Study
Diffusion Models for Implicit Image Segmentation Ensembles
Let $b$ be the given brain MR image of dimension $(c, h, w)$, where $c$ denotes the number of channels, and $h$ and $w$ denote the image height and width. The ground truth segmentation of the tumor for the input image $b$ is denoted as $x_b$ and has dimension $(1, h, w)$. A DDPM is trained to generate segmentation masks.
The anatomical information present in $b$ is induced by adding it as an image prior to $x_{b,t}$. This is done by concatenating $b$ and $x_{b,t}$, defining $X_t := b \oplus x_{b,t}$. Consequently, $X_t$ has dimension $(c+1, h, w)$.
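The channel-wise concatenation is simple enough to check directly. The sketch below assumes a four-channel MR image and a 32x32 resolution, both chosen only for illustration; `build_model_input` is a hypothetical helper name.

```python
import numpy as np

def build_model_input(b, x_bt):
    """Concatenate image b (c, h, w) and noisy mask x_bt (1, h, w) along the channel axis."""
    if x_bt.shape[0] != 1 or b.shape[1:] != x_bt.shape[1:]:
        raise ValueError("expected b of shape (c, h, w) and x_bt of shape (1, h, w)")
    return np.concatenate([b, x_bt], axis=0)

rng = np.random.default_rng(0)
c, h, w = 4, 32, 32                      # illustrative sizes
b = rng.standard_normal((c, h, w))       # stand-in for the MR image
x_bt = rng.standard_normal((1, h, w))    # stand-in for the noisy mask at step t

X_t = build_model_input(b, x_bt)
print(X_t.shape)  # (5, 32, 32)
```

Only the last channel carries the noisy mask; the first `c` channels are the unchanged image and act as the prior at every denoising step.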
Experiments
Ablation Study
- Number of ensemble members
Because the sampling process is stochastic, repeated sampling implicitly generates an ensemble of segmentation masks without having to train a new model. This ensemble can then be used to boost segmentation performance.
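One simple way to combine such an ensemble is to average the sampled masks and threshold the mean. The sketch below is an assumption about how the combination could be done, not the paper's exact procedure, and `fake_sampler` is a hypothetical stand-in for a real DDPM sampling run (it just flips random pixels of a known mask).

```python
import numpy as np

def ensemble_masks(samples, threshold=0.5):
    """Average n sampled masks of shape (n, h, w) and binarize the mean map."""
    mean_map = samples.mean(axis=0)
    return mean_map, (mean_map >= threshold).astype(np.uint8)

def fake_sampler(rng, truth, flip_prob):
    """Stand-in for one stochastic sampling run: the true mask with random pixel flips."""
    flips = rng.random(truth.shape) < flip_prob
    return np.where(flips, 1 - truth, truth)

def dice(a, b):
    """Dice overlap between two binary masks."""
    inter = np.logical_and(a, b).sum()
    return 2.0 * inter / (a.sum() + b.sum())

rng = np.random.default_rng(0)
truth = np.zeros((32, 32), dtype=np.uint8)
truth[8:24, 8:24] = 1                                   # toy "tumor" region

samples = np.stack([fake_sampler(rng, truth, 0.2) for _ in range(5)])
mean_map, ensemble = ensemble_masks(samples)

print("single-sample Dice:", dice(samples[0], truth))
print("ensemble Dice:     ", dice(ensemble, truth))
```

The mean map can also be read as a per-pixel agreement score: values near 0 or 1 mean the samples agree, while values near 0.5 mark pixels where the samples disagree.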