Weakly-Supervised Camouflaged Object Detection with Scribble Annotations Paper Reading Notes

date
Dec 17, 2022
Last edited time
May 31, 2023 10:02 AM
status
Published
slug
Weakly-Supervised_Camouflaged_Object_Detection_with_Scribble_Annotations论文阅读
tags
DL
CV
summary
@2023.05.31 Added a few small notes
type
Post
Field
Plat

Abstract

  • Problem
    • Existing camouflaged object detection (COD) methods rely heavily on large-scale datasets with pixel-wise annotations. However, because camouflaged objects have ambiguous boundaries, annotating them pixel-wise is very time-consuming and labor-intensive, taking about 60 minutes per image.
  • Weakly-supervised COD
    • We propose the first weakly-supervised COD method, using scribble annotations as supervision.
    • We propose a novel consistency loss composed of two parts: a cross-view loss to attain reliable consistency over different images, and an inside-view loss to maintain consistency inside a single prediction map.
    • We further propose a feature-guided loss, which uses both visual features extracted directly from images and semantically significant features captured by the model.

Method

Overview


Feature-guided Loss

We design the feature-guided loss based on both simple visual features (context affinity loss) and complex semantic features (semantic significance loss).
Context Affinity Loss
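Nearby pixels with similar features tend to belong to the same class. We adopt a kernel to measure visual feature similarity (color and position):
$$K(i,j) = \exp\left(-\frac{\lVert P_i - P_j \rVert^2}{2\sigma_P^2} - \frac{\lVert C_i - C_j \rVert^2}{2\sigma_C^2}\right), \qquad D(i,j) = Y_i(1 - Y_j) + (1 - Y_i)\,Y_j$$
Here $P_i$ and $C_i$ are the position and color of pixel $i$, and $\sigma_P$, $\sigma_C$ are hyperparameters. $D(i,j)$ is the probability that pixels $i$ and $j$ have different classes ($Y_i$ is the predicted probability of the positive label for pixel $i$).
💡
The context affinity loss ties label agreement to visual similarity. Disagreement between two pixels is penalized in proportion to how alike they look, so pixel affinity is used to supervise label similarity. This step introduces a hand-crafted prior, which also limits how far the model's accuracy can improve.
$$\mathcal{L}_{ca} = \sum_i \sum_{j \in \mathcal{N}_k(i)} K(i,j)\, D(i,j)$$
Here $\mathcal{N}_k(i)$ is the $k \times k$ neighborhood of the center pixel $i$ ($k$ is set to 5 in the paper's experiments). Through the context affinity loss, the model can quickly learn from the unlabeled pixels.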
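The following is a minimal pure-Python sketch of the context affinity loss described above. It multiplies a Gaussian kernel on position and color by the probability that two neighbors take different labels. The sketch makes several assumptions:

- Predictions and colors are given as nested lists.
- The center pixel is excluded from its own neighborhood.
- The loss is a plain, unnormalized sum.

The names `context_affinity_loss`, `sigma_p` and `sigma_c` are hypothetical, and the bandwidth defaults are placeholders rather than the paper's values.

```python
import math


def context_affinity_loss(prob, image, k=5, sigma_p=6.0, sigma_c=0.1):
    """Sketch of L_ca = sum_i sum_{j in N_k(i)} K(i, j) * D(i, j).

    prob:  H x W nested list; prob[y][x] is Y_i, the foreground probability.
    image: H x W nested list of color tuples (e.g. RGB scaled to [0, 1]).
    k:     neighborhood size (5 in the paper).
    sigma_p, sigma_c: kernel bandwidths (placeholder values, not the paper's).
    """
    h, w = len(prob), len(prob[0])
    r = k // 2
    total = 0.0
    for y in range(h):
        for x in range(w):
            yi, ci = prob[y][x], image[y][x]
            for dy in range(-r, r + 1):
                for dx in range(-r, r + 1):
                    ny, nx = y + dy, x + dx
                    # Skip the center pixel (an assumption) and out-of-bounds neighbors.
                    if (dy == 0 and dx == 0) or not (0 <= ny < h and 0 <= nx < w):
                        continue
                    yj, cj = prob[ny][nx], image[ny][nx]
                    pos_d2 = dy * dy + dx * dx
                    col_d2 = sum((a - b) ** 2 for a, b in zip(ci, cj))
                    # Gaussian kernel on position and color: K(i, j)
                    kernel = math.exp(-pos_d2 / (2 * sigma_p ** 2)
                                      - col_d2 / (2 * sigma_c ** 2))
                    # D(i, j): probability that i and j take different labels
                    d = yi * (1 - yj) + (1 - yi) * yj
                    total += kernel * d
    return total


# Usage: a 2 x 2 image with a black left column and a white right column.
img = [[(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)],
       [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)]]
aligned = [[0.0, 1.0], [0.0, 1.0]]   # label edge follows the color edge
crossed = [[0.0, 0.0], [1.0, 1.0]]   # label edge cuts through uniform color
print(context_affinity_loss(aligned, img))  # close to 0
print(context_affinity_loss(crossed, img))  # clearly larger
```

A label boundary that follows a color edge costs almost nothing, because the kernel between differently colored pixels is tiny. A boundary through a uniformly colored region is penalized.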
Semantic Significance Loss
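The semantic significance loss has a similar formulation to the context affinity loss. The difference is that similarity is measured on the semantic features captured by the model rather than on colors and positions. The loss is computed over valid boundary regions (confidently classified pixels). A coefficient in the loss is set to increase with the epoch number (an exponential ramp-up to 0.15 in practice), since the model has not yet learned well-represented features at the beginning of training.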
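The note only says "exponential ramp-up to 0.15". One common schedule with that name is the sigmoid-shaped $\exp(-5(1-t)^2)$ curve from temporal ensembling / mean teacher. The sketch below assumes that shape and a hypothetical ramp length `rampup_epochs`. The paper's exact schedule may differ.

```python
import math


def exp_rampup(epoch, rampup_epochs, max_value=0.15):
    """Sigmoid-shaped exponential ramp-up: max_value * exp(-5 * (1 - t)^2).

    The exp(-5 (1 - t)^2) shape is borrowed from temporal ensembling /
    mean teacher (an assumption); rampup_epochs is a hypothetical parameter.
    """
    if rampup_epochs <= 0:
        return max_value
    t = min(max(epoch / rampup_epochs, 0.0), 1.0)  # training progress in [0, 1]
    return max_value * math.exp(-5.0 * (1.0 - t) ** 2)


for epoch in (0, 5, 10, 15, 20):
    print(epoch, round(exp_rampup(epoch, rampup_epochs=20), 4))
```

The coefficient starts near zero, about 0.15·e⁻⁵, so the semantic term barely contributes while the features are still poor. It then saturates at 0.15 once the ramp ends.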
In summary, the feature-guided loss $\mathcal{L}_f$ is the sum of the context affinity loss and the semantic significance loss.
💡
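This is similar to the Local Saliency Coherence loss in (AAAI2021-SCWSSOD) Structure-Consistent Weakly Supervised Salient Object Detection with Local Saliency Coherence. The main change is replacing LSC's distance between neighboring predictions with the probability $D(i,j)$ that the two pixels take different labels. That only makes the loss a little smoother.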
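The comparison below makes "smoother" concrete. It assumes that the LSC loss measures disagreement with the L1 distance $|Y_i - Y_j|$; check this against the SCWSSOD paper. The other term is the probability of different labels, $Y_i(1-Y_j) + (1-Y_i)Y_j$. Both helper names are hypothetical.

```python
def d_prob(yi, yj):
    """D(i, j): probability that pixels i and j take different labels."""
    return yi * (1 - yj) + (1 - yi) * yj


def d_l1(yi, yj):
    """L1 distance |Y_i - Y_j| (assumed to be the LSC loss's distance term)."""
    return abs(yi - yj)


# Hard (0/1) predictions: the two terms agree.
print(d_prob(1.0, 0.0), d_l1(1.0, 0.0))  # -> 1.0 1.0
# Two equally uncertain neighbors: L1 sees no disagreement, D does.
print(d_prob(0.5, 0.5), d_l1(0.5, 0.5))  # -> 0.5 0.0
```

$D = Y_i + Y_j - 2Y_iY_j$ is a polynomial, so its gradient $\partial D / \partial Y_i = 1 - 2Y_j$ changes continuously. By contrast, $|Y_i - Y_j|$ has a kink at $Y_i = Y_j$. For $Y_i \ge Y_j$, $D - |Y_i - Y_j| = 2Y_j(1 - Y_i) \ge 0$, so $D$ never penalizes less than the L1 term.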

Consistency Loss

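To keep predictions consistent across views, we propose the cross-view (CV) consistency loss. It minimizes the difference between the predictions for the input and for its transformed version.
Cross-View Consistency Loss
Let $Y$ and $Y'$ be the prediction maps of the input and of its transform, $N$ the total number of pixels, and $i$ a pixel index. The plain loss averages the per-pixel difference between $Y_i$ and $Y'_i$ over the $N$ pixels.
💡
We want the prediction of the transform, $Y'$, to be pushed toward that of the normal input, $Y$, more than the other way around. The key is to weight the two backward gradients differently. The proposed cross-view consistency loss splits the loss into two terms, each of which stops the gradient (i.e., detach) on one of the two predictions, and weights the terms with a hyperparameter. With equal weights, it reduces to the original loss. When the term that updates $Y'$ is weighted more heavily, the backward gradient pushing $Y'$ toward $Y$ is larger than the reverse, which achieves the goal. In practice, the weighting hyperparameter is set to 0.3.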
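The sketch below illustrates the detach-based weighting. It writes out analytic gradients so that no autograd library is needed. Its form is $\alpha\,\ell(Y, \mathrm{sg}(Y')) + (1-\alpha)\,\ell(\mathrm{sg}(Y), Y')$ with squared error as $\ell$. This is one form consistent with the description, not confirmed as the paper's exact equation, and the distance choice and function name are hypothetical.

```python
def cross_view_loss_and_grads(y, y_t, alpha=0.3):
    """Weighted two-way consistency loss with stop-gradient, plus its gradients.

    Assumed form (squared error is a hypothetical choice of distance):
        L = alpha * mean((y - sg(y_t))^2) + (1 - alpha) * mean((sg(y) - y_t)^2)
    where sg() is stop-gradient (detach). The forward value equals the plain
    mean squared error for any alpha; only the backward gradients change.

    y:   predictions on the original input, flat list.
    y_t: predictions on the transformed input, flat list.
    """
    n = len(y)
    loss = sum((a - b) ** 2 for a, b in zip(y, y_t)) / n
    # Only the first term reaches y; it pulls y toward y_t with weight alpha.
    grad_y = [2 * alpha * (a - b) / n for a, b in zip(y, y_t)]
    # Only the second term reaches y_t; it pulls y_t toward y with weight 1 - alpha.
    grad_y_t = [-2 * (1 - alpha) * (a - b) / n for a, b in zip(y, y_t)]
    return loss, grad_y, grad_y_t


loss, g_y, g_yt = cross_view_loss_and_grads([0.8, 0.2], [0.6, 0.4], alpha=0.3)
print(loss, g_y, g_yt)  # |g_yt| is larger than |g_y|: y_t moves more
```

With `alpha=0.5`, both predictions receive gradients of equal magnitude, as with the symmetric loss up to a constant factor. With `alpha=0.3`, the transformed prediction is pulled 0.7/0.3 ≈ 2.3 times harder.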
 
 
Inside-view Consistency Loss
When the entropy is above a certain threshold, the prediction is uncertain, and increasing the model's certainty on such pixels would be harmful.
💡
That is, the loss penalizes a pixel's uncertainty (entropy) only when its entropy is small enough for the pixel to be judged non-noisy.
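The following is a minimal sketch of the inside-view idea: minimize per-pixel entropy only where it is already below a threshold. It makes several assumptions:

- The threshold value of 0.5 nats is hypothetical.
- The loss is averaged over all pixels.
- The function names are made up for illustration.

```python
import math


def binary_entropy(p, eps=1e-7):
    """Entropy (in nats) of a Bernoulli prediction with foreground probability p."""
    p = min(max(p, eps), 1 - eps)  # clamp to avoid log(0)
    return -(p * math.log(p) + (1 - p) * math.log(1 - p))


def inside_view_loss(probs, threshold=0.5):
    """Entropy penalty applied only to confident (low-entropy) pixels.

    probs: flat list of foreground probabilities from one prediction map.
    threshold: hypothetical entropy cutoff; pixels at or above it are treated
    as uncertain (possibly noisy) and excluded, so they are not pushed to be
    more confident. Averaging over all pixels is also an assumption.
    """
    if not probs:
        return 0.0
    kept = [h for h in map(binary_entropy, probs) if h < threshold]
    return sum(kept) / len(probs)


print(inside_view_loss([0.9, 0.5]))  # only the 0.9 pixel contributes
```

A pixel at 0.5 has the maximum entropy, ln 2 ≈ 0.69 nats, so it is left alone. A pixel at 0.9, with about 0.33 nats, is pushed toward full confidence.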

Objective Function

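Below is the partial cross-entropy (PCE) loss, where $J$ is the set of labeled pixels in the scribble map, $g_i$ is the true class of pixel $i$, and $Y_i$ is the prediction for pixel $i$:
$$\mathcal{L}_{pce} = -\sum_{i \in J} \left[ g_i \log Y_i + (1 - g_i) \log (1 - Y_i) \right]$$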
💡
Here, the PCE loss is the supervised loss.
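The following sketches the PCE loss for binary COD masks. Unlabeled pixels are marked `None` in a flat scribble list and skipped, so only scribbled pixels contribute. The list encoding and function name are illustrative assumptions. Real implementations typically work on tensors with an ignore mask.

```python
import math


def partial_cross_entropy(probs, scribble, eps=1e-7):
    """Binary partial cross-entropy summed over scribble-labeled pixels only.

    probs:    flat list of predicted foreground probabilities Y_i.
    scribble: flat list of labels g_i: 1 (foreground), 0 (background),
              or None for unlabeled pixels, which are skipped.
    """
    loss = 0.0
    for p, g in zip(probs, scribble):
        if g is None:
            continue  # unlabeled pixel: no supervised signal
        p = min(max(p, eps), 1 - eps)  # clamp to avoid log(0)
        loss -= g * math.log(p) + (1 - g) * math.log(1 - p)
    return loss


print(partial_cross_entropy([0.9, 0.3, 0.5], [1, 0, None]))
```

The third pixel has no scribble, so changing its prediction does not change the loss. The unsupervised terms above are what give such pixels a training signal.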

Experiment


© Lazurite 2021 - 2024