SegNeXt: Rethinking Convolutional Attention Design for Semantic Segmentation (Notes)

date

Oct 11, 2022

Last edited time

Mar 27, 2023 08:45 AM

status

Published

slug

SegNeXt笔记

type

Post

origin

https://www.notion.so/lazurite/SegNeXt-Rethinking-Convolutional-Attention-Design-for-Semantic-Segmentation-152be9ac12914e76a14e680216ebfaf2

Field

Plat

SegNeXt: Rethinking Convolutional Attention Design for Semantic Segmentation

We present SegNeXt, a simple convolutional network architecture for semantic segmentation. Recent transformer-based models have dominated the field of semantic segmentation due to the efficiency of self-attention in encoding spatial information. In this paper, we show that convolutional attention is a more efficient and effective way to encode contextual information than the self-attention mechanism in transformers.

https://arxiv.org/abs/2209.08575

2209.08575.pdf

Deep learning paper: SegNeXt: Rethinking Convolutional Attention Design for Semantic Segmentation and its PyTorch implementation (mingo_敏's blog, CSDN)

PDF: https://arxiv.org/pdf/2209.08575.pdf; PyTorch code: https://github.com/shanglianlm0525/CvPytorch and https://github.com/shanglianlm0525/PyTorch-Networks

https://blog.csdn.net/shanglianlm/article/details/127123224

GitHub - Visual-Attention-Network/SegNeXt: Official Pytorch implementations for "SegNeXt: Rethinking Convolutional Attention Design for Semantic Segmentation" (NeurIPS 2022)

The repository contains official PyTorch implementations of the training and evaluation code and pre-trained models for SegNeXt. For Jittor users, https://github.com/Jittor/JSeg is a Jittor version. The code is based on MMSegmentation v0.24.1.

https://github.com/visual-attention-network/segnext

1 Overview

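The authors analyze several successful semantic segmentation networks and identify the key properties they share:

1 A strong backbone network as the encoder;

2 Multi-scale information interaction;

3 Spatial attention;

4 Low computational complexity.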

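Based on these observations, and unlike existing Transformer-based approaches, SegNeXt keeps a conventional convolutional design for the encoder blocks but introduces multi-scale convolutional attention, and adopts Hamburger (an alternative to self-attention) in the decoder to further extract global context. As a result, SegNeXt offers advantages in both accuracy and speed.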

2 SegNeXt

2-1 Convolutional Encoder

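The encoder likewise adopts a pyramid architecture, and each building block follows a ViT-like structure. The difference is that, instead of self-attention, the paper designs a multi-scale convolutional attention module, MSCA.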
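A minimal sketch of the pyramid bookkeeping, assuming the four stages downsample the input by strides of 4, 8, 16 and 32. The name `pyramid_shapes` and the channel widths in the usage lines are hypothetical illustrations, not values from the official configs.

```python
STRIDES = (4, 8, 16, 32)  # assumed downsampling factor of each stage w.r.t. the input


def pyramid_shapes(height, width, channels):
    """Return the (C, H, W) shape of each stage's output in a four-stage pyramid encoder.

    `channels` holds one (hypothetical) width per stage; height and width are
    assumed divisible by 32 so every stage divides evenly.
    """
    if len(channels) != len(STRIDES):
        raise ValueError("expected one channel width per stage")
    return [(c, height // s, width // s) for c, s in zip(channels, STRIDES)]


# Illustrative widths only; not taken from a specific MSCAN variant.
for stage, shape in enumerate(pyramid_shapes(512, 512, (32, 64, 128, 256)), start=1):
    print(f"stage {stage}: {shape}")
```

Each stage halves the resolution of the previous one, so the deepest stage sees a 16x16 grid for a 512x512 input.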

MSCAN is a multi-scale version of VAN.

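As shown in the figure above, MSCA consists of three parts:

Depth-wise convolution: aggregates local information

Multi-branch depth-wise convolutions: capture multi-scale contextual information

1x1 convolution: models relationships between channels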
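The three parts above can be sketched in NumPy, assuming the formulation from the paper: a 5x5 depth-wise convolution, three branches of paired 1xk and kx1 depth-wise strip convolutions with k = 7, 11, 21 plus an identity branch, a 1x1 convolution that produces the attention map, and an element-wise product with the input, i.e. Att = Conv1x1(sum over i=0..3 of Scale_i(DWConv(F))) and Out = Att * F. The parameter names (`dw5x5`, `dw1x7`, `pw`, ...) and the random initialization are hypothetical; depth-wise biases and the projections that wrap MSCA inside a full block are omitted.

```python
import numpy as np


def depthwise_conv2d(x, kernels):
    """Depth-wise 2-D convolution (cross-correlation) with 'same' zero padding.

    x: (C, H, W) feature map; kernels: (C, kh, kw) with odd kh and kw,
    one kernel per channel, so channels never mix.
    """
    c, h, w = x.shape
    _, kh, kw = kernels.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(x, ((0, 0), (ph, ph), (pw, pw)))
    out = np.zeros((c, h, w))
    for i in range(kh):
        for j in range(kw):
            # Add the shifted window, scaled per channel by kernel tap (i, j).
            out += kernels[:, i, j][:, None, None] * padded[:, i:i + h, j:j + w]
    return out


def init_msca_params(channels, rng):
    """Hypothetical parameter layout with small random weights."""
    params = {"dw5x5": rng.normal(scale=0.1, size=(channels, 5, 5))}
    for k in (7, 11, 21):
        params[f"dw1x{k}"] = rng.normal(scale=0.1, size=(channels, 1, k))
        params[f"dw{k}x1"] = rng.normal(scale=0.1, size=(channels, k, 1))
    params["pw"] = rng.normal(scale=0.1, size=(channels, channels))
    params["pw_bias"] = np.zeros(channels)
    return params


def msca(x, params):
    """Multi-scale convolutional attention applied to x of shape (C, H, W)."""
    local = depthwise_conv2d(x, params["dw5x5"])  # 5x5 depth-wise conv: local aggregation
    context = local.copy()  # Scale_0: identity branch
    for k in (7, 11, 21):
        # Scale_i: a 1xk then kx1 depth-wise strip convolution approximates a kxk kernel.
        branch = depthwise_conv2d(local, params[f"dw1x{k}"])
        context += depthwise_conv2d(branch, params[f"dw{k}x1"])
    # 1x1 convolution: a per-pixel linear map that mixes channels into the attention map.
    attn = np.einsum("oc,chw->ohw", params["pw"], context) + params["pw_bias"][:, None, None]
    return attn * x  # reweight the input element-wise


rng = np.random.default_rng(0)
x = rng.normal(size=(8, 32, 32))
y = msca(x, init_msca_params(8, rng))
print(y.shape)  # (8, 32, 32)
```

The strip pairs keep the cost of each branch linear in k rather than quadratic, which is why large kernels such as 21 remain affordable.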

Stacking MSCA blocks yields the MSCAN backbone variants, as follows:

2-2 Decoder

Decoder structures:

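a, the decoder from SegFormer, a pure MLP architecture;

b, the design commonly used by CNN-based methods such as ASPP, PSP and DANet;

c, the SegNeXt decoder, which combines a and b: a lightweight Hamburger module aggregates the features of the last three stages for global context modeling.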
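A rough NumPy sketch of the aggregation step in variant c, assuming stages 2 to 4 output maps at strides 8, 16 and 32 that are resized to the stride-8 grid, concatenated along channels and squeezed by a 1x1 convolution before the Hamburger module. Nearest-neighbour upsampling keeps the example short; bilinear interpolation is the more usual choice. The function names and shapes are hypothetical, and normalization, activation and the Hamburger module itself are left out.

```python
import numpy as np


def upsample_nearest(x, factor):
    """Nearest-neighbour upsampling of a (C, H, W) map by an integer factor."""
    return x.repeat(factor, axis=1).repeat(factor, axis=2)


def aggregate_stages(stage2, stage3, stage4, squeeze):
    """Fuse the last three pyramid stages on the stride-8 grid.

    stage2/3/4: (C_i, H_i, W_i) maps at strides 8, 16 and 32.
    squeeze: (C_out, C2 + C3 + C4) weights of a 1x1 convolution.
    """
    aligned = [stage2, upsample_nearest(stage3, 2), upsample_nearest(stage4, 4)]
    fused = np.concatenate(aligned, axis=0)  # stack along channels
    # The 1x1 convolution reduces channels; a Hamburger module would follow here.
    return np.einsum("oc,chw->ohw", squeeze, fused)


# Hypothetical shapes for a 512x512 input.
rng = np.random.default_rng(0)
s2 = rng.normal(size=(64, 64, 64))
s3 = rng.normal(size=(128, 32, 32))
s4 = rng.normal(size=(256, 16, 16))
out = aggregate_stages(s2, s3, s4, rng.normal(size=(96, 64 + 128 + 256)))
print(out.shape)  # (96, 64, 64)
```

Leaving out the stride-4 stage keeps the decoder input at 1/8 resolution, which is part of what keeps this head lightweight.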

Hamburger:

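Hamburger learns interpretable global context by denoising and completing its input, which rescales the spectral concentration. When the gradients back-propagated through the matrix decompositions (MDs) are handled carefully, Hamburgers built on different MDs perform favorably against self-attention, the popular global context module.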
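In the Hamburger paper, the "ham" is a matrix decomposition placed between two linear "bread" layers, and non-negative matrix factorization (NMF) is one of the MDs studied there. The sketch below covers only the MD step: Lee and Seung's multiplicative updates factor a non-negative feature matrix X (channels by positions) as X ~ D C^T, and the low-rank reconstruction D C^T serves as the global context. The function `nmf`, its defaults and the random initialization are assumptions for illustration; the gradient handling mentioned above is not modeled, since plain NumPy has no autograd.

```python
import numpy as np


def nmf(x, rank, steps=50, eps=1e-6, seed=0):
    """Non-negative matrix factorization x ~ bases @ coef.T via multiplicative updates.

    x: non-negative (d, n) matrix, e.g. d channels by n = H * W positions.
    Returns bases (d, rank) and coef (n, rank), both non-negative.
    """
    rng = np.random.default_rng(seed)
    d, n = x.shape
    bases = rng.random((d, rank))
    coef = rng.random((n, rank))
    for _ in range(steps):
        # Lee-Seung updates; up to the small eps, each step does not
        # increase the Frobenius error ||x - bases @ coef.T||.
        coef *= (x.T @ bases) / (coef @ (bases.T @ bases) + eps)
        bases *= (x @ coef) / (bases @ (coef.T @ coef) + eps)
    return bases, coef


# Toy "feature map": 16 channels over an 8x8 grid, made non-negative like ReLU output.
rng = np.random.default_rng(1)
features = np.maximum(rng.normal(size=(16, 8, 8)), 0.0)
x = features.reshape(16, -1)
bases, coef = nmf(x, rank=4)
context = (bases @ coef.T).reshape(features.shape)  # low-rank reconstruction
print(context.shape)  # (16, 8, 8)
```

Because the reconstruction has rank at most `rank`, it acts as a denoised, low-rank summary of the whole feature map, which is the sense in which Hamburger "denoises and completes" its input.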
