Life is about waiting for the right moment to act.


Notes on CBAM: Convolutional Block Attention Module

Given an intermediate feature map, CBAM infers attention maps along two separate dimensions, channel and spatial; these attention maps are then multiplied with the feature map for adaptive feature refinement.

A paper from ECCV 2018:

Paper link

Code (PyTorch)

Attention not only tells us where to focus, it also improves the representation of what we are interested in.

The goal of this paper is to increase representation power through an attention mechanism: focus on informative features and suppress unnecessary ones. The paper attends not only to cross-channel but also to spatial information, so the authors apply channel and spatial attention modules sequentially; each branch can separately learn 'what' and 'where' to attend in the channel and spatial axes.

"We visualize trained models using the grad-CAM and observe that CBAM-enhanced networks focus on target objects more properly than their baseline networks." The authors therefore speculate that the performance gain comes from accurate attention and reduced noise. The module is also carefully designed to be lightweight, so in most cases the overhead in computation and parameters is acceptable.

For a feature map $\mathbf{F} \in \mathbb{R}^{C\times H\times W}$, CBAM sequentially infers a 1D channel attention map $\mathbf{M}_{c} \in \mathbb{R}^{C\times 1 \times 1}$ and a 2D spatial attention map $\mathbf{M}_{s} \in \mathbb{R}^{1\times H \times W}$.
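The overall process, as given in the paper, applies the two maps one after the other, where $\otimes$ denotes element-wise multiplication (with broadcasting of the attention maps):

$$
\mathbf{F}' = \mathbf{M}_{c}(\mathbf{F}) \otimes \mathbf{F}, \qquad
\mathbf{F}'' = \mathbf{M}_{s}(\mathbf{F}') \otimes \mathbf{F}'
$$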

Channel attention: similar to SENet, but with an additional max-pooling branch alongside average pooling. As the authors put it, "We empirically confirmed that exploiting both features greatly improves representation power of networks rather than using each independently."
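In formula form (following the paper), the two pooled descriptors pass through a shared MLP and are summed before the sigmoid $\sigma$:

$$
\mathbf{M}_{c}(\mathbf{F}) = \sigma\big(\mathrm{MLP}(\mathrm{AvgPool}(\mathbf{F})) + \mathrm{MLP}(\mathrm{MaxPool}(\mathbf{F}))\big)
$$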

Spatial attention: apply max-pooling and average-pooling along the channel dimension, concatenate the two resulting maps, and pass them through a convolution layer to obtain the spatial attention map.
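As a formula (from the paper), with $f^{7\times 7}$ a convolution with a $7\times 7$ kernel and $[\,\cdot\,;\,\cdot\,]$ concatenation along the channel axis:

$$
\mathbf{M}_{s}(\mathbf{F}) = \sigma\big(f^{7\times 7}([\mathrm{AvgPool}(\mathbf{F});\ \mathrm{MaxPool}(\mathbf{F})])\big)
$$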

==The authors tried both parallel and sequential arrangements, and also swapped the order; experiments show that the sequential channel-then-spatial arrangement works best.==

Network Visualization with Grad-CAM

Code analysis:

```python
import torch
import torch.nn as nn


class ChannelAttention(nn.Module):
    def __init__(self, in_planes, ratio=16):
        super(ChannelAttention, self).__init__()
        # Squeeze the spatial dimensions to 1x1 with both pooling types.
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        self.max_pool = nn.AdaptiveMaxPool2d(1)

        # Shared MLP implemented with 1x1 convolutions.
        # Note: the original snippet hard-coded 16 here instead of using `ratio`.
        self.fc1 = nn.Conv2d(in_planes, in_planes // ratio, 1, bias=False)
        self.relu1 = nn.ReLU()
        self.fc2 = nn.Conv2d(in_planes // ratio, in_planes, 1, bias=False)

        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        # The same MLP (fc1 -> relu -> fc2) processes both pooled descriptors.
        avg_out = self.fc2(self.relu1(self.fc1(self.avg_pool(x))))
        max_out = self.fc2(self.relu1(self.fc1(self.max_pool(x))))
        out = avg_out + max_out
        return self.sigmoid(out)  # (N, C, 1, 1)


class SpatialAttention(nn.Module):
    def __init__(self, kernel_size=7):
        super(SpatialAttention, self).__init__()

        assert kernel_size in (3, 7), 'kernel size must be 3 or 7'
        padding = 3 if kernel_size == 7 else 1

        # 2 input channels: the channel-wise average and max maps.
        self.conv1 = nn.Conv2d(2, 1, kernel_size, padding=padding, bias=False)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        # Pool along the channel dimension, keeping the spatial layout.
        avg_out = torch.mean(x, dim=1, keepdim=True)
        max_out, _ = torch.max(x, dim=1, keepdim=True)
        x = torch.cat([avg_out, max_out], dim=1)  # (N, 2, H, W)
        x = self.conv1(x)
        return self.sigmoid(x)  # (N, 1, H, W)
```
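To show how the two modules are wired together in the channel-then-spatial order the authors found best, here is a minimal self-contained sketch. The `CBAM` wrapper class is my own compact restatement (not from the official repo), folding both attention branches into one module:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class CBAM(nn.Module):
    """Compact sketch: channel attention first, then spatial attention."""

    def __init__(self, in_planes, ratio=16, kernel_size=7):
        super().__init__()
        # Channel branch: shared MLP over avg- and max-pooled descriptors.
        self.mlp = nn.Sequential(
            nn.Conv2d(in_planes, in_planes // ratio, 1, bias=False),
            nn.ReLU(),
            nn.Conv2d(in_planes // ratio, in_planes, 1, bias=False),
        )
        # Spatial branch: conv over [avg; max] pooled along channels.
        self.conv = nn.Conv2d(2, 1, kernel_size,
                              padding=kernel_size // 2, bias=False)

    def forward(self, x):
        # Channel attention map M_c: (N, C, 1, 1), multiplied in first.
        avg = self.mlp(F.adaptive_avg_pool2d(x, 1))
        mx = self.mlp(F.adaptive_max_pool2d(x, 1))
        x = x * torch.sigmoid(avg + mx)
        # Spatial attention map M_s: (N, 1, H, W), applied second.
        s = torch.cat([x.mean(dim=1, keepdim=True),
                       x.max(dim=1, keepdim=True)[0]], dim=1)
        return x * torch.sigmoid(self.conv(s))


x = torch.randn(2, 64, 8, 8)
out = CBAM(64)(x)
print(out.shape)  # torch.Size([2, 64, 8, 8])
```

Because both attention maps are squashed by a sigmoid, the module only rescales features (output shape equals input shape), so it can be dropped after any convolutional block, e.g. inside a ResNet block before the residual addition.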