CBAM: Convolutional Block Attention Module (Notes)

Given an intermediate feature map, CBAM infers attention maps along two dimensions, channel and spatial. These attention maps are then multiplied with the feature map to refine the features adaptively.

This is a paper from ECCV 2018.

Attention not only tells us where to focus; it also improves the representation of the regions of interest.

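The goal of the paper is to increase representation power through an attention mechanism: focus on useful information and suppress unhelpful information. The paper attends not only to cross-channel information but also to spatial information, so the authors apply the channel and spatial attention modules sequentially. Each branch can learn 'what' and 'where' to attend in the channel and spatial axes, respectively.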

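"We visualize trained models using the grad-CAM and observe that CBAM-enhanced networks focus on target objects more properly than their baseline networks." The authors therefore conjecture that the performance gain comes from accurate attention and noise reduction. The module is also deliberately designed to be lightweight: in most cases the extra computation and parameters are acceptable.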

For a feature map $\mathbf F \in \mathbb{R}^{C\times H\times W}$, CBAM sequentially infers a 1D channel attention map $\mathbf M_{c} \in \mathbb{R}^{C\times 1 \times 1}$ and a 2D spatial attention map $\mathbf M_{s} \in \mathbb{R}^{1\times H \times W}$.
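The shapes above imply that each attention map is applied by element-wise multiplication with broadcasting: $\mathbf F' = \mathbf M_c \otimes \mathbf F$, then $\mathbf F'' = \mathbf M_s \otimes \mathbf F'$. The following is a minimal shape check, not the real module. The sizes are hypothetical, a batch dimension `N` is added, and both attention maps are random stand-ins (in CBAM they are computed from the features).

```python
import torch

# Hypothetical sizes, chosen only for illustration.
N, C, H, W = 2, 64, 32, 32
F = torch.randn(N, C, H, W)

# Random stand-ins for the attention maps; values lie in (0, 1),
# like the output of a sigmoid.
M_c = torch.sigmoid(torch.randn(N, C, 1, 1))   # one weight per channel
M_s = torch.sigmoid(torch.randn(N, 1, H, W))   # one weight per spatial location

F1 = M_c * F    # broadcast over H and W: channel-refined features
F2 = M_s * F1   # broadcast over C: spatially refined features

print(F1.shape, F2.shape)
```

Both products keep the input shape, which is why the module can be dropped between existing layers without changing anything downstream.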

Channel attention: similar to SENet, but with a max-pooling branch added alongside the average-pooling one. In the authors' words: "We empirically confirmed that exploiting both features greatly improves representation power of networks rather than using each independently."
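Written out, the channel attention described above takes roughly the following form. The symbols are notation assumed here and not defined elsewhere in these notes: $\sigma$ is the sigmoid, $W_0$ and $W_1$ are the two layers of the MLP shared by both branches, and $r$ is the reduction ratio. The pooling operations are global, over the spatial dimensions.

$$
\mathbf M_c(\mathbf F) = \sigma\big(\mathrm{MLP}(\mathrm{AvgPool}(\mathbf F)) + \mathrm{MLP}(\mathrm{MaxPool}(\mathbf F))\big),
\qquad
\mathrm{MLP}(\mathbf z) = W_1\,\mathrm{ReLU}(W_0\,\mathbf z)
$$

$$
W_0 \in \mathbb{R}^{C/r \times C}, \qquad W_1 \in \mathbb{R}^{C \times C/r}
$$

The two pooled descriptors are summed before the sigmoid, so the result is still a single $C \times 1 \times 1$ map.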

Spatial attention: apply max-pooling and average-pooling along the channel dimension, concatenate the two results, and pass them through a convolution to obtain the spatial attention map.
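The same step can be sketched as a formula. The notation is assumed for illustration: $\mathbf F'$ denotes the feature map after channel attention has been applied, $[\,\cdot\,;\,\cdot\,]$ is concatenation along the channel axis, $f^{k\times k}$ is a convolution with a $k \times k$ kernel, and $\sigma$ is the sigmoid. Here the pooling runs over the channel axis, so each pooled map has shape $1 \times H \times W$.

$$
\mathbf M_s(\mathbf F') = \sigma\Big(f^{k\times k}\big([\mathrm{AvgPool}(\mathbf F');\ \mathrm{MaxPool}(\mathbf F')]\big)\Big)
$$

The convolution maps the 2-channel descriptor to a single channel, giving one attention weight per spatial location.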

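==The authors tried both parallel and sequential arrangements, as well as both orderings; the experiments show that the sequential arrangement with channel attention first and spatial attention second works better.==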

Network Visualization with Grad-CAM

Code analysis:

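import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, in_planes, ratio=16):
        super(ChannelAttention, self).__init__()
        # Squeeze each channel to a single value with global average / max pooling.
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        self.max_pool = nn.AdaptiveMaxPool2d(1)

        # Shared bottleneck MLP written as 1x1 convolutions;
        # `ratio` is the channel reduction factor.
        self.fc1   = nn.Conv2d(in_planes, in_planes // ratio, 1, bias=False)
        self.relu1 = nn.ReLU()
        self.fc2   = nn.Conv2d(in_planes // ratio, in_planes, 1, bias=False)

        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        # Both pooled descriptors pass through the same MLP.
        avg_out = self.fc2(self.relu1(self.fc1(self.avg_pool(x))))
        max_out = self.fc2(self.relu1(self.fc1(self.max_pool(x))))
        out = avg_out + max_out
        return self.sigmoid(out)  # shape: (N, C, 1, 1)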

class SpatialAttention(nn.Module):
    def __init__(self, kernel_size=7):
        super(SpatialAttention, self).__init__()

        assert kernel_size in (3, 7), 'kernel size must be 3 or 7'
        # "Same" padding, so the attention map keeps the input's H x W.
        padding = 3 if kernel_size == 7 else 1

        # 2 input channels: the channel-wise average map and max map.
        self.conv1 = nn.Conv2d(2, 1, kernel_size, padding=padding, bias=False)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        # Pool along the channel dimension; each result is (N, 1, H, W).
        avg_out = torch.mean(x, dim=1, keepdim=True)
        max_out, _ = torch.max(x, dim=1, keepdim=True)
        x = torch.cat([avg_out, max_out], dim=1)  # (N, 2, H, W)
        x = self.conv1(x)
        return self.sigmoid(x)  # (N, 1, H, W)
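The two classes above only return attention maps; the notes do not show how they are applied. The following is a minimal, self-contained sketch of the sequential channel-then-spatial arrangement. The class name `CBAMSketch` and the input sizes are hypothetical, and both attention steps are re-implemented inline so the block runs on its own. Where such a block sits inside a backbone is not covered here.

```python
import torch
import torch.nn as nn


class CBAMSketch(nn.Module):
    """Hypothetical wrapper: channel attention first, then spatial attention."""

    def __init__(self, channels, ratio=16, kernel_size=7):
        super().__init__()
        # Shared MLP for channel attention, written as 1x1 convolutions.
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels // ratio, 1, bias=False),
            nn.ReLU(),
            nn.Conv2d(channels // ratio, channels, 1, bias=False),
        )
        # One convolution over the 2-channel [avg; max] spatial descriptor.
        self.conv = nn.Conv2d(2, 1, kernel_size,
                              padding=kernel_size // 2, bias=False)

    def forward(self, x):
        # Channel attention map, shape (N, C, 1, 1), multiplied into x.
        avg = self.mlp(torch.mean(x, dim=(2, 3), keepdim=True))
        mx = self.mlp(torch.amax(x, dim=(2, 3), keepdim=True))
        x = torch.sigmoid(avg + mx) * x

        # Spatial attention map, shape (N, 1, H, W), computed from the
        # channel-refined features and multiplied into them.
        desc = torch.cat([x.mean(dim=1, keepdim=True),
                          x.amax(dim=1, keepdim=True)], dim=1)
        return torch.sigmoid(self.conv(desc)) * x


block = CBAMSketch(channels=64)
x = torch.randn(2, 64, 32, 32)
y = block(x)

# Parameter count for C=64, ratio=16, kernel 7: 2*64*4 + 2*7*7 = 610.
n_params = sum(p.numel() for p in block.parameters())
print(y.shape, n_params)
```

The output has the same shape as the input, and the small parameter count in this configuration is consistent with the lightweight design noted earlier.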