Convolutional Neural Network Study Notes: SENet

For the complete code and data, please visit my GitHub repository:

https://github.com/LeBron-Jian/DeepLearningNote

Here, drawing on material from around the web and the SENet paper, we take a look at SENet. The basic code and figures are collected from the web (thanks to the original authors; the reference links are at the end). Let's get started.

The SENet paper is very well written; if you are interested, it is worth reading in full.

In deep learning, the development of CNN classification networks has been crucial for other computer vision tasks such as object detection and semantic segmentation (detection and segmentation models are usually built on top of CNN classification backbones). We have already studied classification networks such as AlexNet, VGGNet, InceptionNet, ResNet, and DenseNet; their effectiveness has been thoroughly validated and they are widely used in all kinds of computer vision tasks. Here we study one more network, SENet, which won the image classification task of the final ImageNet 2017 competition by a large margin. Much like ResNet when it appeared, it substantially reduced the error rate of previous models while keeping complexity low, adding only a small number of parameters and little computation. Let's look at SENet in detail.

1. Introduction to SENet

SENet stands for Squeeze-and-Excitation Networks. The Squeeze-and-Excitation (SE) block is not a complete network architecture but a sub-structure that can be embedded into other classification or detection models. By combining SE blocks with ResNeXt, the authors won first place in the ILSVRC 2017 classification task, reducing the top-5 error on ImageNet to 2.251%; the previous best result was 2.991%.

In the paper, the authors embed the SE block into a variety of existing classification networks and obtain good results. The core idea of SENet is to let the network learn feature weights from the loss, so that effective feature maps receive large weights while ineffective or less useful feature maps receive small weights, training the model to better effect. Of course, embedding SE blocks into existing classification networks inevitably adds some parameters and computation, but the overhead is acceptable given the gains.

Many people have probably had the idea of weighting the features of a certain layer, so why did only SENet succeed? Personally, I think the key lies in how the weights are trained. Some approaches judge importance directly from the numerical distribution of the feature maps, and others may also use the loss to guide weight training, but how global information is obtained and used differs from method to method.

2. The Main Idea of SENet

2.1 Central Idea

In a CNN, the core computation is the convolution operator, which maps an input feature map to a new feature map through convolution kernels. Convolution is essentially feature fusion over a local region, fusing information both spatially (the W and H dimensions) and across channels (the C dimension). Much prior work on convolution focuses on improving the receptive field, i.e., fusing more features spatially or extracting multi-scale spatial information. The innovation of SENet is to focus on the relationships between channels, in the hope that the model can automatically learn the importance of different channel features. To this end, SENet proposes the Squeeze-and-Excitation (SE) module.

Central idea: for each output channel, predict a scalar weight and use it to reweight that channel. In essence, the SE module performs attention or gating along the channel dimension; this attention mechanism lets the model focus on the most informative channel features while suppressing unimportant ones. A major advantage of SENet is that it can be easily integrated into existing networks to improve performance at very low cost.

The basic structure of SENet is as follows:

[Figure: basic structure of the SE block]

Some arbitrary transformation maps the input X to the output U. The channels of U differ in importance: some are more useful and some less so.

For each output channel, first apply global average pooling, so that each channel yields a scalar and C channels yield C numbers. Then pass them through FC-ReLU-FC-Sigmoid to obtain C scalars between 0 and 1, which serve as the channel weights. Finally, each channel of the original output is reweighted by its corresponding weight (every element of that channel is multiplied by the weight), producing a new, reweighted feature, which the authors call feature recalibration.

The first step, pooling each channel with global average pooling to obtain a scalar, is called Squeeze; the second step, passing these scalars through two FC layers to obtain weights between 0 and 1 and multiplying every element of each channel by the corresponding weight to get a new feature map, is called Excitation. Any original network structure can have its features recalibrated through this Squeeze-Excitation procedure; the modified network is the SENet version.
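To make the data flow concrete, below is a minimal PyTorch sketch of this Squeeze-Excitation pipeline with the tensor shapes written out (the channel count and reduction ratio are illustrative values, not from the original post):

import torch
import torch.nn as nn
import torch.nn.functional as F

# toy input: batch of 2 feature maps with C=64 channels and 32x32 spatial size
u = torch.randn(2, 64, 32, 32)
c, r = 64, 16

fc1 = nn.Linear(c, c // r)   # reduce C -> C/r
fc2 = nn.Linear(c // r, c)   # restore C/r -> C

z = u.mean(dim=(2, 3))                    # Squeeze: global average pooling, shape (2, 64)
s = torch.sigmoid(fc2(F.relu(fc1(z))))    # Excitation: FC-ReLU-FC-Sigmoid, shape (2, 64)
out = u * s.view(2, 64, 1, 1)             # reweight every channel of u, shape (2, 64, 32, 32)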

The module above is very general: it can easily be integrated into existing networks to obtain a corresponding SENet version and improve their performance. SENet generally refers to any network that adopts this structure. It can also refer specifically to the SE-ResNeXt-152 (64×4d) model the authors used to win ILSVRC 2017.

SENet is similar in spirit to ResNet but goes a step further: ResNet only adds a skip connection, whereas SENet inserts extra processing between adjacent layers so that information can interact across channels, further improving the network's accuracy.


Let's start from the most basic convolution operation. Over the years, convolutional neural networks have achieved great breakthroughs in many fields. The convolution kernel, as the core of a CNN, is usually regarded as an aggregator that combines spatial information and channel-wise information over a local receptive field. A CNN consists of a series of convolution layers, nonlinear layers, and downsampling layers, so that it can capture image features from a global receptive field to describe the image.


However, it is very hard to learn a network with very strong performance, and the difficulty comes from many aspects. A great deal of work has tried to improve network performance along the spatial dimension: embedding multi-scale information in the Inception structure and aggregating features over different receptive fields for performance gains, considering spatial context in the Inside-Outside network, introducing attention mechanisms into the spatial dimension, and so on. All of this work has achieved quite good results.


We can see that a lot of work has improved network performance along the spatial dimension, so it is natural to ask whether performance can be improved at other levels, for example through the relationships between feature channels. Our work is based on exactly this point and proposes Squeeze-and-Excitation Networks (SENet for short). Squeeze and Excitation are the two key operations in the proposed structure, hence the name. The motivation is to explicitly model the interdependencies between feature channels. We do not introduce a new spatial dimension for fusing feature channels; instead we adopt a new "feature recalibration" strategy: the importance of each feature channel is obtained automatically by learning, and this importance is then used to promote useful features and suppress features that are not useful for the current task.

[Figure: schematic of the proposed SE module]

The figure above is a schematic of the proposed SE module. Given an input X with C1 feature channels, a feature U with C2 channels is obtained through a series of generic transformations such as convolution. Unlike a traditional CNN, we then use three operations to recalibrate the features obtained so far.

2.2 The SE Module

The SE module consists of two operations, Squeeze and Excitation, and can be applied to any mapping:

$$F_{tr}: X \rightarrow U, \qquad X \in \mathbb{R}^{H' \times W' \times C'},\; U \in \mathbb{R}^{H \times W \times C}$$

Taking convolution as an example of $F_{tr}$, with $v_c$ the $c$-th convolution kernel and $x^s$ the $s$-th input channel:

$$u_c = v_c * X = \sum_{s=1}^{C'} v_c^{s} * x^{s}$$

2.3 The Squeeze Operation

The first is the Squeeze operation. We compress the features along the spatial dimensions. The original feature map has dimensions H×W×C, where H is the height, W the width, and C the number of channels. Squeeze compresses H×W×C down to 1×1×C, which is equivalent to turning each two-dimensional channel (H×W) into a single real number (i.e., one dimension). In practice this is usually implemented with global average pooling. After H×W is compressed to one value, that value carries the global view of the previous H×W region and the effective receptive field becomes much wider, so this real number has, to some extent, a global receptive field, and the output dimension matches the number of input feature channels. It characterizes the global distribution of responses over the feature channels and allows layers close to the input to obtain a global receptive field as well, which is very useful in many tasks.

Because convolution only operates over a local region, it is difficult for U to gather enough information to capture the relationships between channels; this is even more severe in the earlier layers of the network, where the receptive field is relatively small. For this reason, SENet proposes the Squeeze operation, which encodes the whole spatial feature of a channel into one global feature, implemented with global average pooling (in principle more sophisticated aggregation strategies could also be used):

$$z_c = F_{sq}(u_c) = \frac{1}{H \times W} \sum_{i=1}^{H} \sum_{j=1}^{W} u_c(i, j), \qquad z \in \mathbb{R}^{C}$$

2.4 The Excitation Operation

The second is the Excitation operation, which is similar to the gate mechanism in recurrent neural networks. A weight is generated for each feature channel by parameters W, which are learned to explicitly model the correlations between feature channels. After obtaining the Squeeze representation, fully connected (FC) layers are added to predict the importance of each channel; the resulting per-channel importances are then applied to the corresponding channels of the previous feature map before subsequent operations.

The Squeeze operation gives a global description of the features; next we need another operation to capture the relationships between channels. This operation must satisfy two criteria: first, it must be flexible, able to learn nonlinear relationships between channels; second, the learned relationships must not be mutually exclusive, because multiple channels should be allowed to be emphasized rather than a one-hot selection. Based on this, a sigmoid-based gating mechanism is used:

$$s = F_{ex}(z, W) = \sigma\big(g(z, W)\big) = \sigma\big(W_2\, \delta(W_1 z)\big)$$

where $\delta$ is the ReLU activation, $\sigma$ is the sigmoid function, $W_1 \in \mathbb{R}^{\frac{C}{r} \times C}$ and $W_2 \in \mathbb{R}^{C \times \frac{C}{r}}$, with $r$ the reduction ratio.

To limit model complexity and improve generalization, a bottleneck of two fully connected layers is used: the first FC layer reduces the dimension by a factor r (the reduction ratio r is a hyperparameter), followed by a ReLU activation, and the final FC layer restores the original dimension.
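As a rough illustration of how cheap this bottleneck is (assuming C = 2048 and r = 16, the values of the last ResNet-50 stage), the two FC layers contain

$$\frac{C}{r} \cdot C + C \cdot \frac{C}{r} = \frac{2C^2}{r} = \frac{2 \times 2048^2}{16} \approx 0.52\text{M}$$

parameters, versus roughly $C^2 \approx 4.2\text{M}$ for a single full $C \times C$ layer.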

Finally, the learned activation value of each channel (a sigmoid output between 0 and 1) is multiplied onto the original features U:

$$\tilde{x}_c = F_{scale}(u_c, s_c) = s_c \cdot u_c$$

The whole operation can be viewed as learning a weight coefficient for each channel, which makes the model better at discriminating the features of each channel; it can also be regarded as an attention mechanism.

The last step is the Reweight operation. The output weights of the Excitation step are treated as the importance of each feature channel after feature selection and are multiplied channel-wise onto the previous features, completing the recalibration of the original features along the channel dimension.

3. Applications of the SE Module

3.1 The SE Module in Inception and ResNet

The flexibility of the SE module means it can be applied directly to existing network structures. Take Inception and ResNet as examples. For the Inception network, which has no residual connections, the SE module is applied to the whole Inception module. For ResNet, the SE module is embedded into the residual learning branch of the residual block, as shown below:

[Figure: SE-Inception module (upper left) and SE-ResNet module (upper right)]

The upper-left diagram is an example of embedding an SE module into an Inception structure; the dimensions next to each box denote the output of that layer.

Here global average pooling is used as the Squeeze operation. Two fully connected layers then form a bottleneck structure to model the correlations between channels and output the same number of weights as there are input channels. The feature dimension is first reduced to 1/16 of the input and then restored to the original dimension through another fully connected layer after a ReLU activation. Compared with using a single fully connected layer directly, this has two advantages: 1) it has more nonlinearity and can better fit the complex correlations between channels; 2) it greatly reduces the number of parameters and the amount of computation. A sigmoid gate then produces normalized weights between 0 and 1, and finally a Scale operation weights each channel's features by its normalized weight.

In addition, the SE module can be embedded into modules that contain skip connections. The upper-right diagram is an example of embedding SE into a ResNet module. The procedure is basically the same as for SE-Inception, except that the residual features on the branch are recalibrated before the addition. If we instead recalibrated the features on the main (identity) branch after the addition, the scale operation on the backbone would make gradients near the input layers prone to vanishing during backpropagation as the network gets deeper, making the model hard to optimize.

Most mainstream networks today are built by repeatedly stacking these two kinds of similar units. It follows that the SE module can be embedded into almost all current network architectures. By inserting the SE module into the building blocks of the original structure, we obtain different kinds of SENets, such as SE-BN-Inception, SE-ResNet, SE-ResNeXt, SE-Inception-ResNet-v2, and so on.


As you can see from the description above, SENet is very simple to construct and easy to deploy; it requires no new functions or layers, and it also behaves well in terms of model size and computational complexity. Comparing ResNet-50 with SE-ResNet-50, for example, SE-ResNet-50 has about 10% more parameters than ResNet-50. The extra parameters come from the two fully connected layers of the bottleneck design. Because the last stage of the ResNet structure has 2048 feature channels, the parameter increase there is large; it was found that removing the SE blocks from the three building blocks of the last stage reduces the 10% parameter growth to about 2%, with almost no loss in model accuracy.

In addition, because global pooling and small fully connected layers are not well optimized in existing GPU implementations, the wall-clock time of SE-ResNet-50 is about 10% higher than ResNet-50 on GPU. Nevertheless, the theoretical increase in computation is less than 1%, which matches the increase in CPU time (~2%). Overall, embedding the SE module into existing architectures adds very few extra parameters and little computation.

Adding SE modules does increase the model parameters and the amount of computation. Taking SE-ResNet-50 as an example, the increase in parameters is:

$$\Delta\text{Params} = \frac{2}{r} \sum_{s=1}^{S} N_s \cdot C_s^2$$

where r is the reduction ratio, S is the number of stages, C_s is the number of channels in stage s, and N_s is the number of repeated blocks in stage s. With r = 16, SE-ResNet-50 adds only about 10% more parameters, while the extra computation (GFLOPs) is below 1%.
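As a back-of-the-envelope check (assuming the standard ResNet-50 stage configuration: channel counts 256, 512, 1024, 2048 repeated 3, 4, 6, 3 times, and r = 16):

$$\frac{2}{16}\left(3 \cdot 256^2 + 4 \cdot 512^2 + 6 \cdot 1024^2 + 3 \cdot 2048^2\right) \approx 2.5\text{M}$$

extra parameters, roughly 10% of ResNet-50's ~25M; the last stage alone accounts for about 1.6M of the increase, which is why removing the SE blocks there recovers most of the overhead.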

3.2 Effect of the SE Module on ResNet Models

The SE module is easy to embed in other networks. To verify its effect, the authors added SE modules to popular networks such as ResNet and VGG and measured the results on ImageNet.


During training, common data augmentation methods and the balanced-data strategy proposed by Li Shen are used. To improve training efficiency, the authors use their own optimized distributed training system, ROCS, with a larger batch size and initial learning rate. All models are trained from scratch.


Next, to verify the effectiveness of SENets, experiments are conducted on the ImageNet dataset from two angles: the performance gain as a function of network depth, and the results of embedding SE into different existing networks. The results from the ImageNet competition are also shown.

[Table: ImageNet errors of ResNet-50/101/152 and their SE counterparts]

First, let's look at the effect of network depth on SE. The table above shows the results of ResNet-50, ResNet-101, ResNet-152 and the models with SE embedded. The first column, "Original", is the result reported by the original authors. For a fair comparison, the experiments were re-run on ROCS, giving the "Our re-implementation" column (the re-implemented accuracy is often higher than that of the original paper). The last column, "SE-module", is the result with the SE module embedded, trained with the same hyperparameters as the re-implementation; the red values in parentheses indicate the improvement in accuracy relative to the re-implementation.

As the table shows, SE-ResNets clearly exceed the accuracy of the corresponding structures without SE at every depth, indicating that the SE module brings a performance gain regardless of network depth. It is worth noting that SE-ResNet-50 reaches roughly the same accuracy as ResNet-101, and SE-ResNet-101 clearly surpasses the deeper ResNet-152.

[Figure: training curves of ResNet-50/152 and SE-ResNet-50/152 on ImageNet]

The figure above shows the training curves of ResNet-50 and ResNet-152 and of the corresponding networks with SE modules on ImageNet. The networks with SE modules clearly converge to a lower error rate.

[Table: results of embedding SE into ResNeXt, BN-Inception, and Inception-ResNet-v2]

In addition, to verify the generalization ability of the SE module, experiments were also carried out on structures other than ResNet. As the table shows, embedding the SE module into ResNeXt, BN-Inception, and Inception-ResNet-v2 also brings considerable gains, so the benefit of SE is not limited to particular network structures; it generalizes well.

[Figure: training curves of SE-ResNeXt-50 and SE-Inception-ResNet-v2 versus their baselines]

The figure above compares the training curves of SE embedded in ResNeXt-50 and in Inception-ResNet-v2 against their baselines.

[Table: comparison with state-of-the-art networks on ImageNet classification]

The table above lists the results of some recent networks on ImageNet classification. Our SENet is essentially an SE-ResNeXt-152 (64×4d), which embeds the SE module into ResNeXt-152 along with some other modifications and optimized training tricks, discussed later.


Finally, in the ILSVRC 2017 competition, our ensemble model reached a top-5 error rate of 2.251% on the test set. Compared with the winning result of 2.991% the previous year, this is a relative improvement of about 25%.

4. Summary

1. The SE module is mainly designed to improve the model's sensitivity to channel features. It is lightweight and can be applied to existing network structures, bringing performance gains with only a small increase in computation.

2. The improvement is significant and the cost is small. By weighting the channels, it emphasizes informative features and suppresses uninformative ones, acting as an attention mechanism. It is a general method that should yield improvements on Inception, Inception-ResNet, ResNeXt, and ResNet alike, so its range of application is wide.

3. The idea is clear and simple, the implementation is easy, and it is convenient to use. Various experiments have proved its effectiveness, so it is worth trying on all kinds of tasks; the results should not be bad.

5. Keras Implementation of SENet

5.1 SE-Inception in Keras

First, look at the schematic of the SE-Inception architecture:

[Figure: SE-Inception module]

The figure shows an example of embedding an SE module into an Inception structure; the dimensions next to each box denote that layer's output. Global average pooling is used as the Squeeze operation, and two fully connected layers form a bottleneck structure that models the correlations between channels and outputs the same number of weights as there are input channels.

We first reduce the feature dimension to 1/16 of the input and then restore it to the original dimension through another fully connected layer after a ReLU activation. Compared with using a single fully connected layer directly, the advantages are:

  • 1. More nonlinearity, so the complex correlations between channels can be fitted better.
  • 2. Far fewer parameters and much less computation. A sigmoid gate then produces normalized weights between 0 and 1, and finally a Scale operation weights each channel's features by its normalized weight.

The code is as follows (here r = 16):

from keras.models import Model
from keras.layers import Input, Dense, Activation, Reshape, Dropout, GlobalAveragePooling2D, multiply
from keras.applications.inception_v3 import InceptionV3
from keras.optimizers import Adam


def build_SE_model(nb_classes, input_shape=(256, 256, 3)):
    inputs_dim = Input(input_shape)
    # pooling=None keeps the 4-D feature map so that the SE branch can reweight its channels
    x = InceptionV3(include_top=False, weights='imagenet', input_shape=None, pooling=None)(inputs_dim)

    # Squeeze: global average pooling gives one scalar per channel
    squeeze = GlobalAveragePooling2D()(x)

    # Excitation: FC (reduce by r=16) -> ReLU -> FC (restore) -> Sigmoid
    excitation = Dense(units=2048 // 16)(squeeze)
    excitation = Activation('relu')(excitation)
    excitation = Dense(units=2048)(excitation)
    excitation = Activation('sigmoid')(excitation)
    excitation = Reshape((1, 1, 2048))(excitation)

    # Reweight the original feature map channel by channel
    scale = multiply([x, excitation])

    x = GlobalAveragePooling2D()(scale)
    dp_1 = Dropout(0.3)(x)
    fc2 = Dense(nb_classes)(dp_1)
    # Note: a sigmoid activation is used here
    fc2 = Activation('sigmoid')(fc2)
    model = Model(inputs=inputs_dim, outputs=fc2)
    return model


if __name__ == '__main__':
    # nb_classes, im_size1, im_size2 and channels should be defined for your own dataset
    model = build_SE_model(nb_classes, input_shape=(im_size1, im_size2, channels))
    opt = Adam(lr=2 * 1e-5)
    model.compile(optimizer=opt, loss='categorical_crossentropy', metrics=['accuracy'])
    model.fit()

Notes:

1. In multiply([x, excitation]), x has shape (10, 10, 2048) and excitation has shape (1, 1, 2048); their last dimension (2048) must match. For example, with DenseNet201 the output of the last convolutional layer is (8, 8, 1920) (excluding the fully connected layer), so excitation should be reshaped to (1, 1, 1920).

2. fc2 = Activation('sigmoid')(fc2): note that a sigmoid is used here.

5.2 SE-ResNeXt in Keras

Take a look at the SE-ResNet architecture diagram:

![](https://johngo-pic.oss-cn-beijing.aliyuncs.com/articles/rewrite/1226410-20210122104300237-589483955.png)

ResNeXt is an improved version of ResNet. The SE-ResNeXt implementation below is adapted from one shared online:
from __future__ import print_function
from __future__ import absolute_import

import warnings
import numpy as np

from keras.models import Model
from keras.layers import Input
from keras.layers import Lambda
from keras.layers import Reshape

from keras.layers import Conv2D
from keras.layers import Activation
from keras.layers import AveragePooling2D
from keras.layers import GlobalAveragePooling2D
from keras.layers import BatchNormalization
from keras.layers import Dense

from keras.layers import Concatenate, concatenate
from keras.layers import Add, add
from keras.layers import Multiply, multiply

from keras import backend as K


class SEResNeXt(object):
    def __init__(self, size=96, num_classes=10, depth=64, reduction_ratio=4, num_split=8, num_block=3):
        self.depth = depth              # number of channels
        self.ratio = reduction_ratio    # ratio of channel reduction in SE module
        self.num_split = num_split      # number of splitting trees for ResNeXt (so called cardinality)
        self.num_block = num_block      # number of residual blocks

        if K.image_data_format() == 'channels_first':
            self.channel_axis = 1
        else:
            self.channel_axis = 3

        self.model = self.build_model(Input(shape=(size, size, 3)), num_classes)

    def conv_bn(self, x, filters, kernel_size, stride, padding='same'):
        '''
        Combination of Conv and BN layers since these always appear together.
        '''
        x = Conv2D(filters=filters, kernel_size=[kernel_size, kernel_size],
                   strides=[stride, stride], padding=padding)(x)
        x = BatchNormalization()(x)

        return x

    def activation(self, x, func='relu'):
        '''
        Activation layer.
        '''
        return Activation(func)(x)

    def channel_zeropad(self, x):
        '''
        Zero-padding for channel dimensions.

        Note that padded channels are added like (Batch, H, W, x/2 + x + x/2).
        '''
        shape = list(x.shape)
        y = K.zeros_like(x)

        if self.channel_axis == 3:
            y = y[:, :, :, :shape[self.channel_axis] // 2]
        else:
            y = y[:, :shape[self.channel_axis] // 2, :, :]

        return concatenate([y, x, y], self.channel_axis)

    def channel_zeropad_output(self, input_shape):
        '''
        Function for setting a channel dimension for zero padding.
        '''
        shape = list(input_shape)
        shape[self.channel_axis] *= 2

        return tuple(shape)

    def initial_layer(self, inputs):
        '''
        Initial layers includes {conv, BN, relu}.
        '''
        x = self.conv_bn(inputs, self.depth, 3, 1)
        x = self.activation(x)

        return x

    def transform_layer(self, x, stride):
        '''
        Transform layer has 2 {conv, BN, relu}.
        '''
        x = self.conv_bn(x, self.depth, 1, 1)
        x = self.activation(x)

        x = self.conv_bn(x, self.depth, 3, stride)
        x = self.activation(x)

        return x

    def split_layer(self, x, stride):
        '''
        Parallel operation of transform layers for ResNeXt structure.
        '''
        splitted_branches = list()
        for i in range(self.num_split):
            branch = self.transform_layer(x, stride)
            splitted_branches.append(branch)

        return concatenate(splitted_branches, axis=self.channel_axis)

    def squeeze_excitation_layer(self, x, out_dim):
        '''
        SE module performs inter-channel weighting.
        '''
        squeeze = GlobalAveragePooling2D()(x)

        excitation = Dense(units=out_dim // self.ratio)(squeeze)
        excitation = self.activation(excitation)
        excitation = Dense(units=out_dim)(excitation)
        excitation = self.activation(excitation, 'sigmoid')
        excitation = Reshape((1, 1, out_dim))(excitation)

        scale = multiply([x, excitation])

        return scale

    def residual_layer(self, x, out_dim):
        '''
        Residual block.
        '''
        for i in range(self.num_block):
            input_dim = int(np.shape(x)[-1])

            if input_dim * 2 == out_dim:
                flag = True
                stride = 2
            else:
                flag = False
                stride = 1

            subway_x = self.split_layer(x, stride)
            subway_x = self.conv_bn(subway_x, out_dim, 1, 1)
            subway_x = self.squeeze_excitation_layer(subway_x, out_dim)

            if flag:
                pad_x = AveragePooling2D(pool_size=(2, 2), strides=(2, 2), padding='same')(x)
                pad_x = Lambda(self.channel_zeropad, output_shape=self.channel_zeropad_output)(pad_x)
            else:
                pad_x = x

            x = self.activation(add([pad_x, subway_x]))

        return x

    def build_model(self, inputs, num_classes):
        '''
        Build a SENet model.
        '''
        x = self.initial_layer(inputs)

        x = self.residual_layer(x, out_dim=64)
        x = self.residual_layer(x, out_dim=128)
        x = self.residual_layer(x, out_dim=256)

        x = GlobalAveragePooling2D()(x)
        x = Dense(units=num_classes, activation='softmax')(x)

        return Model(inputs, x)
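A minimal usage sketch (the input size, optimizer, and class count below are illustrative values, not from the original post):

# Build the SE-ResNeXt model for, e.g., CIFAR-10-sized inputs
senet = SEResNeXt(size=32, num_classes=10)
model = senet.model
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model.summary()
# model.fit(x_train, y_train, batch_size=64, epochs=10, validation_data=(x_test, y_test))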

6. PyTorch Implementation of the SE Module

The SE module is very simple and easy to implement. Here is a PyTorch version (source: https://zhuanlan.zhihu.com/p/65459972/).

The code is as follows:

import torch.nn as nn


class SELayer(nn.Module):
    def __init__(self, channel, reduction=16):
        super(SELayer, self).__init__()
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Sequential(
            nn.Linear(channel, channel // reduction, bias=False),
            nn.ReLU(inplace=True),
            nn.Linear(channel // reduction, channel, bias=False),
            nn.Sigmoid()
        )

    def forward(self, x):
        b, c, _, _ = x.size()
        y = self.avg_pool(x).view(b, c)      # Squeeze: (b, c, h, w) -> (b, c)
        y = self.fc(y).view(b, c, 1, 1)      # Excitation: (b, c) -> (b, c, 1, 1)
        return x * y.expand_as(x)            # reweight each channel of x

For SE-ResNet, you only need to add the SE module to the residual unit:
class SEBottleneck(nn.Module):
    expansion = 4

    def __init__(self, inplanes, planes, stride=1, downsample=None, reduction=16):
        super(SEBottleneck, self).__init__()
        self.conv1 = nn.Conv2d(inplanes, planes, kernel_size=1, bias=False)
        self.bn1 = nn.BatchNorm2d(planes)
        self.conv2 = nn.Conv2d(planes, planes, kernel_size=3, stride=stride,
                               padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(planes)
        self.conv3 = nn.Conv2d(planes, planes * 4, kernel_size=1, bias=False)
        self.bn3 = nn.BatchNorm2d(planes * 4)
        self.relu = nn.ReLU(inplace=True)
        self.se = SELayer(planes * 4, reduction)
        self.downsample = downsample
        self.stride = stride

    def forward(self, x):
        residual = x

        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)
        out = self.relu(out)

        out = self.conv3(out)
        out = self.bn3(out)
        out = self.se(out)

        if self.downsample is not None:
            residual = self.downsample(x)

        out += residual
        out = self.relu(out)

        return out
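A quick sanity check of the modules above (a minimal sketch; the tensor sizes are arbitrary illustrative values):

import torch

x = torch.randn(2, 64, 32, 32)                 # batch of 2, 64 channels, 32x32 feature maps
se = SELayer(channel=64, reduction=16)
print(se(x).shape)                             # torch.Size([2, 64, 32, 32]) -- same shape, channels reweighted

block = SEBottleneck(inplanes=64, planes=16)   # outputs planes * 4 = 64 channels, so the skip connection matches
print(block(x).shape)                          # torch.Size([2, 64, 32, 32])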

References:

https://www.sohu.com/a/161633191_465975

https://blog.csdn.net/u014380165/article/details/78006626

https://zhuanlan.zhihu.com/p/65459972/

https://blog.csdn.net/qq_38410428/article/details/87979417

https://github.com/yoheikikuta/senet-keras

https://github.com/moskomule/senet.pytorch

Original: https://www.cnblogs.com/wj-1314/p/14147932.html
Author: 战争热诚
Title: 卷积神经网络学习笔记-SENet
