Convolutional Neural Network Study Notes: The Lightweight MobileNet Family (V1, V2, V3)

For the complete code and data, see my GitHub repository:

https://github.com/LeBron-Jian/DeepLearningNote

Here, drawing on material from around the web and from the MobileNet papers, let's take a look at MobileNet. Most of the base code and figures come from the web (thanks to their authors; the reference links are at the end). Let's begin.

The MobileNet papers are well written and worth reading. My translations are here:

Deep Learning Paper Translation and Analysis (17): MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications

Deep Learning Paper Translation and Analysis (18): MobileNetV2: Inverted Residuals and Linear Bottlenecks

Deep Learning Paper Translation and Analysis (19): Searching for MobileNetV3

Convolutional neural networks (CNNs) are widely used in computer vision and have achieved excellent results. Figure 1 shows the performance of CNNs in recent ImageNet competitions: in pursuit of classification accuracy, models have grown ever deeper and more complex; the deep residual network ResNet, for example, has as many as 152 layers.

[Figure 1: performance of CNNs in recent ImageNet competitions]

However, in real application scenarios such as mobile or embedded devices, such large and complex models are hard to deploy. First, the models are too large and run into memory limits; second, these scenarios demand low latency, i.e. fast response. Imagine the consequences if a self-driving car's pedestrian detection system were slow. So studying small, efficient CNN models matters in these settings, at least for now, even though hardware keeps getting faster.

Current research can be summarized into two directions:

  • One is to compress a trained complex model to obtain a small model.
  • The other is to design a small model and train it directly.

In either case, the goal is to reduce model size (parameter count) and increase model speed (lower latency) while maintaining model performance (accuracy). MobileNet, the protagonist of this post, belongs to the latter camp: it is a small, efficient CNN model proposed by Google that trades off between accuracy and latency.

MobileNet has to balance the fast-moving fields of computer vision and deep learning against the constraints of the mobile environment. Google therefore updates the MobileNet architecture regularly, folding in recent ideas from deep learning research.

1. Depthwise Separable Convolution

Proposed by Google in 2017, MobileNet is the pioneer of lightweight networks. It has gone through several generations of updates and has become required reading for anyone studying lightweight networks. In fact, MobileNet V1 can be introduced in a single sentence: it replaces the standard convolution layers of VGG with depthwise separable convolutions. So what is a depthwise separable convolution?

MobileNet's basic unit is the depthwise separable convolution (DSC), a structure that had already been used in Inception-style models. Historically it goes back to the 2012 paper Simplifying ConvNets for Fast Learning, whose authors proposed the concept of separable convolutions (figure (a) below):

[Figure: separable convolutions as proposed in Simplifying ConvNets for Fast Learning, variant (a)]

During an internship at Google in 2013, Laurent Sifre extended separable convolutions to the depth dimension and described them in detail in his PhD thesis, Rigid-motion scattering for image classification; interested readers can look it up.

There are two main kinds of separable convolution: spatially separable convolution and depthwise separable convolution.

1.1 Spatially Separable Convolution (brief)

As the name suggests, a spatially separable convolution turns one large kernel into two small kernels, for example splitting a 3×3 kernel into a 3×1 kernel and a 1×3 kernel:

[Figure: factoring a 3×3 kernel into a 3×1 kernel followed by a 1×3 kernel]
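As a quick sanity check (my own numpy example, not from the original post), the classic 3×3 Sobel kernel factors exactly into a 3×1 column times a 1×3 row:

import numpy as np

# A 3x3 Sobel kernel is the outer product of a 3x1 smoothing kernel
# and a 1x3 differencing kernel.
col = np.array([[1], [2], [1]])   # 3x1 smoothing kernel
row = np.array([[1, 0, -1]])      # 1x3 differencing kernel
sobel = col @ row                 # outer product reconstructs the 3x3 kernel
print(sobel)
# [[ 1  0 -1]
#  [ 2  0 -2]
#  [ 1  0 -1]]
# Convolving with col then row costs 3 + 3 = 6 multiplies per output pixel
# instead of 9 for the full 3x3 kernel.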

Since spatially separable convolution is outside MobileNet's scope, we will not go into it further.

1.2 Depthwise Separable Convolution

[Figure: depthwise separable convolution = depthwise convolution + pointwise convolution]

Let's first review the standard convolution operation:

[Figure: standard convolution]

Take a 12×12×3 input feature map and convolve it with a 5×5×3 kernel to get an 8×8×1 output feature map. If we have 256 such kernels, we get an 8×8×256 output feature map.

This figure makes it even clearer (image source: https://zhuanlan.zhihu.com/p/29119239):

[Figure: standard convolution with 256 kernels producing an 8×8×256 output]

That is what standard convolution does. What about depthwise convolution and pointwise convolution?

1.2.1 Depthwise Convolution

[Figure: depthwise convolution, one single-channel filter per input channel]

Unlike a standard convolution, we split the kernel into single-channel filters and convolve each channel separately, without changing the depth of the input feature map. This yields an output feature map with the same number of channels as the input. As shown above, the 12×12×3 input, after a 5×5×1×3 depthwise convolution, gives an 8×8×3 output feature map. Input and output both stay at 3 channels, which raises a problem: with so few channels, the feature map has too few dimensions; can we extract enough useful information?

1.2.2 Pointwise Convolution

Pointwise convolution is a 1×1 convolution; its main role is to expand or reduce the channel dimension of the feature map, as shown below:

[Figure: pointwise (1×1) convolution]

After the depthwise convolution we have an 8×8×3 output feature map. We convolve it with 256 kernels of size 1×1×3, and the resulting output feature map is the same as that of the standard convolution.

The comparison between standard convolution and depthwise separable convolution is as follows:

[Figure: standard convolution vs. depthwise separable convolution]
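To verify the shapes in the 12×12×3 example above, here is a quick sketch using tf.keras (my own check, not the post's code; the post's full implementations in section 5 use standalone Keras):

from tensorflow.keras import Input, Model, layers

inp = Input(shape=(12, 12, 3))
# Standard convolution: 256 kernels of size 5x5x3 -> 8x8x256 in one step
std = layers.Conv2D(256, (5, 5), padding='valid')(inp)
# Depthwise separable: 5x5 depthwise (one filter per channel) -> 8x8x3,
# then 256 pointwise 1x1x3 kernels -> 8x8x256
dw = layers.DepthwiseConv2D((5, 5), padding='valid')(inp)
pw = layers.Conv2D(256, (1, 1))(dw)

model = Model(inp, [std, pw])
for t in model.outputs:
    print(t.shape)   # both (None, 8, 8, 256)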

1.3 Why Depthwise Separable Convolution?

Put simply: if there were a method that let you use fewer parameters and fewer operations yet achieve similar results, would you use it?

Depthwise separable convolution is exactly such a method. Let's first count the parameters and the computation (considering only multiply-adds, MAdds) of a standard convolution:

[Figure: a standard convolution layer with a Dk × Dk × M × N kernel bank]

Here, assume the convolution kernel is square with side Dk, the input feature map has M channels, the output has N channels, and the output feature map is Dw × Dh. The kernel bank of such a typical conv layer is therefore Dk × Dk × M × N.

1.3.1 Parameter Count of Standard Convolution

Each convolution kernel has size Dk × Dk × M, and there are N of them, so the parameter count of a standard convolution is:

Dk × Dk × M × N

1.3.2 Computation Cost of Standard Convolution

Each kernel is Dk × Dk × M, there are N of them, and each is applied at Dw × Dh output positions, so the computation of a standard convolution is:

Dk × Dk × M × N × Dw × Dh

With standard convolution done, let's compute the parameter count and computation of depthwise separable convolution:

[Figure: depthwise convolution followed by pointwise convolution]

First, both the depthwise and the pointwise stage are convolution operations; the pointwise stage in particular is an ordinary 1×1 convolution. Depthwise convolution applies one filter per input channel, so the number of output channels obviously equals the number of input channels. Pointwise convolution runs after the depthwise operation and uses 1×1 convolutions to fuse the preceding IC (input channels) feature maps into the final OC (output channels) feature maps.

1.3.3 Parameter Count of Depthwise Separable Convolution

The parameter count of depthwise separable convolution is the sum of two parts: the depthwise convolution and the pointwise convolution.

The depthwise kernels have size Dk × Dk × M in total; the pointwise kernels have size 1 × 1 × M each, N in all, so the parameter count of depthwise separable convolution is:

Dk × Dk × M + M × N

1.3.4 Computation Cost of Depthwise Separable Convolution

The computation of depthwise separable convolution likewise consists of the depthwise and the pointwise part.

The depthwise kernels are Dk × Dk (one per input channel, M in all), each applied at Dw × Dh positions; the pointwise kernels are 1 × 1 × M, N in all, also applied at Dw × Dh positions. The computation of a depthwise separable convolution is therefore:

Dk × Dk × M × Dw × Dh + M × N × Dw × Dh

1.3.5 Summary

(Dk × Dk × M × Dw × Dh + M × N × Dw × Dh) / (Dk × Dk × M × N × Dw × Dh) = 1/N + 1/Dk²

In other words, both the parameter count and the number of multiply-add operations shrink to 1/N + 1/Dk² of the standard convolution's.

Since the kernels we usually use are 3×3, the cost drops to roughly one-ninth to one-eighth of the original.

As an example, suppose the network input is a 224×224×3 image, the input to one of the VGG conv layers is a 112×112×64 feature map, and the layer has 128 kernels of size 3×3:

Standard convolution: 3×3×128×64×112×112 = 924,844,032 multiply-adds.

Depthwise separable convolution: 3×3×64×112×112 + 128×64×112×112 = 109,985,792 multiply-adds.

At this layer, the ratio of the depthwise separable cost used by MobileNet V1 to the standard convolution cost is:

109985792 / 924844032 ≈ 0.1189

which agrees with the one-ninth to one-eighth reduction we derived (1/128 + 1/9 ≈ 0.1189).
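To make the bookkeeping concrete, here is a small Python helper (my own, not from the original post) that reproduces both the formulas and the VGG-layer example above:

def conv_costs(dk, m, n, dw, dh):
    """Multiply-add counts of a dk x dk convolution over an m-channel input,
    producing n channels on a dw x dh output feature map."""
    standard = dk * dk * m * n * dw * dh
    separable = dk * dk * m * dw * dh + m * n * dw * dh  # depthwise + pointwise
    return standard, separable

std, sep = conv_costs(dk=3, m=64, n=128, dw=112, dh=112)
print(std, sep)              # 924844032 109985792
print(sep / std)             # 0.11892...
print(1 / 128 + 1 / 3 ** 2)  # 0.11892..., i.e. 1/N + 1/Dk^2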

With that background, let's look at the MobileNet networks themselves.

MobileNet is Google's mobile-side classification network. In V1, MobileNet applies depthwise separable convolution and introduces two hyperparameters to control network capacity; the assumption behind this convolution is that cross-channel correlations and spatial correlations can be decoupled. Depthwise separable convolution saves parameters and reaches respectable accuracy while keeping model complexity acceptable for mobile devices. In V2, MobileNet applies a new unit, the inverted residual with linear bottleneck; the main changes are giving the bottleneck a linear output (no activation) and moving the residual skip connection to the low-dimensional bottleneck layers.

2. MobileNet V1

MobileNet V1 is a lightweight neural network built from depthwise separable convolutions on a streamlined architecture, and it introduces two hyperparameters so developers can choose a model that fits their application and resource constraints.

Conceptually, MobileNet V1 pursues two basic goals for a mobile-first computer-vision model: (1) a smaller model with fewer parameters, and (2) lower complexity, with fewer multiply-adds. Following these principles, MobileNet V1 is a small, low-latency, low-power parameterized model that can meet the resource constraints of many use cases, and it can be used for classification, detection, embedding, segmentation, and so on.

2.1 Innovations in MobileNet V1

2.1.1 BN and ReLU6 after both the depthwise and the pointwise convolution

This is shown below (the activations should read ReLU6). The left side is a standard convolution block; the right side is a depthwise separable block. ReLU6 adds a nonlinear transformation and strengthens the model's generalization.

[Figure: standard conv block (3×3 conv → BN → ReLU6) vs. depthwise separable block (3×3 depthwise conv → BN → ReLU6, then 1×1 conv → BN → ReLU6)]
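As a concrete sketch of the right-hand block, here is how the V1 unit can be written with tf.keras layers (my own illustrative helper; the post's full V1 code appears in section 5.1):

from tensorflow.keras import layers

def depthwise_separable_block(x, pw_filters, stride=1):
    # 3x3 depthwise convolution -> BN -> ReLU6
    x = layers.DepthwiseConv2D((3, 3), strides=stride, padding='same', use_bias=False)(x)
    x = layers.BatchNormalization()(x)
    x = layers.ReLU(max_value=6.0)(x)
    # 1x1 pointwise convolution -> BN -> ReLU6
    x = layers.Conv2D(pw_filters, (1, 1), padding='same', use_bias=False)(x)
    x = layers.BatchNormalization()(x)
    x = layers.ReLU(max_value=6.0)(x)
    return x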

2.1.2 The Role of the ReLU6 Activation

V1 uses ReLU6 as its activation function. It works well on float16/int8 embedded devices and keeps the network robust.

ReLU6 is ordinary ReLU with the output clipped at a maximum of 6. The motivation is numerical resolution on mobile devices running at float16 precision: an unclipped ReLU has output range [0, +∞), and if activations become very large and spread over a wide range, low-precision float16 cannot describe such a range accurately, causing a loss of precision.

The ReLU6 function and its derivative are:

ReLU6(x) = min(max(0, x), 6), with derivative 1 for 0 < x < 6 and 0 elsewhere.

The corresponding plots:

[Figure: plots of ReLU6 and its derivative]
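A minimal numpy sketch of the function and its derivative (my own illustration of the formula above):

import numpy as np

def relu6(x):
    # clip activations into [0, 6]
    return np.minimum(np.maximum(0.0, x), 6.0)

def relu6_grad(x):
    # derivative: 1 on the open interval (0, 6), 0 elsewhere
    return ((x > 0) & (x < 6)).astype(np.float32)

print(relu6(np.array([-2.0, 3.0, 8.0])))       # [0. 3. 6.]
print(relu6_grad(np.array([-2.0, 3.0, 8.0])))  # [0. 1. 0.]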

2.1.3 Two hyperparameters: the width multiplier α and the resolution multiplier ρ

Although the base MobileNet is already small and low-latency, particular applications often need a model that is smaller and faster still. For this, the width multiplier α is introduced; to control the input size, the resolution multiplier ρ is introduced.

The width multiplier α thins the input and output channels of every layer: input channels go from M to αM and output channels from N to αN, so the cost becomes:

Dk × Dk × αM × Dw × Dh + αM × αN × Dw × Dh

α lies in (0, 1], with typical values 1, 0.75, 0.5 and 0.25. Computation and parameter count both drop by roughly α² relative to the unscaled network.

The resolution multiplier ρ controls the resolution of the input and, with it, every internal representation. The cost of the depthwise and pointwise convolutions becomes:

Dk × Dk × αM × ρDw × ρDh + αM × αN × ρDw × ρDh

ρ lies in (0, 1], with typical input resolutions 224, 192, 160 and 128. Overall, computation drops by roughly a factor of α²ρ², while ρ has no effect on the parameter count.
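The two multipliers plug straight into the cost formula; a short sketch (function and parameter names are my own, continuing the helper style used earlier):

def separable_cost(dk, m, n, dw, dh, alpha=1.0, rho=1.0):
    # the width multiplier alpha scales channels; the resolution
    # multiplier rho scales the feature-map size
    m, n = int(alpha * m), int(alpha * n)
    dw, dh = int(rho * dw), int(rho * dh)
    return dk * dk * m * dw * dh + m * n * dw * dh

base = separable_cost(3, 64, 128, 112, 112)
small = separable_cost(3, 64, 128, 112, 112, alpha=0.5, rho=160 / 224)
print(small / base)  # ~0.136, close to alpha^2 * rho^2 ~ 0.128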

With these two hyperparameters the model can be shrunk further, and the paper gives concrete experimental results. Conversely, increasing width and resolution raises accuracy, but scaling either one alone saturates quickly; this is one reason Google proposed EfficientNet in 2019, which scales depth, width and resolution jointly to improve accuracy.

2.2 MobileNet V1 Network Architecture

The core of MobileNet V1 is a streamlined architecture that builds a lightweight deep network out of depthwise separable convolutions. Structurally, a depthwise separable convolution splits a convolution kernel into two separate kernels that perform two convolutions in turn: first the depthwise convolution, then the pointwise convolution, as shown below:

[Figure: a standard convolution factored into a depthwise convolution followed by a pointwise convolution]

In MobileNet V1, the depthwise convolution applies a single filter to each input channel; the pointwise convolution then applies a 1×1 convolution to combine the depthwise outputs. A standard convolution filters and combines the inputs into a new set of outputs in a single step, whereas the depthwise separable convolution splits this into two layers: one for filtering and one for combining.

The network structure is shown below. It has 28 layers in total (not counting the AvgPool and FC layers, and counting the depthwise and pointwise convolutions separately). Apart from the first layer, which uses a standard convolution kernel, every convolutional layer is a depthwise separable convolution.

[Table: MobileNet V1 body architecture, layer by layer]

3. MobileNet V2

The MobileNet V2 architecture was released in early 2018. It builds on ideas from MobileNet V1 and combines them with new ones. Architecturally, V2 adds two new components: (1) linear bottlenecks between layers, and (2) shortcut connections between the bottlenecks.

The core intuition in MobileNet V2 is that the bottlenecks encode the model's intermediate inputs and outputs, while the inner layers encapsulate the model's ability to transform lower-level concepts (such as pixels) into higher-level descriptors (such as image categories). Finally, as with traditional residual connections, the shortcuts enable faster training and better accuracy.

3.1 MobileNet V1 vs. MobileNet V2

3.1.1 Problems with MobileNet V1

MobileNet V1's structure is quite simple, and the main problem lies in the depthwise convolution. Depthwise convolution does reduce computation, but its kernels are easy to "kill" during training: a large fraction of the depthwise kernels end up all zero, which the authors attribute to the outputs being zeroed by the subsequent ReLU.

V2 conveys essentially one idea: ReLU causes heavy information loss on tensors with few channels. In short, when low-dimensional information is mapped to a high dimension, passed through ReLU, and mapped back to the low dimension, little is lost if the intermediate dimension is relatively high; if the intermediate dimension is relatively low, the loss after mapping back is large, as shown below:

[Figure: information loss after a ReLU round trip for different embedding dimensions, from the MobileNet V2 paper]

When the input is expanded to 15 dimensions before ReLU, not much information is lost; but if it is expanded only to 2-5 dimensions before ReLU, the loss is much more severe. The conclusion is that ReLU in low dimensions easily destroys information, whereas in high dimensions the loss is small. Another way to see it: transforming high-dimensional information back to low dimensions is a feature compression that already loses some information, and applying ReLU afterwards loses even more. To solve this, the authors replace that ReLU with a linear activation function.

As for how ReLU loses features, my understanding is: ReLU outputs zero for all negative inputs, and dimensionality reduction is itself a feature compression, so the two together make the feature loss severe.
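In the spirit of the paper's figure, here is a toy numpy experiment (entirely my own construction): project 2-D points into n dimensions with a random matrix, apply ReLU, project back by least squares, and measure the reconstruction error.

import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((1000, 2))  # low-dimensional "manifold" points

def relu_roundtrip(x, dim):
    # random expansion to `dim` dimensions, ReLU, then least-squares inverse
    T = rng.standard_normal((x.shape[1], dim))
    y = np.maximum(x @ T, 0.0)
    x_back = y @ np.linalg.pinv(T)
    return np.mean((x - x_back) ** 2)

for dim in (2, 5, 15, 30):
    print(dim, relu_roundtrip(x, dim))
# the reconstruction error shrinks as the intermediate dimension grows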

3.1.2 MobileNet V1 and V2 Compared

What they share: both extract features with depthwise (DW) convolution paired with pointwise (PW) convolution. Together the two operations are called depthwise separable convolution, used widely before in Xception. The benefit is that, in theory, the time and space complexity of a convolutional layer falls multiplicatively: as the formula below shows, because the kernel size K is usually far smaller than the output channel count Cout, a standard convolution costs roughly K² times as much as the DW+PW combination.

(DW + PW cost) / (standard conv cost) = 1/Cout + 1/K²

What differs (Linear Bottleneck): V2 adds a new PW convolution before the DW convolution. The reason is that DW convolution, by its computational nature, cannot change the channel count: it can only output as many channels as the previous layer gives it. If the previous layer has few channels, DW is stuck extracting features in a low-dimensional space, and the results are poor. To fix this, V2 equips every DW with a preceding PW dedicated to raising the dimension, with expansion factor t = 6; so regardless of whether the input channel count Cin is large or small, after the first PW the DW convolution does its work in a relatively higher-dimensional space (t·Cin). In addition, V2 removes the activation function after the second PW; the paper's authors call this a Linear Bottleneck. The rationale is that activation functions add useful nonlinearity in high-dimensional space but destroy features in low-dimensional space, where a linear layer works better. Since the second PW's main job is to reduce the dimension, by the reasoning above it should not be followed by ReLU6.

[Figure: MobileNet V1 block vs. MobileNet V2 block]

3.2 Innovations in MobileNet V2

MobileNet V2 is an improvement on MobileNet V1 and is likewise a lightweight convolutional neural network. Released in 2018, a year after V1, this second effort from Google introduces Inverted Residuals and Linear Bottlenecks on top of V1.

3.2.1 Inverted Residuals

This translates as "inverted residual block". What does that mean? Compare the ordinary residual block with the inverted residual block:

  • Residual block: the input is first compressed by a 1×1 convolution, features are then extracted by a 3×3 convolution, and finally a 1×1 convolution restores the channel count; the whole process is "compress, convolve, expand". The point is to cut the cost of the 3×3 convolution and make the residual block efficient.
  • Inverted residual block: the input first expands its channels through a 1×1 convolution, then applies a 3×3 depthwise convolution, and finally a 1×1 pointwise convolution compresses the channel count back; the whole process is "expand, convolve, compress". Why? Because depthwise convolution cannot change the channel count, feature extraction is limited by the number of input channels, so the channels are expanded first. The expansion factor in the paper is 6.

The following figure sums it up best.

[Figure: residual block ("compress, convolve, expand") vs. inverted residual block ("expand, convolve, compress")]

3.2.2 Linear Bottleneck

The purpose of this module is to solve the low-dim → high-dim → low-dim problem raised at the start: the ReLU of the last layer is replaced by a linear activation function, while the other layers keep ReLU6.

[Figure: linear bottleneck, with no activation after the final 1×1 projection]

This was explained in detail above, so it is not repeated here.

3.3 MobileNet V2 Network Architecture

The MobileNet V2 block is shown below. When stride = 1, the input first passes through a 1×1 convolution that expands the channel count, with ReLU6 activation; then a 3×3 depthwise convolution, again with ReLU6; then a 1×1 pointwise convolution that compresses the channels back, with a linear activation; finally a shortcut adds the input to the output. When stride = 2, the input and output feature maps differ in size, so there is no shortcut.

[Figure: MobileNet V2 blocks for stride 1 (with shortcut) and stride 2 (without)]

Finally, here is the V2 network structure, where t is the expansion factor, c the number of output channels, n the number of times the layer repeats, and s the stride. V2 is much deeper than V1: it has 54 layers.

[Table: MobileNet V2 network structure (t, c, n, s per stage)]

4. MobileNet V3

MobileNet V3 was published in 2019 in two versions, MobileNet-V3 Large and MobileNet-V3 Small, aimed at different resource budgets. V3 combines V1's depthwise separable convolution, V2's inverted residuals and linear bottleneck, and the SE module, and uses NAS (neural architecture search) to find the network's configuration and parameters. This goes far beyond what manual hyperparameter tuning can do.

4.1 Innovations in MobileNet V3

4.1.1 A Redesigned Tail

In MobileNet V2 there is a 1×1 convolutional layer before the average pooling, whose purpose is to raise the dimensionality of the feature map and help prediction, but it brings real computational cost. So the V3 authors moved it after the average pooling: the pooling first reduces the feature map from 7×7 to 1×1, and only then does the 1×1 convolution raise the dimension, cutting that computation by a factor of 7×7 = 49. To reduce cost further, they also dropped the 3×3 and 1×1 convolutions of the preceding spindle-shaped bottleneck, giving the structure in the second row of the figure below; removing them loses no accuracy while saving about 10 ms of latency, roughly a 15% speedup. Second, V2's input layer expands the input to 32 channels with a 3×3 convolution; the authors found that with ReLU or swish activations the channel count can be cut to 16 with accuracy unchanged, saving another 3 ms.

[Figure: the original V2 tail (top) vs. the redesigned V3 tail (bottom)]

4.1.2 A Change of Nonlinearity

Because computing sigmoid consumes a lot of resources on embedded devices, especially mobile ones, the authors propose h-swish as the activation function. Moreover, the cost of a nonlinear activation falls as the network gets deeper, so h-swish only brings a clear advantage when used in the deeper layers.

swish(x) = x · sigmoid(x)

h-swish(x) = x · ReLU6(x + 3) / 6

Comparing the two curves shows they differ very little (swish came out of Google's own research, and h-swish optimizes it for speed).

Benefits of building h-swish from ReLU6:

  • 1. It can be computed on virtually any software or hardware platform.
  • 2. Under quantization it eliminates potential numerical precision loss; replacing swish with h-swish improves efficiency by about 15% in quantized mode. (A numpy sketch of h-swish follows below.)
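A minimal numpy sketch of h-sigmoid and h-swish (my own illustration; the formula is the one given above):

import numpy as np

def relu6(x):
    return np.minimum(np.maximum(0.0, x), 6.0)

def h_sigmoid(x):
    # piecewise-linear replacement for sigmoid: ReLU6(x + 3) / 6
    return relu6(x + 3.0) / 6.0

def h_swish(x):
    # hard swish: x * h_sigmoid(x), built only from ReLU6, add and multiply
    return x * h_sigmoid(x)

print(h_swish(np.array([-4.0, 0.0, 1.0, 4.0])))  # [-0.  0.  0.6667  4.]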

4.1.3 The SE Module

V3 introduces the SE module into V2's bottleneck structure, placed after the depthwise filter. SE is a lightweight channel-attention module. Since the SE structure costs some time, it runs after the depthwise stage: a pooling layer, then a first FC layer that shrinks the channel count by a factor of 4, then a second FC layer that transforms it back (expanding by 4), after which the result re-weights the depthwise output channel by channel (the SE gate scales each channel). The authors found this raises accuracy without adding noticeable time.

[Figure: the MobileNet V3 bottleneck with the SE module inserted after the depthwise convolution]
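A hedged tf.keras sketch of such an SE block (my own helper, not the paper's reference code; `hard_sigmoid` here stands in for the h-sigmoid gate):

from tensorflow.keras import layers

def se_block(x, reduction=4):
    ch = x.shape[-1]
    s = layers.GlobalAveragePooling2D()(x)                     # squeeze: one value per channel
    s = layers.Dense(ch // reduction, activation='relu')(s)    # first FC: shrink channels by 4x
    s = layers.Dense(ch, activation='hard_sigmoid')(s)         # second FC: expand back; gate in [0, 1]
    s = layers.Reshape((1, 1, ch))(s)
    return layers.multiply([x, s])                             # channel-wise re-weighting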

4.2 MobileNet V3 Network Architecture

MobileNet V3 first searches for a coarse structure with MnasNet, using reinforcement learning to pick the optimal configuration from a discrete set of choices. It then fine-tunes the architecture with NetAdapt, which complements the coarse search: it trims under-utilized activation channels in small decrements.

In addition, another novel idea of MobileNet V3 is adding the "squeeze-and-excitation" network (SENet, also the ImageNet 2017 classification champion) to the core architecture. Its core idea is to improve the quality of the representations the network produces by explicitly modeling the interdependence between its convolutional feature channels: the network learns to assess the importance of each feature channel automatically, then uses the result to boost useful features and suppress ones that are not useful for the task at hand.

To this end, the developers propose a mechanism that lets the network recalibrate its features: through it, the network can learn to use global information to selectively emphasize informative features and suppress less useful ones. In MobileNet V3, the architecture extends MobileNet V2 to include SENet as part of the search space, yielding a more robust architecture.

Another interesting optimization in MobileNet V3 is the redesign of some of the architecture's expensive layers. Some layers in V2 are fundamental to model accuracy but also introduce latency. By merging some basic optimizations, MobileNet V3 removes three expensive layers of the V2 architecture without sacrificing accuracy.

The V3 structure is shown below. The authors provide two versions, Large and Small, for high-resource and low-resource settings respectively; both were found by NAS.

[Table: MobileNet V3 Large and Small architectures found by NAS]
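The post gives no V3 code, so here is a hedged tf.keras sketch of one V3 "bneck" unit that combines the pieces discussed above (expansion, depthwise convolution, optional SE, linear projection). `se_block` is the helper sketched in 4.1.3, and every name here is my own, not an official API:

import tensorflow as tf
from tensorflow.keras import layers

def hswish(x):
    return x * tf.nn.relu6(x + 3.0) / 6.0

def bneck(x, exp_ch, out_ch, kernel, stride, use_se=True, use_hs=True):
    act = hswish if use_hs else tf.nn.relu6
    in_ch = x.shape[-1]
    y = layers.Conv2D(exp_ch, 1, padding='same', use_bias=False)(x)   # 1x1 expansion
    y = layers.BatchNormalization()(y)
    y = act(y)
    y = layers.DepthwiseConv2D(kernel, strides=stride, padding='same', use_bias=False)(y)
    y = layers.BatchNormalization()(y)
    y = act(y)
    if use_se:
        y = se_block(y)                                               # channel attention after depthwise
    y = layers.Conv2D(out_ch, 1, padding='same', use_bias=False)(y)   # 1x1 linear projection
    y = layers.BatchNormalization()(y)                                # no activation: linear bottleneck
    if stride == 1 and in_ch == out_ch:
        y = layers.add([x, y])                                        # shortcut only when shapes match
    return y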

Looking back over the MobileNet series, accuracy improves steadily while latency falls. Although its ImageNet accuracy does not reach state of the art, at equal resource consumption its advantages show clearly.

5. MobileNet Implementations in Python

5.1 MobileNet V1 in Keras

With the architecture laid out above, the network can be built directly with Keras.

Part of the code is as follows:

from keras.applications.imagenet_utils import _obtain_input_shape
from keras import backend as K
from keras.layers import Input, Convolution2D, \
    GlobalAveragePooling2D, Dense, BatchNormalization, Activation
from keras.models import Model
from keras.engine.topology import get_source_inputs
from model.depthwise_conv2d import DepthwiseConvolution2D  # debug
# from depthwise_conv2d import DepthwiseConvolution2D  # release
from keras.utils import plot_model

'''Google MobileNet model for Keras.
# Reference:
- [MobileNets: Efficient Convolutional Neural Networks for Mobile Vision
   Applications](https://arxiv.org/pdf/1704.04861.pdf)
'''


def MobileNet(input_tensor=None, input_shape=(224, 224, 3), alpha=1,
              shallow=False, classes=1000):
    """Instantiates the MobileNet.

    The network has two hyper-parameters: the width of the network
    (controlled by alpha) and the input size.

    # Arguments
        input_tensor: optional Keras tensor (i.e. output of Input())
            to use as image input for the model.
        input_shape: optional shape tuple; it should have exactly 3 input
            channels, and width and height should be no smaller than 96.
        alpha: optional width multiplier of the network.
        shallow: optional parameter for making the network smaller
            (drops the five repeated 512-channel blocks).
        classes: optional number of classes to classify images into.

    # Returns
        A Keras model instance.
    """
    input_shape = _obtain_input_shape(input_shape, default_size=224,
                                      min_size=96,
                                      data_format=K.image_data_format(),
                                      require_flatten=True)

    if input_tensor is None:
        img_input = Input(shape=input_shape)
    else:
        if not K.is_keras_tensor(input_tensor):
            img_input = Input(tensor=input_tensor, shape=input_shape)
        else:
            img_input = input_tensor

    def _conv_bn_relu(x, filters, kernel, strides):
        # standard convolution followed by BN and ReLU
        x = Convolution2D(int(filters * alpha), kernel, strides=strides,
                          padding='same', use_bias=False)(x)
        x = BatchNormalization()(x)
        return Activation('relu')(x)

    def _dw_separable(x, dw_filters, pw_filters, strides):
        # 3x3 depthwise convolution + BN + ReLU, then 1x1 pointwise + BN + ReLU
        x = DepthwiseConvolution2D(int(dw_filters * alpha), (3, 3),
                                   strides=strides, padding='same',
                                   use_bias=False)(x)
        x = BatchNormalization()(x)
        x = Activation('relu')(x)
        return _conv_bn_relu(x, pw_filters, (1, 1), (1, 1))

    # layer sequence of the original MobileNet V1 body
    x = _conv_bn_relu(img_input, 32, (3, 3), (2, 2))
    x = _dw_separable(x, 32, 64, (1, 1))
    x = _dw_separable(x, 64, 128, (2, 2))
    x = _dw_separable(x, 128, 128, (1, 1))
    x = _dw_separable(x, 128, 256, (2, 2))
    x = _dw_separable(x, 256, 256, (1, 1))
    x = _dw_separable(x, 256, 512, (2, 2))
    if not shallow:
        for _ in range(5):
            x = _dw_separable(x, 512, 512, (1, 1))
    x = _dw_separable(x, 512, 1024, (2, 2))
    x = _dw_separable(x, 1024, 1024, (1, 1))

    x = GlobalAveragePooling2D()(x)
    out = Dense(classes, activation='softmax')(x)

    if input_tensor is not None:
        inputs = get_source_inputs(input_tensor)
    else:
        inputs = img_input

    model = Model(inputs, out, name='mobilenet')
    return model


if __name__ == '__main__':
    m = MobileNet(alpha=0.5)
    plot_model(m, 'modela=0.5.png', show_shapes=True)
    print("model ready")

5.2 MobileNet V2 in Keras

Based on the parameters given in the paper, the network can be written in Keras as follows:
from keras.models import Model
from keras.layers import Input, Conv2D, GlobalAveragePooling2D, Dropout
from keras.layers import Activation, BatchNormalization, add, Reshape
from keras.applications.mobilenet import relu6, DepthwiseConv2D
from keras.utils.vis_utils import plot_model

from keras import backend as K


def _conv_block(inputs, filters, kernel, strides):
    """Convolution Block
    This function defines a 2D convolution operation with BN and relu6.

    # Arguments
        inputs: Tensor, input tensor of conv layer.
        filters: Integer, the dimensionality of the output space.
        kernel: An integer or tuple/list of 2 integers, specifying the
            width and height of the 2D convolution window.
        strides: An integer or tuple/list of 2 integers, specifying the
            strides of the convolution along the width and height.
            Can be a single integer to specify the same value for
            all spatial dimensions.

    # Returns
        Output tensor.
    """
    channel_axis = 1 if K.image_data_format() == 'channels_first' else -1

    x = Conv2D(filters, kernel, padding='same', strides=strides)(inputs)
    x = BatchNormalization(axis=channel_axis)(x)
    return Activation(relu6)(x)


def _bottleneck(inputs, filters, kernel, t, s, r=False):
    """Bottleneck
    This function defines a basic bottleneck structure.

    # Arguments
        inputs: Tensor, input tensor of conv layer.
        filters: Integer, the dimensionality of the output space.
        kernel: An integer or tuple/list of 2 integers, specifying the
            width and height of the 2D convolution window.
        t: Integer, expansion factor.
            t is always applied to the input size.
        s: An integer or tuple/list of 2 integers, specifying the strides
            of the convolution along the width and height. Can be a single
            integer to specify the same value for all spatial dimensions.
        r: Boolean, whether to use the residual connection.

    # Returns
        Output tensor.
    """
    channel_axis = 1 if K.image_data_format() == 'channels_first' else -1
    tchannel = K.int_shape(inputs)[channel_axis] * t

    # 1x1 "expansion" convolution: raise the channel count by the factor t
    x = _conv_block(inputs, tchannel, (1, 1), (1, 1))

    # 3x3 depthwise convolution
    x = DepthwiseConv2D(kernel, strides=(s, s), depth_multiplier=1, padding='same')(x)
    x = BatchNormalization(axis=channel_axis)(x)
    x = Activation(relu6)(x)

    # 1x1 linear projection: compress the channels back, no activation
    x = Conv2D(filters, (1, 1), strides=(1, 1), padding='same')(x)
    x = BatchNormalization(axis=channel_axis)(x)

    if r:
        x = add([x, inputs])
    return x


def _inverted_residual_block(inputs, filters, kernel, t, strides, n):
    """Inverted Residual Block
    This function defines a sequence of 1 or more identical layers.

    # Arguments
        inputs: Tensor, input tensor of conv layer.
        filters: Integer, the dimensionality of the output space.
        kernel: An integer or tuple/list of 2 integers, specifying the
            width and height of the 2D convolution window.
        t: Integer, expansion factor.
            t is always applied to the input size.
        strides: An integer or tuple/list of 2 integers, specifying the
            strides of the convolution along the width and height. Can be
            a single integer to specify the same value for all spatial
            dimensions.
        n: Integer, layer repeat times.

    # Returns
        Output tensor.
    """
    # the first layer may change the stride; the repeats use stride 1
    # and a residual shortcut (r=True)
    x = _bottleneck(inputs, filters, kernel, t, strides)

    for i in range(1, n):
        x = _bottleneck(x, filters, kernel, t, 1, True)

    return x


def MobileNetv2(input_shape, k):
    """MobileNetv2
    This function defines a MobileNetv2 architecture.

    # Arguments
        input_shape: An integer or tuple/list of 3 integers, shape
            of input tensor.
        k: Integer, number of classes.

    # Returns
        MobileNetv2 model.
    """
    inputs = Input(shape=input_shape)
    x = _conv_block(inputs, 32, (3, 3), strides=(2, 2))

    x = _inverted_residual_block(x, 16, (3, 3), t=1, strides=1, n=1)
    x = _inverted_residual_block(x, 24, (3, 3), t=6, strides=2, n=2)
    x = _inverted_residual_block(x, 32, (3, 3), t=6, strides=2, n=3)
    x = _inverted_residual_block(x, 64, (3, 3), t=6, strides=2, n=4)
    x = _inverted_residual_block(x, 96, (3, 3), t=6, strides=1, n=3)
    x = _inverted_residual_block(x, 160, (3, 3), t=6, strides=2, n=3)
    x = _inverted_residual_block(x, 320, (3, 3), t=6, strides=1, n=1)

    x = _conv_block(x, 1280, (1, 1), strides=(1, 1))
    x = GlobalAveragePooling2D()(x)
    x = Reshape((1, 1, 1280))(x)
    x = Dropout(0.3, name='Dropout')(x)
    x = Conv2D(k, (1, 1), padding='same')(x)

    x = Activation('softmax', name='softmax')(x)
    output = Reshape((k,))(x)

    model = Model(inputs, output)
    plot_model(model, to_file='images/MobileNetv2.png', show_shapes=True)

    return model


if __name__ == '__main__':
    MobileNetv2((224, 224, 3), 1000)

References:

Paper: Inverted Residuals and Linear Bottlenecks: Mobile Networks for Classification, Detection and Segmentation
GitHub: https://github.com/xiaochus/MobileNetV2

https://github.com/tensorflow/models/tree/master/research/slim/nets/mobilenet
https://github.com/xiaochus/MobileNetV2/blob/master/data/convert.py

Using the official MobileNet V1 pretrained models: https://www.jianshu.com/p/fe0c1b10720b

https://blog.csdn.net/u011974639/article/details/79199306
https://www.jianshu.com/p/7f77faf1776d
https://zhuanlan.zhihu.com/p/58554116
https://www.jianshu.com/p/1cf3b543afff?utm_source=oschina-app

A complete implementation is also available on GitHub: https://github.com/xiaohu2015/DeepLearning_tutorials/
https://github.com/Hedlen/Mobilenet-Keras/blob/master/model/mobilenet.py
https://www.xianjichina.com/special/detail_433028.html

Finally, Caffe model definitions for the three versions:

mobilenet v1:https://github.com/shicai/MobileNet-Caffe/blob/master/mobilenet_deploy.prototxt

mobilenet v2:https://github.com/shicai/MobileNet-Caffe/blob/master/mobilenet_v2_deploy.prototxt

mobilenet v3:https://github.com/jixing0415/caffe-mobilenet-v3

Original: https://www.cnblogs.com/wj-1314/p/10494911.html
Author: 战争热诚
Title: Convolutional Neural Network Study Notes: The Lightweight MobileNet Family (V1, V2, V3)

