Deep Learning Paper Translation and Analysis: MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications

Paper title: MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications

Authors: Andrew G. Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, Marco Andreetto, Hartwig Adam

Paper link: https://arxiv.org/abs/1704.04861

Reference MobileNets translation blog: https://blog.csdn.net/qq_31531635/article/details/80508306

Disclaimer: this translation is provided for study purposes only. If there is any infringement, please contact the editor to delete the blog post. Thank you!

The editor is a beginner in machine learning who intends to study the paper carefully, but has limited English, so Google Translate was used and the result checked sentence by sentence. Some obscure spots may remain, such as grammatical errors or mistranslated technical terms; please forgive them, and feel free to point them out.

For the editor's other paper translations, please visit the editor's GitHub repository:

https://github.com/LeBron-Jian/DeepLearningNote


Abstract

We present an efficient class of models called MobileNets for mobile and embedded vision applications. MobileNets are based on a streamlined architecture that uses depthwise separable convolutions to build lightweight deep neural networks. We introduce two simple global hyperparameters that efficiently trade off between latency and accuracy. These hyperparameters allow the model builder to choose the right sized model for their application based on the constraints of the problem. We present extensive experiments on resource and accuracy tradeoffs and show strong performance compared to other popular models on ImageNet classification. We then demonstrate the effectiveness of MobileNets across a wide range of applications and use cases, including object detection, fine-grained classification, face attributes, and large-scale geo-localization.


1. Introduction

Convolutional neural networks have been widely used in computer vision ever since the famous deep convolutional network AlexNet won the ImageNet challenge ILSVRC 2012. The general trend has been to make deeper and more complicated networks in order to achieve higher accuracy. However, these advances in accuracy do not necessarily make networks more efficient with respect to size and speed. In many real-world applications such as robotics, self-driving cars, and augmented reality, recognition tasks need to be carried out in real time on a computationally limited platform.

This paper describes an efficient network architecture and a set of two hyperparameters for building very small, low-latency models that can easily match the design requirements of mobile and embedded vision applications. Section 2 reviews prior work on building small models. Section 3 describes the MobileNet architecture and the two hyperparameters, the width multiplier and the resolution multiplier, which define smaller and more efficient MobileNets. Section 4 describes experiments on ImageNet as well as a variety of application scenarios and use cases. Section 5 closes with a summary and conclusion.


2. Prior Work

There has been recent interest in building small and efficient neural networks, e.g. SqueezeNet, Flattened convolutional neural networks for feedforward acceleration, ImageNet classification using binary convolutional neural networks, Factorized convolutional neural networks, and Quantized convolutional neural networks for mobile devices. These approaches can be broadly categorized as either compressing pretrained networks or training small networks directly. This paper proposes a class of network architectures that allows a model developer to choose a small network that matches the resource restrictions (latency, size) of their application. MobileNets primarily focus on optimizing latency, but also yield small networks; many papers on small networks focus only on size and do not consider speed.

MobileNets are built primarily from depthwise separable convolutions, initially introduced in (Rigid-motion scattering for image classification) and subsequently used in Inception models (GoogLeNet v2) to reduce the computation in the first few layers. Flattened networks build a network out of fully factorized convolutions and showed the potential of extremely factorized networks. Factorized Networks introduces a similar factorized convolution as well as the use of topological connections. Subsequently, the Xception network demonstrated how to scale up depthwise separable filters to outperform InceptionV3 networks. Another small network is SqueezeNet, which uses a bottleneck approach to design a very small network.


3. MobileNet Architecture

This section first describes the core component of MobileNet, the depthwise separable convolution. It then describes the MobileNet network structure and the two model-shrinking hyperparameters: the width multiplier and the resolution multiplier.

3.1 Depthwise Separable Convolution

MobileNet is a model built on depthwise separable convolutions, a form of factorized convolution that decomposes a standard convolution into a depthwise convolution and a 1×1 convolution called a pointwise convolution. In MobileNet, the depthwise convolution applies a single filter to each input channel, and the pointwise convolution then applies a 1×1 convolution to combine the outputs of the depthwise convolution. A standard convolution both filters and combines all inputs into a new set of outputs in one step; the depthwise separable convolution splits this into two parts, one layer for filtering each channel separately and a following layer for combining. This factorization drastically reduces both computation and model size. As shown in Figure 2, a standard convolution (a) is factorized into a depthwise convolution (b) and a 1×1 pointwise convolution (c).
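To make the factorization concrete, here is a minimal NumPy sketch (an illustration only, not the paper's implementation; all names and shape conventions are this sketch's own): a depthwise step filters each input channel with its own kernel, and a pointwise 1×1 step linearly combines the channels.

```python
import numpy as np

def depthwise_separable_conv(F, K_dw, K_pw):
    """Depthwise separable convolution (stride 1, 'valid' padding).

    F:    input feature map, shape (D_F, D_F, M)
    K_dw: depthwise kernels,  shape (D_K, D_K, M) -- one filter per channel
    K_pw: pointwise weights,  shape (M, N)        -- a 1x1 convolution
    """
    D_F, _, M = F.shape
    D_K = K_dw.shape[0]
    D_G = D_F - D_K + 1

    # Depthwise step: filter each input channel independently.
    G_hat = np.zeros((D_G, D_G, M))
    for k in range(D_G):
        for l in range(D_G):
            patch = F[k:k + D_K, l:l + D_K, :]           # (D_K, D_K, M)
            G_hat[k, l, :] = (patch * K_dw).sum(axis=(0, 1))

    # Pointwise step: a 1x1 convolution combines the channels linearly.
    return G_hat @ K_pw                                  # (D_G, D_G, N)
```

As a sanity check on the factorization, this is exactly equivalent to a standard convolution whose kernel happens to factorize as K[i, j, m, n] = K_dw[i, j, m] · K_pw[m, n]; the general standard convolution cannot be factorized this way, which is where the savings come from.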

[Figure 2. The standard convolutional filters in (a) are replaced by two layers: depthwise convolutional filters in (b) and 1×1 pointwise convolutional filters in (c), which together build a depthwise separable filter.]

A standard convolutional layer takes as input a D_F × D_F × M feature map F and produces a D_G × D_G × N output feature map G, where D_F is the spatial width and height of the (square) input feature map, M is the number of input channels (input depth), D_G is the spatial width and height of the output feature map, and N is the number of output channels (output depth).

The standard convolutional layer is parameterized by a convolution kernel K of size D_K × D_K × M × N, where D_K is the spatial dimension of the kernel, M is the number of input channels, and N is the number of output channels.

Assuming stride one and padding, the output feature map of a standard convolution is computed as:

$$G_{k,l,n} = \sum_{i,j,m} K_{i,j,m,n} \cdot F_{k+i-1,\,l+j-1,\,m}$$

Its computational cost is:

$$D_K \cdot D_K \cdot M \cdot N \cdot D_F \cdot D_F$$

The computational cost depends multiplicatively on the number of input channels M, the number of output channels N, the kernel size D_K × D_K, and the feature map size D_F × D_F. MobileNet models address each of these terms and their interactions. First, they use depthwise separable convolutions to break the interaction between the number of output channels and the size of the kernel.

A standard convolution operation both filters features, based on the convolutional kernels, and combines them to produce a new representation. Via factorized convolution, the filtering and combination steps can be split into two independent parts; this is called a depthwise separable convolution and greatly reduces computational cost. A depthwise separable convolution consists of two layers: a depthwise convolution and a pointwise convolution. The depthwise convolution applies a single filter per input channel, and the pointwise convolution, a simple 1×1 convolution, then computes a linear combination of the depthwise convolution's outputs. MobileNets use batchnorm and ReLU nonlinear activation for both layers.

Depthwise convolution, with one filter per input channel, can be written as:

$$\hat{G}_{k,l,m} = \sum_{i,j} \hat{K}_{i,j,m} \cdot F_{k+i-1,\,l+j-1,\,m}$$

where K̂ is the depthwise convolutional kernel of size D_K × D_K × M; the m-th filter in K̂ is applied to the m-th channel in F to produce the m-th channel of the filtered output feature map Ĝ.

The computational cost of depthwise convolution is:

$$D_K \cdot D_K \cdot M \cdot D_F \cdot D_F$$

Depthwise convolution is extremely efficient relative to standard convolution, but it only filters the input channels; it does not combine them to create new features. An additional layer that computes a linear combination of the depthwise convolution's outputs via a 1×1 convolution is therefore needed to generate new features.

The combination of a depthwise convolution and a 1×1 (pointwise) convolution is called a depthwise separable convolution, originally introduced in (Rigid-motion scattering for image classification).

The computational cost of a depthwise separable convolution is:

$$D_K \cdot D_K \cdot M \cdot D_F \cdot D_F + M \cdot N \cdot D_F \cdot D_F$$

that is, the sum of the depthwise convolution and the 1×1 pointwise convolution.

By expressing convolution as a two-step process of filtering and combining, we obtain a reduction in computation of:

$$\frac{D_K \cdot D_K \cdot M \cdot D_F \cdot D_F + M \cdot N \cdot D_F \cdot D_F}{D_K \cdot D_K \cdot M \cdot N \cdot D_F \cdot D_F} = \frac{1}{N} + \frac{1}{D_K^2}$$
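As a quick numeric check of this ratio (the layer sizes below are illustrative, not taken from the paper's tables), note that with 3×3 kernels the reduction works out to roughly 8 to 9 times:

```python
# Cost of a standard vs. a depthwise separable convolution, following
# the formulas above. Illustrative sizes: D_K=3, M=N=512, D_F=14.
D_K, M, N, D_F = 3, 512, 512, 14

standard  = D_K * D_K * M * N * D_F * D_F
depthwise = D_K * D_K * M * D_F * D_F    # filtering term
pointwise = M * N * D_F * D_F            # combining term
separable = depthwise + pointwise

# The ratio matches 1/N + 1/D_K^2 exactly.
assert abs(separable / standard - (1 / N + 1 / D_K ** 2)) < 1e-12

print(f"reduction factor: {standard / separable:.2f}x")  # roughly 8.84x
```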

Additional factorization in the spatial dimensions, as in (Flattened convolutional neural networks for feedforward acceleration) and (Rethinking the inception architecture for computer vision), does not save nearly as much computation as depthwise separable convolution.


3.2 Network Structure and Training

The MobileNet structure is built on depthwise separable convolutions, as mentioned above, except for the first layer, which is a full convolution. Defining the network in such simple terms makes it easier to explore network topologies to find a good network. The MobileNet architecture is defined in Table 1. All layers are followed by batchnorm and a ReLU nonlinearity, with the exception of the final fully connected layer, which has no nonlinearity and feeds directly into a softmax layer for classification. Figure 3 contrasts a regular convolution layer with batchnorm and ReLU against the factorized layer consisting of a depthwise convolution, a 1×1 pointwise convolution, and batchnorm and ReLU after each convolutional layer. Downsampling is handled with strided depthwise convolutions within the depthwise separable blocks as well as in the first layer. A final global average pooling reduces the spatial resolution to 1 before the fully connected layer, which classifies into 1000 classes. Counting depthwise and pointwise convolutions as separate layers, MobileNet has 28 layers.
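The layer count above can be sketched from Table 1's configuration as a simple Python list (a paraphrase of the table using this sketch's own naming, not code from the paper):

```python
# Each (stride, out_channels) pair describes one depthwise separable
# block from Table 1 of the paper: a strided-or-not depthwise conv
# followed by a 1x1 pointwise conv up to out_channels.
blocks = [
    (1, 64),
    (2, 128), (1, 128),
    (2, 256), (1, 256),
    (2, 512), (1, 512), (1, 512), (1, 512), (1, 512), (1, 512),
    (2, 1024), (1, 1024),
]

# Counting depthwise and pointwise convolutions as separate layers:
# 1 (first full conv) + 2 per block + 1 (fully connected) = 28.
n_layers = 1 + 2 * len(blocks) + 1
print(n_layers)  # 28
```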

It is not enough to simply define a network in terms of a small number of multiply-add operations; it is also important to ensure those operations can be implemented efficiently. For instance, unstructured sparse matrix operations are not typically faster than dense matrix operations until a very high degree of sparsity. Our model structure puts nearly all of the computation into dense 1×1 convolutions, which can be implemented with highly optimized general matrix multiply (GEMM) functions. Convolutions are often implemented via GEMM but require an initial reordering in memory, called im2col, to map them onto a GEMM; this approach is used, for example, in the Caffe framework. 1×1 convolutions, on the other hand, do not require this memory reordering and can be implemented directly with GEMM, one of the most heavily optimized numerical linear algebra routines. As shown in Table 2, MobileNet spends 95% of its computation time in 1×1 pointwise convolutions, which also contain 75% of the parameters; nearly all of the remaining parameters are in the fully connected layer.
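The claim that a 1×1 convolution maps directly onto GEMM without im2col can be illustrated with a small NumPy sketch (names and sizes are this example's own): flattening the spatial dimensions turns the 1×1 convolution into a single matrix multiply.

```python
import numpy as np

np.random.seed(0)
D_F, M, N = 14, 32, 64
F = np.random.randn(D_F, D_F, M)   # input feature map
W = np.random.randn(M, N)          # 1x1 convolution weights

# GEMM form: reshape (D_F, D_F, M) -> (D_F*D_F, M), then one matmul.
G = (F.reshape(-1, M) @ W).reshape(D_F, D_F, N)

# Same result as applying the 1x1 filters position by position.
G_ref = np.einsum('klm,mn->kln', F, W)
assert np.allclose(G, G_ref)
```

No memory reordering is needed because each output position depends only on the channel vector at the same position, which is already contiguous after the reshape.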

[Table 1. MobileNet body architecture.]

[Figure 3. Left: standard convolutional layer with batchnorm and ReLU. Right: depthwise separable convolution with depthwise and pointwise layers, each followed by batchnorm and ReLU.]

[Table 2. Resource per layer type.]

MobileNet models were trained in TensorFlow using RMSprop with asynchronous gradient descent, as in InceptionV3. However, contrary to training large models, we use very little regularization and data augmentation, because small models have less trouble with overfitting. When training MobileNets we do not use side heads or label smoothing, and we additionally reduce the amount of image distortion by limiting the size of the small crops used in large Inception training. We also found it important to put little or no weight decay (L2 regularization) on the depthwise filters, since they contain so few parameters. For the ImageNet benchmarks, all models, regardless of size, were trained with the same training parameters, as described in the next section.


3.3 Width Multiplier: Thinner Models

Although the base MobileNet architecture is already small and low latency, in many cases a specific use case or application may require the model to be smaller and faster. To construct these smaller and less computationally expensive models, we introduce a very simple parameter α called the width multiplier. The role of the width multiplier α is to thin the network uniformly at each layer. For a given layer and width multiplier α, the number of input channels M becomes αM and the number of output channels N becomes αN.

The computational cost of a depthwise separable convolution with width multiplier α is:

$$D_K \cdot D_K \cdot \alpha M \cdot D_F \cdot D_F + \alpha M \cdot \alpha N \cdot D_F \cdot D_F$$
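A small numeric sketch of the effect (the layer sizes are illustrative, not from the paper's tables): because the dominant pointwise term contains both αM and αN, computational cost falls roughly with α².

```python
# Effect of the width multiplier alpha on one depthwise separable layer.
D_K, M, N, D_F = 3, 512, 512, 14

def separable_cost(alpha):
    aM, aN = int(alpha * M), int(alpha * N)
    return D_K * D_K * aM * D_F * D_F + aM * aN * D_F * D_F

base = separable_cost(1.0)
half = separable_cost(0.5)

# The dominant pointwise term scales with alpha^2, so alpha = 0.5
# cuts computation by roughly 4x.
print(base / half)
```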

Original: https://www.cnblogs.com/wj-1314/p/14318311.html
Author: 战争热诚
Title: 深度学习论文翻译解析-MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications
