# NIN (Network In Network)


The linear filter used in a traditional CNN is a generalized linear model (GLM). Using such filters for feature extraction therefore implicitly assumes that the features are linearly separable, but in practice they often are not. A CNN builds higher-level representations by stacking more convolutional filters. The authors argue that, besides deepening the network as before, the convolutional layer itself can be specially designed so that the network extracts better features within each receptive field.

### mlpconv

maxout can fit any convex function, and hence any convex activation function (it implicitly assumes activations are convex). NIN aims to show that it can fit not only convex functions but arbitrary functions, because an mlpconv layer is essentially a small fully connected neural network.

NIN uses a multilayer perceptron because the MLP is structurally compatible with the CNN: both can be trained by back-propagation, and both are deep models, consistent with the idea of feature re-use. A network layer built from an MLP in this way is called an mlpconv layer. An MLP can fit functions of arbitrary form, linear or nonlinear.

The difference between a linear convolution layer and an mlpconv layer is shown in the figure:

mlpconv uses ReLU; the activation function is not replaced. What changes is the convolution itself: instead of an element-wise product, a nonlinear MLP followed by ReLU is used. The goal is to introduce more nonlinearity.

The NIN structure is shown in the following figure:

The first convolution kernel is 11x11x3x96, so convolving one patch produces a 1x1x96 feature map (a 96-dimensional vector). This is followed by an MLP layer whose output is again 96-dimensional. The MLP layer is therefore equivalent to a 1x1 convolutional layer, so the engineering implementation can follow the usual convolution path and no extra work is required.
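
The equivalence above can be sketched with numpy: a 1x1 convolution is just the same linear map applied across channels at every spatial position, so stacking two of them with ReLU gives a per-position MLP. This is an illustrative sketch, not NIN's actual implementation; all shapes and weight values here are made up.

```python
import numpy as np

def conv1x1(x, w, b):
    """1x1 convolution: a per-pixel linear map across channels.
    x: (H, W, C_in), w: (C_in, C_out), b: (C_out,).
    Broadcasting applies the same linear map at every spatial position."""
    return x @ w + b

def relu(x):
    return np.maximum(x, 0.0)

def mlpconv_tail(x, w1, b1, w2, b2):
    """The MLP part of an mlpconv layer: two 1x1 convs with ReLU,
    applied to the output of a normal convolution."""
    return relu(conv1x1(relu(conv1x1(x, w1, b1)), w2, b2))

# Example: the 1x1x96 response of one patch, pushed through a 96->96->96 MLP.
rng = np.random.default_rng(0)
x = rng.standard_normal((1, 1, 96))
w1, b1 = rng.standard_normal((96, 96)) * 0.01, np.zeros(96)
w2, b2 = rng.standard_normal((96, 96)) * 0.01, np.zeros(96)
y = mlpconv_tail(x, w1, b1, w2, b2)
print(y.shape)  # (1, 1, 96): channel count changes, spatial size does not
```

Because the same weights are shared across all positions, this is exactly what a framework executes when it sees a 1x1 convolutional layer.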

### Global Average Pooling

A traditional CNN uses convolution in the lower layers; for a classification task, the feature maps from the final convolutional layer are flattened into a vector, fed to fully connected layers, and classified by softmax regression. In other words, the convolutional stage is bridged to a traditional classifier. The fully connected stage overfits easily, which harms the generalization ability of the whole network, so regularization methods are usually needed to control overfitting.

In a traditional CNN it is hard to interpret how the category-level error from the last fully connected layer is transmitted back to the preceding convolutional layers. Global average pooling is easier to interpret. Moreover, the fully connected layer overfits easily and usually depends on regularization methods such as dropout.

The idea of global average pooling is simple: make the network produce exactly as many feature maps as there are classes, take the average of each feature map's values as the confidence for the corresponding class (analogous to the feature vector output by an FC layer), and pass the result through softmax. Its advantages:

1. Fewer parameters, which reduces overfitting (applied to AlexNet, the model shrinks from 230 MB to 29 MB);
2. It fits the convolutional structure better, giving a direct mapping between feature maps and class labels;
3. Averaging aggregates the spatial information, making the network more robust to spatial transformations of the input (an FC layer attached to the convolutional features flattens them in a fixed order, which can destroy positional information);
4. The input size of an FC layer must be fixed, which restricts the input image size; global average pooling removes this constraint.
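
The procedure described above can be sketched in a few lines of numpy; the class count and feature-map size below are illustrative.

```python
import numpy as np

def global_average_pooling(feature_maps):
    """feature_maps: (C, H, W) -> per-class confidences (C,).
    One feature map per class; averaging over the spatial axes removes
    H and W, so any input size works (unlike a fixed-size FC layer)."""
    return feature_maps.mean(axis=(1, 2))

def softmax(z):
    e = np.exp(z - z.max())  # shift for numerical stability
    return e / e.sum()

# 10 classes, 6x6 final feature maps
rng = np.random.default_rng(0)
fmap = rng.standard_normal((10, 6, 6))
probs = softmax(global_average_pooling(fmap))
print(probs.shape)  # (10,), a probability distribution over classes
```

Note that there is nothing to learn in this stage: unlike an FC classifier, global average pooling itself has zero parameters.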

The difference between an FC layer and global average pooling is shown in the figure below:

It can be used for image classification, object detection, and other tasks.

Global average pooling is implemented as average pooling whose kernel_size equals the spatial size of the feature map. The caffe prototxt definition is:

```
layers {
  bottom: "cccp8"
  top: "pool4"
  name: "pool4"
  type: POOLING
  pooling_param {
    pool: AVE
    # kernel_size: 6
    # stride: 1
    # -- old caffe versions require kernel_size & stride --
    global_pooling: true
  }
}
```


caffe added support for global_pooling after this paper: just set global_pooling: true in pooling_param, with no need to specify the kernel size (pad and stride must keep their defaults, pad = 0 and stride = 1, otherwise an error is raised). kernel_size is automatically set to the feature-map size; the code:

```cpp
if (global_pooling_) {
  kernel_h_ = bottom[0]->height();
  kernel_w_ = bottom[0]->width();
}
```


Original: https://www.cnblogs.com/makefile/p/nin.html
Author: 康行天下
Title: NIN (Network In Network)
