# Deep Learning: Convolution (conv)

## Convolution

$$
\begin{align}
(f*g)(t) &= \int_{-\infty}^{\infty} f(\tau)\,g(t-\tau)\,d\tau \tag{continuous form} \\
(f*g)(t) &= \sum_{\tau=-\infty}^{\infty} f(\tau)\,g(t-\tau) \tag{discrete form}
\end{align}
$$


Suppose we have two dice, $f$ and $g$, where $f(1)$ denotes the probability that die $f$ shows the number 1. If both dice are thrown at once, what is the probability that the faces showing up sum to 4? There are exactly three cases:

• $f$ shows 1 and $g$ shows 3;
• $f$ shows 2 and $g$ shows 2;
• $f$ shows 3 and $g$ shows 1.


So $P(\text{sum}=4)=\sum_{\tau} f(\tau)\,g(4-\tau)$, which is exactly the discrete convolution above. In other words, convolution flips one function, slides it past the other, and takes the inner product at each offset.
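As a quick check of the dice example, the sum's distribution can be computed with `numpy.convolve` (a minimal sketch; array indices stand for face values):

```python
import numpy as np

# Probability distribution of one fair die: index 0 is unused,
# indices 1..6 each carry probability 1/6.
f = np.array([0, 1, 1, 1, 1, 1, 1]) / 6.0
g = f.copy()

# The distribution of the sum of two dice is the discrete convolution:
# P(sum = t) = sum over tau of f(tau) * g(t - tau).
sum_dist = np.convolve(f, g)

# P(sum = 4) = f(1)g(3) + f(2)g(2) + f(3)g(1) = 3/36
print(sum_dist[4])  # 0.08333...
```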

## Image Filters


Traditional image filtering operators include the following:

• blur kernel: reduces the difference between neighboring pixels, smoothing the image.
• sobel: brings out differences between neighboring pixels along a specific direction.
• sharpen: amplifies the difference between neighboring pixels, making the image look more vivid.
• outline (also called an edge kernel): pixels whose neighbors have similar brightness become black, while those with large differences become white.
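As an illustration, here is a minimal NumPy sketch applying the sharpen kernel (the `filter2d` helper is written out for clarity; real code would use an optimized library routine, and the other kernels can be swapped in the same way):

```python
import numpy as np

# A classic sharpen kernel: amplifies the center pixel relative to
# its four neighbors. Its weights sum to 1.
sharpen = np.array([[ 0, -1,  0],
                    [-1,  5, -1],
                    [ 0, -1,  0]], dtype=float)

def filter2d(img, kernel):
    """Valid cross-correlation of a 2-D image with a small kernel."""
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            out[y, x] = np.sum(img[y:y+kh, x:x+kw] * kernel)
    return out

# On a constant region the sharpen kernel acts as the identity,
# since its weights sum to 1: edges are amplified, flat areas are not.
flat = np.full((5, 5), 7.0)
print(filter2d(flat, sharpen))  # every entry stays 7.0
```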

## CNN Convolution Layers

A CNN does not fix its filters in advance; instead, it treats the filters as learnable parameters and adjusts them during training until suitable filters emerge. The first convolutional layer usually ends up performing edge detection, while deeper layers capture increasingly abstract features. A convolution kernel can be viewed as a fully connected layer restricted to a local region, and an important property of convolutional layers is weight sharing.

### Small Convolution Kernels


Most popular network architectures today follow the design principle of small convolution kernels. **Advantages of small convolution kernels**:

A stack of three 3×3 kernels has the same receptive field as one 7×7 kernel, but with fewer parameters, less computation, and more nonlinearities. Computation can be reduced further by inserting 1×1 "bottleneck" kernels (GoogLeNet uses this extensively to deepen the network). For example, an input of H×W×C passes through the following steps with its dimensions unchanged:

$$
\require{AMScd}
\begin{CD}
H\times W\times C @>{\text{Conv } 1\times 1 \times C/2}>> H\times W\times C/2 \\
@. @V{\text{Conv } 3\times 3 \times C/2}VV \\
H\times W\times C @<{\text{Conv } 1\times 1 \times C}<< H\times W\times C/2
\end{CD}
$$


The 3×3 kernel in the steps above can be decomposed further into a 1×3 convolution followed by a 3×1 convolution.
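The savings can be verified with simple parameter arithmetic (bias terms ignored; `C = 256` is an assumed channel count for illustration):

```python
# Parameter counts for C input channels and C output channels, no biases.
C = 256
p_7x7 = 7 * 7 * C * C                 # one 7x7 layer
p_3x3 = 3 * (3 * 3 * C * C)           # three stacked 3x3 layers, same receptive field
p_1x3_3x1 = (1 * 3 + 3 * 1) * C * C   # one 3x3 layer factored into 1x3 + 3x1

print(p_3x3 / p_7x7)                  # 27/49, about 45% fewer parameters
print(p_1x3_3x1 / (3 * 3 * C * C))    # 6/9, a further third saved per 3x3 layer
```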


The relationship between input and output size:

$$o=\left\lfloor\frac{i+2p-k}{s}\right\rfloor+1$$

where $i$ is the input size, $k$ the kernel size, $p$ the padding, and $s$ the stride.
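A small helper makes the formula concrete (symbol names follow the equation above; the example numbers are the 7×7, stride-2, padding-3 first convolution used by ResNet on a 224×224 input):

```python
def conv_out_size(i, k, p=0, s=1):
    """o = floor((i + 2p - k) / s) + 1, for input size i, kernel k,
    padding p, and stride s."""
    return (i + 2 * p - k) // s + 1

# 224x224 input, 7x7 kernel, padding 3, stride 2 -> 112x112 output
print(conv_out_size(224, k=7, p=3, s=2))  # 112
```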

### Atrous / Dilated Convolution

The DeepLab paper *Rethinking Atrous Convolution for Semantic Image Segmentation* notes: "Atrous convolution, also known as dilated convolution".

Dilated/atrous convolution (literally "convolution with holes") is easy to picture: holes are injected into a standard convolution to enlarge the receptive field. Compared with an ordinary convolution, a dilated convolution has one extra hyperparameter, the dilation rate, which is the spacing between kernel elements (an ordinary convolution has a dilation rate of 1).


Input-output size relationship for dilated convolution:

$$o=\left\lfloor\frac{i+2p-[(k-1)d+1]}{s}\right\rfloor+1$$

Here $d$ is the dilation rate: $d-1$ holes are inserted between kernel elements, which enlarges the area the kernel covers to an effective size $k' = k+(k-1)(d-1) = (k-1)d+1$.
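The dilated formula can likewise be checked with a small helper (a sketch; note that $d=1$ recovers the standard formula, and that a 3×3 kernel with padding $p=d$ preserves the spatial size, the "same" setting DeepLab relies on):

```python
def dilated_out_size(i, k, p=0, s=1, d=1):
    """Effective kernel size k' = (k-1)*d + 1, then the standard formula."""
    k_eff = (k - 1) * d + 1
    return (i + 2 * p - k_eff) // s + 1

# d = 1 reduces to the ordinary convolution formula.
print(dilated_out_size(224, k=7, p=3, s=2, d=1))  # 112

# A 3x3 kernel with dilation d and padding p = d keeps the size unchanged.
for d in (1, 2, 4):
    print(dilated_out_size(65, k=3, p=d, s=1, d=d))  # 65 each time
```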

An atrous convolution can be initialized with the original network's parameters because its neuron connectivity is unchanged, as shown below [3]: the input (first-layer) neurons feeding the red output neuron are the same in (a) and (b):

## Convolution Implementations


There are three main ways to compute convolution: transformation to matrix multiplication, Winograd, and the FFT.


The following is a brief introduction to the fast Fourier transform.

### The Convolution Theorem [6][7]


The fast Fourier transform is regarded as one of the most important algorithms of the 20th century, and one reason is the convolution theorem.


The Fourier transform can be viewed as a reorganization of image or audio data: it maps the expensive convolution in the time or spatial domain to a simple element-wise product in the frequency domain.


The convolution of two functions over a one-dimensional continuous domain:

$$h(x)=f\otimes g=\int_{-\infty}^\infty f(x-u)\,g(u)\,du=\mathcal F^{-1}\left(\sqrt{2\pi}\,\mathcal F[f]\cdot\mathcal F[g]\right)$$


The convolution theorem says that convolving two signals is equivalent to taking their Fourier transforms ($\mathcal F$), multiplying them element-wise, and then applying the inverse Fourier transform ($\mathcal F^{-1}$); $\sqrt{2\pi}$ is a normalization constant.


Convolution on a two-dimensional discrete domain (an image):

$$
\begin{align}
\text{feature map} &= \text{input}\otimes\text{kernel} \\
&= \sum_{y=0}^{M}\sum_{x=0}^{N} \text{input}(x-a,\,y-b)\cdot\text{kernel}(x,y) \\
&= \mathcal F^{-1}\left(\sqrt{2\pi}\,\mathcal F[\text{input}]\cdot\mathcal F[\text{kernel}]\right)
\end{align}
$$


The fast Fourier transform (FFT) is an algorithm that converts data from the time or spatial domain to the frequency domain. The Fourier transform represents the original function as a sum of sine and cosine waves. Note that the Fourier transform generally involves complex numbers: a real value is transformed into a complex value with real and imaginary parts. The imaginary part is usually only needed for certain operations, such as transforming from the frequency domain back to the time or spatial domain.
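The discrete version of the theorem is easy to verify numerically with NumPy's FFT (a sketch; the DFT form of the theorem yields circular convolution, and NumPy's FFT convention absorbs the $\sqrt{2\pi}$ normalizer of the continuous transform, so no extra factor appears):

```python
import numpy as np

rng = np.random.default_rng(0)
f = rng.standard_normal(8)
g = rng.standard_normal(8)

# Circular convolution computed directly from the definition.
n = len(f)
direct = np.array([sum(f[m] * g[(t - m) % n] for m in range(n))
                   for t in range(n)])

# Convolution theorem for the DFT: conv(f, g) = IFFT(FFT(f) * FFT(g)).
via_fft = np.fft.ifft(np.fft.fft(f) * np.fft.fft(g)).real

print(np.allclose(direct, via_fft))  # True
```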


Directional information can be seen in the Fourier transform:

Images by Fisher & Koryllos (1998). Source

### Convolution in the Caffe Framework


Caffe computes convolution efficiently by converting it into a two-dimensional matrix multiplication:


A more detailed illustration of im2col:
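The core of the idea can be sketched for a single-channel image (Caffe's real im2col also handles multiple channels, stride, and padding; this minimal version only unrolls valid patches):

```python
import numpy as np

def im2col(img, kh, kw):
    """Unroll every kh x kw patch of a 2-D image into one column."""
    h, w = img.shape
    oh, ow = h - kh + 1, w - kw + 1
    cols = np.empty((kh * kw, oh * ow))
    for y in range(oh):
        for x in range(ow):
            cols[:, y * ow + x] = img[y:y+kh, x:x+kw].ravel()
    return cols

img = np.arange(16, dtype=float).reshape(4, 4)
kernel = np.ones((3, 3)) / 9.0   # 3x3 box blur

# Convolution (here: cross-correlation) becomes one matrix multiplication:
# a (1 x 9) flattened kernel times the (9 x 4) patch matrix.
out = kernel.ravel() @ im2col(img, 3, 3)
print(out.reshape(2, 2))  # [[ 5.  6.], [ 9. 10.]]
```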

### Convolution: Backward Pass

$$
\begin{align}
{\partial L\over \partial W} &= X\odot{\partial L\over \partial Y} \\
{\partial L\over \partial X} &= \mathrm{FullConv}\left(180^\circ\text{-rotated }W,\ {\partial L\over \partial Y}\right) \\
{\partial L\over \partial X'} &= {\partial L\over \partial Y}\odot W' \\
{\partial L\over \partial X} &= \mathrm{col2im}\left({\partial L\over \partial X'}\right)
\end{align}
$$
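These formulas can be checked numerically for a single-channel case (a sketch; the forward op is valid cross-correlation, which is what deep-learning frameworks call convolution, and the input gradient is the full convolution with the 180°-rotated kernel, implemented here as valid correlation over a zero-padded upstream gradient):

```python
import numpy as np

def corr2d(x, w):
    """Valid 2-D cross-correlation."""
    kh, kw = w.shape
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    y = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            y[i, j] = np.sum(x[i:i+kh, j:j+kw] * w)
    return y

rng = np.random.default_rng(1)
x = rng.standard_normal((5, 5))
w = rng.standard_normal((3, 3))
dy = rng.standard_normal((3, 3))   # upstream gradient dL/dY

# dL/dW: correlate the input with the upstream gradient.
dw = corr2d(x, dy)

# dL/dX: full convolution of dY with the 180-degree-rotated kernel,
# done as valid correlation over dY zero-padded by k - 1 on each side.
dy_pad = np.pad(dy, 2)
dx = corr2d(dy_pad, np.rot90(w, 2))

# Numerical check of dL/dX, using the loss L = sum(Y * dY).
eps = 1e-6
num = np.zeros_like(x)
for i in range(5):
    for j in range(5):
        xp = x.copy(); xp[i, j] += eps
        xm = x.copy(); xm[i, j] -= eps
        num[i, j] = (np.sum(corr2d(xp, w) * dy)
                     - np.sum(corr2d(xm, w) * dy)) / (2 * eps)

print(np.allclose(dx, num, atol=1e-5))  # True
```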

Original: https://www.cnblogs.com/makefile/p/conv.html
Author: 康行天下
Title: 深度学习-conv卷积
