ResNet Overview and Code

ResNet

Origin of the Network

"Deep Residual Learning for Image Recognition" [1]. This paper won the CVPR 2016 Best Paper Award and accumulated more than 12,000 citations within two years of publication.

6.2 Main Contributions of the Paper

As we already mentioned when introducing VGG and GoogLeNet, an important research question on the road of deep learning models is how deep a neural network architecture can actually be built.

This question should be viewed from two angles. The first is practical: how to build deeper networks, how to train them, and how to demonstrate that deeper networks actually perform better. The second is theoretical: how to directly connect a network's depth (its number of layers) and its width to the model's overall generalization performance.

For a long time, researchers held a bold conjecture about neural network architecture: deeper networks bring better generalization. But such a result is not easy to achieve. What challenges stand in the way?

One long-standing challenge is exploding or vanishing gradients during model training. To address it, many techniques sprang up in the early days of deep learning research, such as the rectified linear unit (ReLU), batch normalization, and pre-training.
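Where do these two failure modes come from? A standard derivation (textbook material, not something taken from the ResNet paper itself): backpropagation through an L-layer stack multiplies one Jacobian per layer,

    ∂Loss/∂x_0 = ∂Loss/∂x_L · J_L J_{L-1} ⋯ J_1,   where J_l = ∂x_l/∂x_{l-1},

so if the Jacobian norms sit consistently below 1, the product shrinks geometrically with depth (vanishing gradients), and if they sit consistently above 1, it grows geometrically (exploding gradients).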

The other challenge: after the innovations of VGG and GoogLeNet, people gradually realized that simply adding more layers did not improve performance. Researchers found that once a model went beyond roughly 50 layers, its performance not only stopped improving but actually dropped, i.e. accuracy got worse. It looked as if model performance had hit a bottleneck. Does that mean there is an inherent limit on how deep a model can go?

GoogLeNet's approach shows that network structure can be deepened, for example by innovating on local structure. Following that line, this paper proposes a new structure called the Residual Network, or ResNet for short, which pushes model scale from a few layers, a dozen layers, or a few dozen layers all the way to hundreds of layers. That is the paper's biggest contribution.

On real datasets, ResNet's error rate is only about half that of VGG and GoogLeNet, and the model's generalization ability keeps growing as layers are added. This was genuinely exciting news for deep learning researchers: it meant an important problem had been solved and a bottleneck had been broken.

6.3 Core Method of the Paper

So what is the core idea of this paper? Let's take a look.

First, suppose there is an underlying function H of the input x. H can perform a complex transformation of x, such as a multi-layer neural network. In practice, however, we do not know what H actually looks like. The traditional approach is to learn a function F that approximates H.

The "residual learning" proposed in this paper does not make F approximate H directly; instead, F approximates the difference H(x) minus x. In machine learning this difference is called the "residual": the gap between the target function and the input. Of course we still cannot know H itself; in practice, F is fitted to the residual.

F(x) = H(x) − x. Moving x to F's side gives the final form of residual learning: F(x) + x approximates the unknown H.
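As a minimal sketch of this idea in PyTorch (the two-conv branch F below is an arbitrary stand-in, not the paper's exact block), a residual connection is nothing more than an element-wise addition of the input to the branch output:

import torch
import torch.nn as nn

class ResidualSketch(nn.Module):
    """y = F(x) + x, where F is any shape-preserving sub-network."""
    def __init__(self, channels):
        super().__init__()
        # F: a stand-in residual branch; its output shape must match x
        self.F = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
        )

    def forward(self, x):
        return self.F(x) + x  # the shortcut: x flows around F unchanged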

[Figure: the residual learning building block, F(x) + x]
In this formula, the outer x is often called the "shortcut". What does that mean? Researchers observed that in a deep neural network, some connections or relations between layers are actually unnecessary. What we care about is that a given input should map to the corresponding output, the so-called "identity mapping".

Unfortunately, without modifying the network structure, a model cannot learn such mappings. So we build a shortcut from input to output: x can reach H (or y) directly without passing through F(x), and when necessary F(x) can be forced to zero. In other words, shortcut or residual architectures can in theory make the whole network more efficient; we hope the algorithm can figure out which parts can be ignored and which parts must be kept.

In the actual architecture, the authors insert a shortcut across every two convolutional layers and stack this design 34 layers deep. Empirically, at 34 layers ResNet indeed still lowers the training error. The authors then pushed further, to 50-odd layers, to 110 layers, and all the way to a 1202-layer network, finding that the best result was reached at 110 layers (in the CIFAR-10 experiments), with about 1.7 million parameters in total.

To train ResNet, the authors still relied on batch normalization and a series of initialization tricks. Notably, from this point on they abandoned Dropout altogether.

6.4 Milestone: ResNet

After VGGNet and Inception appeared, researchers kept deepening convolutional networks in pursuit of better performance. Yet the deeper the network, the harder it became to train: on one hand, gradients vanish; on the other hand, the gradients flowing back through a deeper network become less and less correlated, approaching white noise, so gradient updates start to resemble random perturbations.

ResNet (Residual Network) largely solved this problem and took first place in the 2015 ImageNet classification task. Classification, detection, segmentation, and other tasks have since adopted ResNet as their backbone on a large scale.

ResNet's idea is to introduce a deep residual framework to combat vanishing gradients: let the convolutional network learn a residual mapping rather than expecting every stack of layers to fit the underlying mapping completely. As shown in Figure 3.17, if the desired final mapping is H(x), the network on the left has to fit H(x) directly, while the sub-module proposed by ResNet on the right introduces a shortcut branch, turning the mapping to be fitted into the residual F(x) = H(x) − x. ResNet's hypothesis is that optimizing the residual mapping F(x) is easier than optimizing the underlying mapping H(x) directly; intuitively, when the optimal H is close to the identity, driving F(x) toward zero is much easier than fitting an identity map through a stack of nonlinear layers.

[Figure 3.17: a plain block fitting H(x) directly (left) vs. a residual sub-module with a shortcut branch fitting F(x) (right)]
In ResNet, a residual module of this kind is called a Bottleneck. ResNet comes in versions of different depths, such as 18, 34, 50, 101, and 152 layers; here we explain the commonly used 50-layer version. The ResNet-50 architecture is shown in Figure 3.18. Its main body passes through 4 large convolutional groups, which contain 3, 4, 6, and 3 Bottleneck modules respectively. A global average pooling then shrinks the feature map to 1×1, followed by a 1000-dimensional fully connected layer and finally a Softmax that outputs the classification scores.
[Figure 3.18: the ResNet-50 network architecture]
Because F(x) + x is added channel by channel, there are two Bottleneck structures depending on whether the two operands have the same number of channels. When the channel counts differ, for example in the first Bottleneck of each convolutional group, a 1×1 convolution applies a Downsample to x to make the channel counts equal before the addition. When they already match, the two can be added directly.

Let's implement a Bottleneck with the Downsample branch in PyTorch. Create a file resnet_bottleneck.py with the following code:

import torch.nn as nn

class Bottleneck(nn.Module):
    def __init__(self, in_dim, out_dim, stride=1):
        super(Bottleneck, self).__init__()
        # Main branch: 1x1 conv -> 3x3 conv (carries the stride) -> 1x1 conv
        self.bottleneck = nn.Sequential(
            nn.Conv2d(in_dim, in_dim, 1, bias=False),
            nn.BatchNorm2d(in_dim),
            nn.ReLU(inplace=True),
            nn.Conv2d(in_dim, in_dim, 3, stride, 1, bias=False),
            nn.BatchNorm2d(in_dim),
            nn.ReLU(inplace=True),
            nn.Conv2d(in_dim, out_dim, 1, bias=False),
            nn.BatchNorm2d(out_dim),
        )
        self.relu = nn.ReLU(inplace=True)
        # Shortcut branch: a 1x1 conv matches both the channel count and the
        # stride of the main branch, so the two outputs can be added
        self.downsample = nn.Sequential(
            nn.Conv2d(in_dim, out_dim, 1, stride),
            nn.BatchNorm2d(out_dim),
        )

    def forward(self, x):
        out = self.bottleneck(x)
        identity = self.downsample(x)
        # Element-wise addition of the two branches, then ReLU
        out += identity
        out = self.relu(out)
        return out

In a terminal, change into the directory containing resnet_bottleneck.py, run python3 to enter the interactive environment, and call the module with the code below. (The .cuda() calls assume a CUDA-capable GPU; on a CPU-only machine, simply drop them.)

>>> import torch
>>> from resnet_bottleneck import Bottleneck

>>> bottleneck_1_1 = Bottleneck(64, 256).cuda()
>>> bottleneck_1_1

Bottleneck(
  (bottleneck): Sequential(
    (0): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
    (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): ReLU(inplace)
    (3): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    (4): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (5): ReLU(inplace)
    (6): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
    (7): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  )
  (relu): ReLU(inplace)
  (downsample): Sequential(
    (0): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1))
    (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  )
)
>>> input = torch.randn(1, 64, 56, 56).cuda()
>>> output = bottleneck_1_1(input)
>>> input.shape
torch.Size([1, 64, 56, 56])
>>> output.shape
torch.Size([1, 256, 56, 56])

PyTorch ResNet-50 code

import torch
import torch.nn as nn

# Number of Bottleneck blocks in each of the four convolutional groups
Layers = [3, 4, 6, 3]
class Block(nn.Module):
    def __init__(self, in_channels, filters, stride=1, is_1x1conv=False):

        super(Block, self).__init__()
        filter1, filter2, filter3 = filters
        self.is_1x1conv = is_1x1conv
        self.relu = nn.ReLU(inplace=True)
        self.conv1 = nn.Sequential(
            nn.Conv2d(in_channels, filter1, kernel_size=1, stride=stride, bias=False),
            nn.BatchNorm2d(filter1),
            nn.ReLU()
        )
        self.conv2 = nn.Sequential(
            nn.Conv2d(filter1, filter2, kernel_size=3, stride=1, padding=1, bias=False),
            nn.BatchNorm2d(filter2),
            nn.ReLU()
        )
        self.conv3 = nn.Sequential(
            nn.Conv2d(filter2, filter3, kernel_size=1, stride=1, bias=False),
            nn.BatchNorm2d(filter3),
        )

        if is_1x1conv:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_channels, filter3, kernel_size=1, stride=stride,  bias=False),
                nn.BatchNorm2d(filter3)
            )
    def forward(self, x):
        x_shortcut = x
        x = self.conv1(x)
        x = self.conv2(x)
        x = self.conv3(x)
        if self.is_1x1conv:
            x_shortcut = self.shortcut(x_shortcut)
        x = x + x_shortcut
        x = self.relu(x)
        return x

class Resnet50(nn.Module):

    def __init__(self):
        super(Resnet50,self).__init__()
        self.conv1 = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3),
            nn.BatchNorm2d(64),
            nn.ReLU(),
        )
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
        self.conv2 = self._make_layer(64, (64, 64, 256), Layers[0])
        self.conv3 = self._make_layer(256, (128, 128, 512), Layers[1], 2)
        self.conv4 = self._make_layer(512, (256, 256, 1024), Layers[2], 2)
        self.conv5 = self._make_layer(1024, (512, 512, 2048), Layers[3], 2)
        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))

        self.fc = nn.Sequential(
            nn.Linear(2048, 1000)
        )
    def forward(self, input):
        x = self.conv1(input)
        x = self.maxpool(x)
        x = self.conv2(x)
        x = self.conv3(x)
        x = self.conv4(x)
        x = self.conv5(x)
        x = self.avgpool(x)
        x = torch.flatten(x, 1)
        x = self.fc(x)
        return x
    def _make_layer(self, in_channels, filters, blocks, stride=1):
        layers = []
        # The first block of each group uses a 1x1 conv shortcut to match the
        # channel count (and stride); the rest add the identity directly
        layers.append(Block(in_channels, filters, stride=stride, is_1x1conv=True))
        for i in range(1, blocks):
            layers.append(Block(filters[2], filters, stride=1, is_1x1conv=False))

        return nn.Sequential(*layers)

net = Resnet50()
x = torch.rand((10, 3, 224, 224))
print(net(x).shape)
print(net)
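As a quick sanity check (assuming torchvision is installed; this comparison is not from the original post), you can set the hand-rolled model side by side with torchvision's reference resnet50. The parameter counts should agree almost exactly; the only difference I would expect is the bias of the stem convolution above, which torchvision creates with bias=False:

import torch
import torchvision

ref = torchvision.models.resnet50()  # reference ResNet-50
mine = Resnet50()                    # the model defined above

count = lambda m: sum(p.numel() for p in m.parameters())
print(count(ref), count(mine))  # expect only a tiny difference (the stem bias)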

TensorFlow CIFAR-10 code (1.x-style API)
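Note: this script is written against the TensorFlow 1.x graph API (tf.placeholder, tf.layers, tf.Session) and will not run on TensorFlow 2 as-is. One common workaround, if you only have TF2 installed (tf.layers remains deprecated either way), is the v1 compatibility shim; the rest of the script can then run unchanged:

import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()  # restore graph mode, placeholders, and sessions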

import tensorflow as tf
import os
import pickle
import numpy as np

CIFAR_DIR = r"C:\Users\Administrator.DESKTOP-T76M1PJ\.keras\datasets\cifar-10-batches-py"
print(os.listdir(CIFAR_DIR))

def load_data(filename):
    """read data from data file."""
    with open(filename, 'rb') as f:
        data = pickle.load(f, encoding='bytes')
        return data[b'data'], data[b'labels']

class CifarData:
    def __init__(self, filenames, need_shuffle):
        all_data = []
        all_labels = []
        for filename in filenames:
            data, labels = load_data(filename)
            all_data.append(data)
            all_labels.append(labels)
        self._data = np.vstack(all_data)
        # Scale raw pixel values from [0, 255] to [-1, 1]
        self._data = self._data / 127.5 - 1
        self._labels = np.hstack(all_labels)
        print(self._data.shape)
        print(self._labels.shape)

        self._num_examples = self._data.shape[0]
        self._need_shuffle = need_shuffle
        self._indicator = 0
        if self._need_shuffle:
            self._shuffle_data()

    def _shuffle_data(self):
        # Apply one shared random permutation to data and labels
        p = np.random.permutation(self._num_examples)
        self._data = self._data[p]
        self._labels = self._labels[p]

    def next_batch(self, batch_size):
        """return batch_size examples as a batch."""
        end_indicator = self._indicator + batch_size
        if end_indicator > self._num_examples:
            if self._need_shuffle:
                self._shuffle_data()
                self._indicator = 0
                end_indicator = batch_size
            else:
                raise Exception("have no more examples")
        if end_indicator > self._num_examples:
            raise Exception("batch size is larger than all examples")
        batch_data = self._data[self._indicator: end_indicator]
        batch_labels = self._labels[self._indicator: end_indicator]
        self._indicator = end_indicator
        return batch_data, batch_labels

train_filenames = [os.path.join(CIFAR_DIR, 'data_batch_%d' % i) for i in range(1, 6)]
test_filenames = [os.path.join(CIFAR_DIR, 'test_batch')]

train_data = CifarData(train_filenames, True)
test_data = CifarData(test_filenames, False)

def residual_block(x, output_channel):
    """residual connection implementation"""
    input_channel = x.get_shape().as_list()[-1]
    if input_channel * 2 == output_channel:
        # Doubling the channel count: also halve the spatial size
        increase_dim = True
        strides = (2, 2)
    elif input_channel == output_channel:
        increase_dim = False
        strides = (1, 1)
    else:
        raise Exception("input channel can't match output channel")

    conv1 = tf.layers.conv2d(x,
                             output_channel,
                             (3, 3),
                             strides=strides,
                             padding='same',
                             activation=tf.nn.relu,
                             name='conv1')
    conv2 = tf.layers.conv2d(conv1,
                             output_channel,
                             (3, 3),
                             strides=(1, 1),
                             padding='same',
                             activation=tf.nn.relu,
                             name='conv2')

    if increase_dim:
        # Identity shortcut across a downsampling block: average-pool to
        # halve the spatial size, then zero-pad the channel dimension
        # (option A in the ResNet paper) instead of a 1x1 projection
        pooled_x = tf.layers.average_pooling2d(x,
                                               (2, 2),
                                               (2, 2),
                                               padding='valid')
        padded_x = tf.pad(pooled_x,
                          [[0, 0],
                           [0, 0],
                           [0, 0],
                           [input_channel // 2, input_channel // 2]])
    else:
        padded_x = x

    output_x = conv2 + padded_x
    return output_x

def res_net(x,
            num_residual_blocks,
            num_filter_base,
            class_num):
    """residual network implementation

    Args:
    - x: input image tensor in NHWC layout
    - num_residual_blocks: blocks per stage, e.g. [3, 4, 6, 3]
    - num_filter_base: number of filters in the first stage
    - class_num: number of output classes
    """
    num_subsampling = len(num_residual_blocks)
    layers = []

    input_size = x.get_shape().as_list()[1:]
    print("input_size get_shape():", input_size)
    with tf.variable_scope('conv0'):
        conv0 = tf.layers.conv2d(x,
                                 num_filter_base,
                                 (3, 3),
                                 strides=(1, 1),
                                 padding='same',
                                 activation=tf.nn.relu,
                                 name='conv0')
        layers.append(conv0)
    print("layers[-1]:", layers[-1])

    for sample_id in range(num_subsampling):
        for i in range(num_residual_blocks[sample_id]):
            with tf.variable_scope("conv%d_%d" % (sample_id, i)):
                conv = residual_block(
                    layers[-1],
                    num_filter_base * (2 ** sample_id))
                layers.append(conv)
    multiplier = 2 ** (num_subsampling - 1)
    assert layers[-1].get_shape().as_list()[1:] \
           == [input_size[0] / multiplier,
               input_size[1] / multiplier,
               num_filter_base * multiplier]
    with tf.variable_scope('fc'):

        global_pool = tf.reduce_mean(layers[-1], [1, 2])
        logits = tf.layers.dense(global_pool, class_num)
        layers.append(logits)
    return layers[-1]

# Each CIFAR-10 example arrives flattened: 3072 = 3 * 32 * 32 values
x = tf.placeholder(tf.float32, [None, 3072])
y = tf.placeholder(tf.int64, [None])

# Reshape to NCHW, then transpose to the NHWC layout expected by tf.layers
x_image = tf.reshape(x, [-1, 3, 32, 32])
x_image = tf.transpose(x_image, perm=[0, 2, 3, 1])

# 3 stages of [2, 3, 2] residual blocks, 32 base filters, 10 classes
y_ = res_net(x_image, [2, 3, 2], 32, 10)

loss = tf.losses.sparse_softmax_cross_entropy(labels=y, logits=y_)

predict = tf.argmax(y_, 1)

correct_prediction = tf.equal(predict, y)
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float64))

with tf.name_scope('train_op'):
    train_op = tf.train.AdamOptimizer(1e-3).minimize(loss)

init = tf.global_variables_initializer()
batch_size = 20
train_steps = 10000
test_steps = 100

with tf.Session() as sess:
    sess.run(init)
    for i in range(train_steps):
        batch_data, batch_labels = train_data.next_batch(batch_size)
        loss_val, acc_val, _ = sess.run(
            [loss, accuracy, train_op],
            feed_dict={
                x: batch_data,
                y: batch_labels})
        if (i + 1) % 500 == 0:
            print('[Train] Step: %d, loss: %4.5f, acc: %4.5f'
                  % (i + 1, loss_val, acc_val))

        if (i + 1) % 5000 == 0:
            test_data = CifarData(test_filenames, False)
            all_test_acc_val = []
            for j in range(test_steps):
                test_batch_data, test_batch_labels \
                    = test_data.next_batch(batch_size)
                test_acc_val = sess.run(
                    [accuracy],
                    feed_dict={
                        x: test_batch_data,
                        y: test_batch_labels
                    })
                all_test_acc_val.append(test_acc_val)
            test_acc = np.mean(all_test_acc_val)
            print('[Test ] Step: %d, acc: %4.5f' % (i + 1, test_acc))

Simple PyTorch version

import torch
from torch import nn

def conv3x3(in_channel, out_channel, stride=1):
    return nn.Conv2d(in_channel, out_channel, kernel_size=3, stride=stride, padding=1, bias=False)

def conv1x1(in_channel, out_channel, stride=1):
    return nn.Conv2d(in_channel, out_channel, kernel_size=1, stride=stride, bias=False)

class BasicBlock(nn.Module):
    def __init__(self, in_channel):
        super(BasicBlock, self).__init__()
        self.conv1 = nn.Sequential(
            conv3x3(in_channel, in_channel),
            nn.BatchNorm2d(in_channel),
            nn.ReLU(inplace=True)
        )
        self.conv2 = nn.Sequential(
            conv3x3(in_channel, in_channel),
            nn.BatchNorm2d(in_channel),
            nn.ReLU(inplace=True)
        )

        self.relu = nn.ReLU(inplace=True)

    def forward(self, input):
        x = self.conv1(input)

        x = self.conv2(x)

        out = self.relu(x + input)

        return out

class Bottleneck(nn.Module):
    exp = 4

    def __init__(self, in_channel):
        super(Bottleneck, self).__init__()
        self.conv1 = nn.Sequential(
            conv1x1(in_channel, in_channel),
            nn.BatchNorm2d(in_channel)
        )
        self.conv2 = nn.Sequential(
            conv3x3(in_channel, in_channel),
            nn.BatchNorm2d(in_channel),
        )
        self.conv3 = nn.Sequential(
            conv1x1(in_channel, in_channel * self.exp),
            nn.BatchNorm2d(in_channel * self.exp),
            nn.ReLU(inplace=True)
        )

        # Note: this simplified shortcut always projects with a 1x1 conv
        # (and, unusually, applies ReLU on the shortcut branch as well)
        self.downsample = nn.Sequential(
            conv1x1(in_channel, in_channel * self.exp),
            nn.BatchNorm2d(in_channel * self.exp),
            nn.ReLU(inplace=True)
        )

        self.relu = nn.ReLU(inplace=True)

    def forward(self, input):
        x = self.conv1(input)

        x = self.conv2(x)

        x = self.conv3(x)

        identity = self.downsample(input)

        out = self.relu(x + identity)

        return out

class ResNet(nn.Module):
    def __init__(self, in_channel, num_classes):
        super(ResNet, self).__init__()
        self.conv1 = nn.Sequential(
            nn.Conv2d(in_channel, 64, kernel_size=5, stride=2, padding=2),
            nn.BatchNorm2d(64),
            nn.ReLU(inplace=True)
        )

        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)

        self.res_block1 = nn.Sequential(
            BasicBlock(64),
            BasicBlock(64)
        )
        self.res_block2 = nn.Sequential(
            nn.MaxPool2d(kernel_size=3, stride=2, padding=1),
            nn.ReLU(inplace=True),
            Bottleneck(64),
            Bottleneck(64 * Bottleneck.exp),
            Bottleneck(64 * (Bottleneck.exp ** 2))
        )

        self.output = nn.Sequential(
            nn.Conv2d(64 * (Bottleneck.exp ** 3), 1024, kernel_size=4, bias=False),
            nn.BatchNorm2d(1024),
            nn.ReLU(inplace=True),
            nn.Flatten(),
            nn.Linear(1024, num_classes),
            # Note: applying Dropout and ReLU *after* the final Linear layer
            # is unusual; conventionally the Linear layer comes last so the
            # raw logits are fed to the loss
            nn.Dropout(0.4),
            nn.ReLU(inplace=True),
        )

    def forward(self, input):
        x = self.conv1(input)

        x = self.maxpool(x)

        x = self.res_block1(x)

        x = self.res_block2(x)

        x = self.output(x)

        return x

if __name__ == '__main__':
    fake_img = torch.randint(0, 255, [10, 3, 32, 32]).type(torch.FloatTensor)
    resnet = ResNet(3, 10)
    out = resnet(fake_img)
    print(out)

import torch
from torch.utils.data import DataLoader
import torch.nn as nn
from torchvision import transforms
from torch import optim
from torchvision import datasets

batch_size = 32
learning_rate = 0.01
epochs = 2
train_cifar = datasets.CIFAR10('../pytorch/手写数字识别/cifar', train=True, transform=transforms.Compose([
    transforms.Resize((32, 32)),
    transforms.ToTensor()
]), download=True)
train_cifar = DataLoader(train_cifar, batch_size=batch_size, shuffle=True)

test_cifar = datasets.CIFAR10('../pytorch/手写数字识别/cifar', train=False, transform=transforms.Compose([
    transforms.Resize((32, 32)),
    transforms.ToTensor()
]), download=True)
test_cifar = DataLoader(test_cifar, batch_size=batch_size, shuffle=False)

sample, y_sample = next(iter(train_cifar))  # grab one batch to sanity-check shapes
print(sample.shape, y_sample.shape)

# This ResNet is truncated to two conv groups so it trains quickly on CIFAR-10
Layers = [3, 4]

class Block(nn.Module):
    def __init__(self, in_channels, filters, stride, is_1x1conv=False):
        super(Block, self).__init__()
        self.is_1x1conv = is_1x1conv
        self.relu = nn.ReLU(True)
        filter1, filter2, filter3 = filters
        self.conv1 = nn.Sequential(
            nn.Conv2d(in_channels, filter1, kernel_size=1, stride=stride, bias=False),
            nn.BatchNorm2d(filter1),
            nn.ReLU(True),
        )
        self.conv2 = nn.Sequential(
            nn.Conv2d(filter1, filter2, kernel_size=3, stride=1, padding=1, bias=False),
            nn.BatchNorm2d(filter2),
            nn.ReLU(True),
        )
        self.conv3 = nn.Sequential(
            nn.Conv2d(filter2, filter3, kernel_size=1, stride=1, bias=False),
            nn.BatchNorm2d(filter3),
        )
        if is_1x1conv:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_channels, filter3, kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(filter3),
            )

    def forward(self, x):
        x_shortcut = x
        x = self.conv1(x)
        x = self.conv2(x)
        x = self.conv3(x)
        if self.is_1x1conv:
            x_shortcut = self.shortcut(x_shortcut)
        x = x + x_shortcut
        x = self.relu(x)
        return x
class ResNet(nn.Module):
    def __init__(self):
        super(ResNet, self).__init__()
        self.conv1 = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3),
            nn.BatchNorm2d(64),
            nn.ReLU(True)
        )
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
        # Only the first two conv groups of ResNet-50, hence the fc input is 512
        self.conv2 = self._make_layers(64, (64, 64, 256), Layers[0])
        self.conv3 = self._make_layers(256, (128, 128, 512), Layers[1], 2)
        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
        self.fc = nn.Sequential(
            nn.Linear(512, 10),
        )

    def forward(self, input):
        x = self.conv1(input)
        x = self.maxpool(x)
        x = self.conv2(x)
        x = self.conv3(x)
        x = self.avgpool(x)
        x = torch.flatten(x, 1)
        x = self.fc(x)
        return x

    def _make_layers(self, in_channels, filters, blocks, stride=1):
        layers = []
        layers.append(Block(in_channels, filters, stride, is_1x1conv=True))
        for i in range(1, blocks):
            layers.append(Block(filters[2], filters, stride=1, is_1x1conv=False))
        return nn.Sequential(*layers)

model = ResNet()
print(model(sample).shape)
optimizer = optim.SGD(model.parameters(), lr=learning_rate)
criterion = nn.CrossEntropyLoss()

global_step = 0

for epoch in range(epochs):
    for idx, (x, y) in enumerate(train_cifar):
        output = model(x)
        loss = criterion(output, y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        global_step += 1

        if idx % 10 == 0:
            print("epoch:", epoch, 'idx:', idx, 'loss:', loss.item())

    test_loss = 0
    correct = 0
    for data, target in test_cifar:
        logits = model(data)
        test_loss += criterion(logits, target).item()
        pred = logits.argmax(dim=1)
        correct += pred.eq(target.data).float().sum().item()

    print('correct:', correct / len(test_cifar.dataset))
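One caveat in the evaluation loop above: the model stays in training mode and gradients are still being tracked. A common refinement (a sketch, not part of the original script) is to switch to eval mode and disable gradient tracking while testing, which matters here because the network contains BatchNorm layers:

model.eval()  # BatchNorm uses its running statistics instead of batch stats
with torch.no_grad():  # no gradient bookkeeping needed during evaluation
    for data, target in test_cifar:
        logits = model(data)
        test_loss += criterion(logits, target).item()
        correct += logits.argmax(dim=1).eq(target).float().sum().item()
model.train()  # switch back before the next training epoch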

Why did residual networks appear?
The authors observed that as the number of layers grows, the network degrades (degradation): with more layers, the training loss first decreases and then saturates, and if depth is increased further, the training loss actually goes back up. Note that this is not overfitting, since under overfitting the training loss keeps decreasing.

[Figure: training curves of plain networks, illustrating the degradation problem]
Residual block structure
[Figures: residual block structures]
After taking first place at ILSVRC 2015, Kaiming He and colleagues improved the residual network, chiefly by moving ReLU to before the convolutions; correspondingly, the shortcut no longer passes through a ReLU, so input and output are connected directly. The follow-up paper ran experiments on the ordering of ReLU, BN, and convolution layers and settled on an improved basic residual building block. Because each unit places the activation function before the affine transformation, it is called the pre-activation residual unit.
[Figure: the pre-activation residual unit]
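A minimal sketch of such a pre-activation unit in PyTorch (following the BN → ReLU → conv ordering described above; the layer sizes are illustrative, not taken from the paper):

import torch.nn as nn

class PreActBasicBlock(nn.Module):
    """Pre-activation residual unit: BN and ReLU come before each conv,
    and the shortcut adds the raw input with no activation on it."""
    def __init__(self, channels):
        super().__init__()
        self.branch = nn.Sequential(
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
        )

    def forward(self, x):
        # Identity shortcut: the input reaches the output completely untouched
        return x + self.branch(x)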

Original: https://blog.csdn.net/shuai_yue/article/details/117964712
Author: 帅殿天下
Title: Resnet概述及代码
