PyTorch and Deep Learning Self-Check Handbook 3: Model Definition

May 28, 2023, 12:47 PM • Big Data

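Defining a neural network

Inherit from the nn.Module class;
Initialization function __init__: declare the network layers;
forward function: the logic the model runs on its input.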

import torch
from torch import nn

class NeuralNetwork(nn.Module):
    def __init__(self):
        super().__init__()
        self.flatten = nn.Flatten()
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(28*28, 512),
            nn.ReLU(),
            nn.Linear(512, 512),
            nn.ReLU(),
            nn.Linear(512, 10),
        )

    def forward(self, x):
        x = self.flatten(x)
        logits = self.linear_relu_stack(x)
        return logits
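A quick way to sanity-check the class above is to pass a random batch through it. The sketch below repeats the definition so that it runs on its own; the batch size of 4 and the 1×28×28 input shape are assumptions chosen to match the 28*28 input of the first linear layer.

```python
import torch
from torch import nn


class NeuralNetwork(nn.Module):
    def __init__(self):
        super().__init__()
        self.flatten = nn.Flatten()
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(28 * 28, 512),
            nn.ReLU(),
            nn.Linear(512, 512),
            nn.ReLU(),
            nn.Linear(512, 10),
        )

    def forward(self, x):
        x = self.flatten(x)
        return self.linear_relu_stack(x)


model = NeuralNetwork()
batch = torch.randn(4, 1, 28, 28)      # assumed: 4 grayscale 28x28 images
logits = model(batch)                  # calling the module runs forward()
probs = torch.softmax(logits, dim=1)   # turn logits into class probabilities
print(logits.shape)
```

Calling model(batch) rather than model.forward(batch) is the usual convention, since the call also runs any registered hooks.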

Weight initialization

Reference: weight initialization in PyTorch

Method 1: net.apply(weights_init)

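def weights_init(m):
    # Dispatch on the layer's class name, e.g. 'Conv2d' or 'BatchNorm2d'
    classname = m.__class__.__name__
    if classname.find('Conv') != -1:
        nn.init.normal_(m.weight.data, 0.0, 0.02)
    elif classname.find('BatchNorm') != -1:
        nn.init.normal_(m.weight.data, 1.0, 0.02)
        nn.init.constant_(m.bias.data, 0)

net = Model()  # any nn.Module; pass whatever constructor arguments it needs
net.apply(weights_init)  # apply() calls weights_init on every submodule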

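Method 2: initialize the parameters inside the network's __init__

Use net.modules() (self.modules() inside the class) to iterate over the layers of the model and check each layer's type;
initialize the weight.data (a tensor) of each matching layer m.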

import math

class Model(nn.Module):
    def __init__(self, block, layers, num_classes=1000):
        self.inplanes = 64
        super(Model, self).__init__()
        self.conv1 = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3,
                               bias=False)
        self.bn1 = nn.BatchNorm2d(64)
        self.relu = nn.ReLU(inplace=True)
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
        self.layer1 = ...

        ...

        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels
                m.weight.data.normal_(0, math.sqrt(2. / n))
            elif isinstance(m, nn.BatchNorm2d):
                m.weight.data.fill_(1)
                m.bias.data.zero_()
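The standard deviation used for the convolution weights above is sqrt(2 / n) with n = kernel_height × kernel_width × out_channels. As a small worked example in plain Python (no PyTorch needed), here is that value for the 7×7, 64-channel conv1 layer; the helper name he_std is made up for illustration.

```python
import math


def he_std(kernel_size, out_channels):
    """Std used in the loop above: sqrt(2 / (kh * kw * out_channels))."""
    kh, kw = kernel_size
    n = kh * kw * out_channels
    return math.sqrt(2.0 / n)


# conv1 = nn.Conv2d(3, 64, kernel_size=7, ...) gives n = 7 * 7 * 64 = 3136
conv1_std = he_std((7, 7), 64)
print(round(conv1_std, 6))
```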

Common operations

Designing a new layer with nn.Parameter()

import torch
from torch import nn

class MyLinear(nn.Module):
  def __init__(self, in_features, out_features):
    super().__init__()
    self.weight = nn.Parameter(torch.randn(in_features, out_features))
    self.bias = nn.Parameter(torch.randn(out_features))

  def forward(self, input):
    return (input @ self.weight) + self.bias
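The point of wrapping the tensors in nn.Parameter is that the module then registers them automatically, so optimizers and autograd can see them. A minimal sketch of that behaviour, repeating the layer so the snippet is self-contained (the input size of 2×4 is an arbitrary assumption):

```python
import torch
from torch import nn


class MyLinear(nn.Module):
    def __init__(self, in_features, out_features):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(in_features, out_features))
        self.bias = nn.Parameter(torch.randn(out_features))

    def forward(self, x):
        return (x @ self.weight) + self.bias


layer = MyLinear(4, 3)
# nn.Parameter attributes are registered automatically, so they are listed here
param_shapes = {name: tuple(p.shape) for name, p in layer.named_parameters()}
out = layer(torch.randn(2, 4))
out.sum().backward()   # gradients flow into both parameters
print(param_shapes, out.shape)
```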

nn.Flatten

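Flattens the input tensor, e.g. 28×28 -> 784. By default the batch dimension (dim 0) is kept and all remaining dimensions are flattened.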

input = torch.randn(32, 1, 5, 5)
m = nn.Sequential(
    nn.Conv2d(1, 32, 5, 1, 1),
    nn.Flatten()
)
output = m(input)
output.size()  # torch.Size([32, 288]): the conv output is 32 x 3 x 3 per sample

nn.Sequential

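An ordered container. Modules are added to the computation graph and executed in the order in which they are passed to the constructor. An ordered dictionary (OrderedDict) whose values are modules can also be passed in.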

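from collections import OrderedDict

# Named layers must be wrapped in an OrderedDict; bare tuples are not accepted
net = nn.Sequential(OrderedDict([
   ('fc1', MyLinear(4, 3)),
   ('act', nn.ReLU()),
   ('fc2', MyLinear(3, 1)),
]))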

Common layers

Fully connected layer nn.Linear()

torch.nn.Linear(in_features, out_features, bias=True, device=None, dtype=None)

in_features: input dimension
out_features: output dimension
bias: whether to include a bias term

m = nn.Linear(20, 30)
input = torch.randn(128, 20)
output = m(input)
print(output.size())

Convolution `torch.nn.ConvNd()`

class torch.nn.Conv2d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True)

in_channels (int) - number of channels in the input signal
out_channels (int) - number of channels produced by the convolution
kernel_size (int or tuple) - size of the convolution kernel
stride (int or tuple, optional) - stride of the convolution
padding (int or tuple, optional) - number of layers of zeros added to each side of the input
dilation (int or tuple, optional) - spacing between kernel elements
groups (int, optional) - number of blocked connections from input channels to output channels
bias (bool, optional) - if bias=True, a bias is added

input: (N, C_in, H_in, W_in), where N is the batch size, C_in is in_channels (the number of 2D feature maps per input sample), H_in is the number of rows and W_in the number of columns of each 2D map
output: (N, C_out, H_out, W_out), where N is the batch size, C_out is out_channels (the number of 2D feature maps per output sample), H_out is the number of rows and W_out the number of columns of each 2D map

conv2 = nn.Conv2d(
            in_channels=5,
            out_channels=32,
            kernel_size=5,
            stride=1,
            padding=2
)

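x = torch.randn(16, 5, 3, 10)  # randn instead of torch.Tensor(...), which leaves memory uninitialized
conv2(x).shape  # torch.Size([16, 32, 3, 10])
# Wrapping the input in Variable is no longer needed: since PyTorch 0.4 tensors are passed directly.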
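The output height and width of a convolution follow from the parameters listed above: floor((size + 2·padding − dilation·(kernel_size − 1) − 1) / stride) + 1 along each spatial dimension. The helper below is a plain-Python sketch of that formula (the name conv_out_size is made up for illustration), applied to the conv2 layer and the 3×10 input used above.

```python
def conv_out_size(size, kernel_size, stride=1, padding=0, dilation=1):
    """Output length along one spatial dimension of a convolution."""
    return (size + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1


# conv2 above: kernel_size=5, stride=1, padding=2 applied to a 3 x 10 input
h_out = conv_out_size(3, kernel_size=5, stride=1, padding=2)
w_out = conv_out_size(10, kernel_size=5, stride=1, padding=2)
print(h_out, w_out)
```

With kernel_size=5 and padding=2 at stride 1 the spatial size is unchanged, which is why this combination is a common "same size" setting.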

Pooling

Max pooling `torch.nn.MaxPoolNd()`

torch.nn.MaxPool2d(kernel_size, stride=None, padding=0, dilation=1, return_indices=False, ceil_mode=False)

kernel_size - window size
stride - stride; defaults to kernel_size
padding - amount of implicit padding on each side (for max pooling PyTorch pads with negative infinity rather than 0)
dilation - controls the spacing between elements inside the window
return_indices - if True, also returns the indices of the maxima, which is useful for later unpooling (upsampling)
ceil_mode - if True, the output size is computed with ceiling instead of the default floor

max2 = torch.nn.MaxPool2d(3, 1, 0, 1)  # kernel_size=3, stride=1, padding=0, dilation=1
max2(torch.randn(16, 15, 15, 14)).shape  # torch.Size([16, 15, 13, 12])
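To make the effect of stride, dilation and ceil_mode concrete, here is a plain-Python sketch of the pooling output-size rule. The function name pool_out_size is hypothetical, and the extra check in ceil mode reflects my understanding of PyTorch's behaviour (a window must start inside the input or its left padding), so treat it as an assumption rather than a specification.

```python
import math


def pool_out_size(size, kernel_size, stride=None, padding=0, dilation=1,
                  ceil_mode=False):
    """Output length along one spatial dimension of a pooling layer."""
    stride = kernel_size if stride is None else stride  # default stride
    span = size + 2 * padding - dilation * (kernel_size - 1) - 1
    rounder = math.ceil if ceil_mode else math.floor
    out = rounder(span / stride) + 1
    # In ceil mode, drop a last window that would start beyond the input
    if ceil_mode and (out - 1) * stride >= size + padding:
        out -= 1
    return out


# max2 above: kernel_size=3, stride=1 on a 15 x 14 feature map
print(pool_out_size(15, 3, stride=1), pool_out_size(14, 3, stride=1))
# A length-5 input with a 2-wide window: floor drops the last element, ceil keeps it
print(pool_out_size(5, 2), pool_out_size(5, 2, ceil_mode=True))
```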

Average pooling `torch.nn.AvgPoolNd()`

kernel_size - size of the pooling window
stride - stride; defaults to kernel_size
padding - number of layers of zeros added to each side of the input
ceil_mode - if True, the output size is computed with ceiling instead of the default floor
count_include_pad - if True, the zero padding is included when computing the average

torch.nn.AvgPool2d(kernel_size, stride=None, padding=0, ceil_mode=False, count_include_pad=True)

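Unpooling

Unpooling is an "inverse" of pooling, but only in the sense that upsampling restores the original size. The pixel values cannot be recovered exactly, because an operation such as max pooling is not invertible: every pixel other than the maximum has already been discarded.

Max unpooling `nn.MaxUnpool2d()`

Purpose: upsample a 2D image by reversing max pooling

Parameters:

kernel_size - window size
stride - stride; defaults to kernel_size
padding - the padding used by the corresponding pooling layer

torch.nn.MaxUnpool2d(kernel_size, stride=None, padding=0)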

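img_tensor = torch.randn(16, 5, 32, 32)

# Pool with return_indices=True so the argmax positions are available for unpooling
max_pool = nn.MaxPool2d(kernel_size=(2, 2), stride=(2, 2), return_indices=True, ceil_mode=True)
img_pool, indices = max_pool(img_tensor)

# Random values with the pooled shape stand in for the features to be upsampled
img_unpool = torch.rand_like(img_pool, dtype=torch.float)
max_unpool = nn.MaxUnpool2d((2, 2), stride=(2, 2))
img_unpool = max_unpool(img_unpool, indices)  # back to shape (16, 5, 32, 32)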
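A tiny tensor makes the "not truly invertible" point easier to see. In the sketch below (the 4×4 input is made up for illustration), unpooling puts each maximum back at its original position and fills every other position with zero.

```python
import torch
from torch import nn

x = torch.tensor([[[[1., 2., 3., 4.],
                    [5., 6., 7., 8.],
                    [9., 10., 11., 12.],
                    [13., 14., 15., 16.]]]])   # shape (1, 1, 4, 4)

pool = nn.MaxPool2d(kernel_size=2, stride=2, return_indices=True)
unpool = nn.MaxUnpool2d(kernel_size=2, stride=2)

pooled, indices = pool(x)            # keeps the maxima 6, 8, 14, 16
restored = unpool(pooled, indices)   # same shape as x, zeros elsewhere
print(restored)
```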

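Combined pooling

Combined pooling is a strategy that draws on the strengths of both max pooling and average pooling. There are two common ways to combine them: Cat (concatenate along the channel dimension) and Add (average the two results). In code: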

import torch
import torch.nn.functional as F

def add_avgmax_pool2d(x, output_size=1):
    x_avg = F.adaptive_avg_pool2d(x, output_size)
    x_max = F.adaptive_max_pool2d(x, output_size)
    return 0.5 * (x_avg + x_max)

def cat_avgmax_pool2d(x, output_size=1):
    x_avg = F.adaptive_avg_pool2d(x, output_size)
    x_max = F.adaptive_max_pool2d(x, output_size)
    return torch.cat([x_avg, x_max], 1)

Normalization layers

Reference: Transformer-related (6): Normalization methods

Normalization Layers

BatchNorm

torch.nn.BatchNorm1d(num_features, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
torch.nn.BatchNorm2d(num_features, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
torch.nn.BatchNorm3d(num_features, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)

Parameters:

num_features: the number of features (channels) of the expected input, whose size is batch_size × num_features [× width]; it matches the channel dimension of the preceding convolution layer's output
eps: a value added to the denominator for numerical stability (so the denominator cannot approach or equal 0). Default: 1e-5.
momentum: the momentum used for the running mean and running variance. Default: 0.1.
affine: Boolean; when True, the layer has learnable affine parameters.
track_running_stats: Boolean; when True, the layer tracks the running mean and variance during training.


# With learnable affine parameters
m = nn.BatchNorm2d(5)
# Without learnable affine parameters
m = nn.BatchNorm2d(5, affine=False)
inputs = torch.randn(20, 5, 35, 45)
output = m(inputs)
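The momentum parameter is easy to misread, so here is a plain-Python sketch of how the running statistics are updated, following the rule given in the PyTorch documentation: new_running = (1 − momentum) × running + momentum × batch_statistic. The batch means below are made-up numbers.

```python
def update_running_stat(running, batch_stat, momentum=0.1):
    """Exponential moving average used for running_mean / running_var."""
    return (1 - momentum) * running + momentum * batch_stat


running_mean = 0.0                    # BatchNorm starts running_mean at 0
for batch_mean in [2.0, 2.0, 2.0]:    # made-up per-batch means
    running_mean = update_running_stat(running_mean, batch_mean)
print(running_mean)
```

With the default momentum of 0.1 the running mean moves only 10% of the way toward each new batch statistic, so it takes many batches to converge.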

LayerNorm

torch.nn.LayerNorm(normalized_shape, eps=1e-05, elementwise_affine=True)

Parameters:

normalized_shape: the shape of the trailing dimensions to normalize over; the expected input has size
[* × normalized_shape[0] × normalized_shape[1] × … × normalized_shape[−1]]
eps: a value added to the denominator for numerical stability (so the denominator cannot approach or equal 0). Default: 1e-5.
elementwise_affine: Boolean; when True, the layer has learnable affine parameters.

For an input of shape (2, 2, 4), LayerNorm with normalized_shape=[2, 4] standardizes the trailing (2, 4) part of each sample as a whole. This can be thought of as standardizing each whole image.

import numpy as np

x_test = np.array([[[1,2,-1,1],[3,4,-2,2]],
                   [[1,2,-1,1],[3,4,-2,2]]])
x_test = torch.from_numpy(x_test).float()

m = nn.LayerNorm(normalized_shape = [2,4])
output = m(x_test)
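To see what the layer computes for one sample, the same normalization can be done by hand in plain Python. This is a sketch without the learnable affine parameters (at initialization they are weight 1 and bias 0, so they change nothing); it uses the biased variance, i.e. division by the number of elements.

```python
import math

sample = [[1, 2, -1, 1],
          [3, 4, -2, 2]]     # one (2, 4) slice of x_test
eps = 1e-5

values = [v for row in sample for v in row]
mean = sum(values) / len(values)
var = sum((v - mean) ** 2 for v in values) / len(values)   # biased variance

# Every element is shifted by the same mean and scaled by the same std
normalized = [[(v - mean) / math.sqrt(var + eps) for v in row] for row in sample]
print(mean, var)
```

All eight values share one mean (1.25) and one variance (3.4375), which is what "normalizing the (2, 4) part as a whole" means.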

InstanceNorm

torch.nn.InstanceNorm1d(num_features, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False)
torch.nn.InstanceNorm2d(num_features, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False)
torch.nn.InstanceNorm3d(num_features, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False)

Parameters:

num_features: the number of features (channels) of the expected input, whose size is batch_size x num_features [x width]
eps: a value added to the denominator for numerical stability (so the denominator cannot approach or equal 0). Default: 1e-5.
momentum: the momentum used for the running mean and running variance. Default: 0.1.
affine: Boolean; when True, the layer has learnable affine parameters.
track_running_stats: Boolean; when True, the layer tracks the running mean and variance during training.

For an input of shape (2, 2, 4), InstanceNorm normalizes over the last dimension (length 4), separately for each sample and each channel.

x_test = np.array([[[1,2,-1,1],[3,4,-2,2]],
                   [[1,2,-1,1],[3,4,-2,2]]])
x_test = torch.from_numpy(x_test).float()

m = nn.InstanceNorm1d(num_features=2)
output = m(x_test)

GroupNorm

torch.nn.GroupNorm(num_groups, num_channels, eps=1e-05, affine=True)

Parameters:

num_groups: the number of groups to split the channels into
num_channels: the number of channels of the expected input, whose size is batch_size x num_channels [x width]; it must be divisible by num_groups
eps: a value added to the denominator for numerical stability (so the denominator cannot approach or equal 0). Default: 1e-5.
affine: Boolean; when True, the layer has learnable affine parameters.

When num_groups is 1, GroupNorm is equivalent to the LayerNorm example above, which normalizes over all non-batch dimensions (ignoring the learnable affine parameters).

x_test = np.array([[[1,2,-1,1],[3,4,-2,2]],
                   [[1,2,-1,1],[3,4,-2,2]]])
x_test = torch.from_numpy(x_test).float()

m = nn.GroupNorm(num_groups=1, num_channels=2, affine=False)
output = m(x_test)

When num_groups equals num_channels, GroupNorm is equivalent to InstanceNorm.


m = nn.GroupNorm(num_groups=2, num_channels=2, affine=False)
output = m(x_test)
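Both equivalences can be checked numerically. The sketch below reuses the same (2, 2, 4) test values and disables the affine parameters everywhere so that only the normalization itself is compared; the variable names are chosen for illustration.

```python
import torch
from torch import nn

x = torch.tensor([[[1., 2., -1., 1.], [3., 4., -2., 2.]],
                  [[1., 2., -1., 1.], [3., 4., -2., 2.]]])   # shape (2, 2, 4)

# One group covering all channels vs. LayerNorm over the trailing (2, 4) dims
gn_one_group = nn.GroupNorm(num_groups=1, num_channels=2, affine=False)
layer_norm = nn.LayerNorm(normalized_shape=[2, 4], elementwise_affine=False)

# One group per channel vs. InstanceNorm1d (affine=False by default)
gn_per_channel = nn.GroupNorm(num_groups=2, num_channels=2, affine=False)
instance_norm = nn.InstanceNorm1d(num_features=2)

same_as_layer_norm = torch.allclose(gn_one_group(x), layer_norm(x), atol=1e-5)
same_as_instance_norm = torch.allclose(gn_per_channel(x), instance_norm(x), atol=1e-5)
print(same_as_layer_norm, same_as_instance_norm)
```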

Activation functions

Reference: GELU activation function

PyTorch activation functions and a comparison of their pros and cons

`torch.nn.GELU`

$$\mathrm{GELU}(x) = x\,P(X \le x) = x\,\Phi(x)$$

Here $\Phi(x)$ is the cumulative distribution function of a normal distribution. One can simply use the standard normal $N(0, 1)$, or a parameterized normal $N(\mu, \sigma)$ whose parameters $\mu$ and $\sigma$ are obtained through training.

For $\mathrm{GELU}(x)$ under the standard normal assumption, the paper gives the following approximation:
$$\mathrm{GELU}(x) = 0.5x\left(1+\tanh\left[\sqrt{2/\pi}\,(x+0.044715x^3)\right]\right)$$
A PyTorch version of the GELU code from the BERT source is:

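import math
import torch

def gelu(input_tensor):
    # Standard normal CDF written with erf; math.sqrt because 2.0 is a Python float
    cdf = 0.5 * (1.0 + torch.erf(input_tensor / math.sqrt(2.0)))
    return input_tensor * cdf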

$$\mathrm{erf}(x) = \frac{2}{\sqrt{\pi}}\int_0^x e^{-t^2}\,dt$$
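To get a feel for how close the tanh approximation is to the erf-based definition, the two can be compared in plain Python with the standard library. This is only a scalar sketch; the sample points are arbitrary.

```python
import math


def gelu_exact(x):
    """x * Phi(x), with Phi the standard normal CDF written via erf."""
    return x * 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def gelu_tanh(x):
    """Tanh approximation of GELU."""
    inner = math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)
    return 0.5 * x * (1.0 + math.tanh(inner))


points = [-3.0, -1.0, 0.0, 0.5, 1.0, 2.0]
max_gap = max(abs(gelu_exact(x) - gelu_tanh(x)) for x in points)
print(max_gap)
```

On these points the two versions differ only in the fourth decimal place, which is why the approximation is often considered good enough in practice.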

`torch.nn.ELU` ( alpha=1.0 , inplace=False )

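import math

def elu(x, alpha=1.0, inplace=False):
    # Scalar reference formula; inplace only mirrors the nn.ELU signature
    return max(0, x) + min(0, alpha * (math.exp(x) - 1))

α is a hyperparameter; the default is 1.0.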

`torch.nn.LeakyReLU` ( negative_slope=0.01 , inplace=False )

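def LeakyReLU(x, negative_slope=0.01, inplace=False):
    # Scalar reference formula; inplace only mirrors the nn.LeakyReLU signature
    return max(0, x) + negative_slope * min(0, x)

negative_slope is a hyperparameter that controls the slope for negative x; the default is 1e-2.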

`torch.nn.PReLU` ( num_parameters=1 , init=0.25 )

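def PReLU(x, num_parameters=1, init=0.25):
    # init stands in for the learnable slope a at its initial value
    return max(0, x) + init * min(0, x)

Here a is a learnable parameter. When called without arguments, i.e. nn.PReLU(), a single a is shared across all input channels; when called with an argument, i.e. nn.PReLU(nChannels), a separate a is learned for each channel.

Note: for good performance, do not apply weight decay when learning a.

num_parameters: the number of a values to learn; default 1

init: the initial value of a; default 0.25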
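The difference between the shared and the per-channel variant shows up in the shape of the learnable weight. A minimal sketch, assuming an input with 3 channels for the per-channel case:

```python
import torch
from torch import nn

shared = nn.PReLU()                        # one slope a shared by all channels
per_channel = nn.PReLU(num_parameters=3)   # one slope a per channel (3 assumed)

x = torch.tensor([[-2.0, 0.0, 4.0]])
y = shared(x)   # before training, negative inputs are scaled by a = 0.25
print(shared.weight.shape, per_channel.weight.shape, y)
```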

`torch.nn.ReLU` ( inplace=False )

ReLU is the most commonly used activation in CNNs.

def ReLU(x,inplace=False):
    return max(0,x)

`torch.nn.ReLU6` ( inplace=False )

def ReLU6(x,inplace=False):
    return min(max(0,x),6)

`torch.nn.SELU` ( inplace=False )

import math

def SELU(x, inplace=False):
    # Scalar reference formula with the fixed SELU constants
    alpha = 1.6732632423543772848170429916717
    scale = 1.0507009873554804934193349852946
    return scale * (max(0, x) + min(0, alpha * (math.exp(x) - 1)))

`torch.nn.CELU` ( alpha=1.0 , inplace=False )

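import math

def CELU(x, alpha=1.0, inplace=False):
    # Scalar reference formula; inplace only mirrors the nn.CELU signature
    return max(0, x) + min(0, alpha * (math.exp(x / alpha) - 1))

α defaults to 1.0.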

`torch.nn.Sigmoid`

$$\mathrm{Sigmoid}(x)=\frac{1}{1+\exp(-x)}$$

def Sigmoid(x):
    return 1/(np.exp(-x)+1)

`torch.nn.LogSigmoid`

$$\mathrm{LogSigmoid}(x)=\log\left(\frac{1}{1+\exp(-x)}\right)$$

def LogSigmoid(x):
    return np.log(1/(np.exp(-x)+1))

`torch.nn.Tanh`

$$\mathrm{Tanh}(x)=\tanh(x)=\frac{e^{x}-e^{-x}}{e^{x}+e^{-x}}$$

def Tanh(x):
    return (np.exp(x)-np.exp(-x))/(np.exp(x)+np.exp(-x))

`torch.nn.Tanhshrink`

$$\mathrm{Tanhshrink}(x)=x-\mathrm{Tanh}(x)$$

def Tanhshrink(x):
    return x-(np.exp(x)-np.exp(-x))/(np.exp(x)+np.exp(-x))

`torch.nn.Softplus` ( beta=1 , threshold=20 )

$$\mathrm{Softplus}(x)=\frac{1}{\beta}\log(1+\exp(\beta x))$$

This function can be seen as a smooth approximation of ReLU.

def Softplus(x,beta=1,threshold=20):
    return np.log(1+np.exp(beta*x))/beta
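The threshold argument exists for numerical stability: according to the PyTorch documentation, the layer reverts to the linear function when beta·x exceeds threshold. A scalar sketch of that behaviour in plain Python (an illustration, not the library's implementation):

```python
import math


def softplus(x, beta=1.0, threshold=20.0):
    """Scalar Softplus that switches to the identity for large beta * x."""
    if beta * x > threshold:
        return x   # avoids overflow in exp; log(1 + e^z) is ~z here anyway
    return math.log1p(math.exp(beta * x)) / beta


print(softplus(0.0), softplus(100.0), softplus(-5.0))
```

Without the threshold branch, a call such as softplus(1000.0) would overflow inside exp.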

`torch.nn.Softshrink` ( lambd=0.5 )

$$\mathrm{SoftShrinkage}(x) = \begin{cases} x-\lambda, & \text{if } x>\lambda \\ x+\lambda, & \text{if } x<-\lambda \\ 0, & \text{otherwise} \end{cases}$$

λ defaults to 0.5.

def Softshrink(x,lambd=0.5):
    if x>lambd:return x-lambd
    elif x<-lambd:return x+lambd
    else:return 0

nn.Softmax

$$\mathrm{Softmax}(x_i)=\frac{\exp(x_i)}{\sum_j \exp(x_j)}$$

m = nn.Softmax(dim=1)
input = torch.randn(2, 3)
output = m(input)
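For reference, the Softmax formula can be written out in plain Python. The sketch below subtracts the maximum before exponentiating, a common stability trick that leaves the result unchanged because the shift cancels in the ratio; the input values are made up.

```python
import math


def softmax(values):
    """Softmax over a list of floats, shifted by the max for stability."""
    shift = max(values)
    exps = [math.exp(v - shift) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([1.0, 2.0, 3.0])
print(probs)
```

In nn.Softmax(dim=1) the same computation is applied independently to each row of the (2, 3) input, so every row of the output sums to 1.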

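References

PyTorch activation functions and a comparison of their pros and cons

PyTorch quick-start tutorial 2 (linear regression and logistic regression)

Fully connected networks in PyTorch

PyTorch series 7: nn.Sequential explained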

Original: https://blog.csdn.net/weixin_43243315/article/details/121657881
Author: 冬于
Title: PyTorch and Deep Learning Self-Check Handbook 3: Model Definition
