[PyTorch] A Summary of Commonly Used Neural Network Layers (Continuously Updated)

June 4, 2023, 1:10 PM • Artificial Intelligence • 114 views

1. Convolution Layers

1.1 nn.Conv2d

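(1) Prototype

    torch.nn.Conv2d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True, padding_mode='zeros', device=None, dtype=None)

Applies a 2D convolution over an input signal composed of several input planes. In short, it convolves a multi-channel input image.

(2) Parameters

* in_channels (int): number of channels in the input image
* out_channels (int): number of channels in the output image (represented as a tensor)
* kernel_size (int or tuple): size of the convolution kernel. For an n×n kernel, kernel_size = 5 is enough; an n×m kernel must be written as kernel_size = (n, m)
* stride (int or tuple, optional): stride of the convolution, i.e. how far the kernel moves across the image at each step. Default: 1
* padding (int, tuple or str, optional): edge padding, the number of rows and columns added to the top, bottom, left and right of the image (filled with zeros under the default padding_mode). Default: 0
* padding_mode (string, optional): padding mode: 'zeros', 'reflect', 'replicate' or 'circular'. Default: 'zeros'
* dilation (int or tuple, optional): spacing between kernel elements; a value greater than 1 gives a dilated (atrous) convolution. Default: 1 (no dilation)
* groups (int, optional): number of blocked connections from input channels to output channels. Default: 1
* bias (bool, optional): whether to add a learnable bias to the output. Default: True

(3) Attributes

* ~Conv2d.weight (torch.Tensor): the learnable weights of the module, of shape (out_channels, in_channels / groups, kernel_size[0], kernel_size[1]). They are initialized as:

![Conv2d weight initialization](https://johngo-pic.oss-cn-beijing.aliyuncs.com/articles/20230602/1734568-20220430191120661-571963578.png)

* ~Conv2d.bias (torch.Tensor): the learnable bias of the module, of shape (out_channels). If bias is True, it is initialized as:

![Conv2d bias initialization](https://johngo-pic.oss-cn-beijing.aliyuncs.com/articles/20230602/1734568-20220430191059235-1225202227.png)

(4) Usage example

    import torch
    import torch.nn as nn

    # non-square kernel with unequal stride, padding and dilation
    m = nn.Conv2d(16, 33, (3, 5), stride=(2, 1), padding=(4, 2), dilation=(3, 1))
    print(m)
    # input shape: (N, C, H, W)
    inputImage = torch.randn(20, 16, 50, 100)
    output = m(inputImage)
    print(output.shape)  # torch.Size([20, 33, 26, 100])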
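The output shape in the example above follows from the usual convolution size formula. Here is a minimal pure-Python sketch of that arithmetic; the helper name `conv2d_output_size` is hypothetical and is not a PyTorch function.

```python
def conv2d_output_size(size, kernel_size, stride=1, padding=0, dilation=1):
    """Output length along one spatial dimension (hypothetical helper)."""
    # Extent covered by the kernel once dilation spreads its taps apart.
    effective_kernel = dilation * (kernel_size - 1) + 1
    return (size + 2 * padding - effective_kernel) // stride + 1


# Same settings as the example: H uses the first entry of each tuple, W the second.
h_out = conv2d_output_size(50, kernel_size=3, stride=2, padding=4, dilation=3)
w_out = conv2d_output_size(100, kernel_size=5, stride=1, padding=2, dilation=1)
print((h_out, w_out))  # (26, 100)
```

The batch size (20) is unchanged and the channel count becomes out_channels (33), which gives the full output shape (20, 33, 26, 100).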

2. Pooling Layers

2.1 nn.MaxPool2d

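(1) Prototype

    torch.nn.MaxPool2d(kernel_size, stride=None, padding=0, dilation=1, return_indices=False, ceil_mode=False)

Applies 2D max pooling over an input signal composed of several input planes.

**Note:** When ceil_mode=True, sliding windows are allowed to go out of bounds if they start within the left padding or the input. Sliding windows that would start in the right padded region are ignored.

(2) Parameters

* kernel_size: size of the window to take the max over; either a single value or a tuple
* stride: stride of the pooling window, i.e. how far the window moves across the image at each step. Default: kernel_size
* padding: implicit padding added to the top, bottom, left and right of the image. For max pooling the padded positions count as negative infinity, so they never win the max. Default: 0
* dilation: spacing between the elements inside the window; a value greater than 1 gives a dilated window. Default: 1 (no dilation)
* return_indices (bool): whether to return the indices of the maxima along with the output. Default: False
* ceil_mode (bool): whether the output shape is computed by rounding up (ceil) or rounding down (floor). Default: False (floor)

(3) Usage example

    # pool of square window of size=3, stride=2
    m = nn.MaxPool2d(3, stride=2)
    # pool of non-square window
    m = nn.MaxPool2d((3, 2), stride=(2, 1))
    input = torch.randn(20, 16, 50, 32)
    output = m(input)
    # output shape: torch.Size([20, 16, 24, 31])
    print(f'input shape: {input.shape}', f'output shape: {output.shape}', sep='\n')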
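To make the effect of ceil_mode concrete, here is a small pure-Python sketch of the output-size arithmetic along one dimension. The helper `pool_output_size` is a hypothetical name, and the adjustment for windows starting in the right padding is written from the rule stated in the note, so treat it as an illustration, not as PyTorch's own code.

```python
import math


def pool_output_size(size, kernel_size, stride=None, padding=0,
                     dilation=1, ceil_mode=False):
    """Output length along one dimension of a pooling layer (sketch)."""
    stride = kernel_size if stride is None else stride  # default: stride = kernel_size
    span = size + 2 * padding - dilation * (kernel_size - 1) - 1
    if not ceil_mode:
        return span // stride + 1
    out = math.ceil(span / stride) + 1
    # A window that would start inside the right padding is ignored.
    if (out - 1) * stride >= size + padding:
        out -= 1
    return out


print(pool_output_size(50, 3, stride=2))                            # 24
print(pool_output_size(50, 3, stride=2, ceil_mode=True))            # 25
print(pool_output_size(5, 2, stride=2, padding=1, ceil_mode=True))  # 3
```

In the second call the last window hangs one element past the end of the input, which ceil_mode allows. In the third call the extra window would begin in the right padding, so it is dropped.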

2.2 nn.AvgPool2d

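(1) Prototype

    torch.nn.AvgPool2d(kernel_size, stride=None, padding=0, ceil_mode=False, count_include_pad=True, divisor_override=None)

Applies 2D average pooling over an input signal composed of several input planes.

**Note:** When ceil_mode=True, sliding windows are allowed to go out of bounds if they start within the left padding or the input. Sliding windows that would start in the right padded region are ignored.

(2) Parameters

* kernel_size: size of the window to average over; either a single value or a tuple
* stride: stride of the pooling window, i.e. how far the window moves across the image at each step. Default: kernel_size
* padding: number of rows and columns of zeros added to the top, bottom, left and right of the image. Default: 0
* ceil_mode: whether the output shape is computed by rounding up (ceil) or rounding down (floor). Default: False (floor)
* count_include_pad: whether the zero padding is included in the averaging calculation. Default: True
* divisor_override: if specified, it is used as the divisor; otherwise the size of the pooling region is used

(3) Usage example

    # pool of square window of size=3, stride=2
    m = nn.AvgPool2d(3, stride=2)
    # pool of non-square window
    m = nn.AvgPool2d((3, 2), stride=(2, 1))
    input = torch.randn(20, 16, 50, 32)
    output = m(input)
    # output shape: torch.Size([20, 16, 24, 31])
    print(f'\ninput shape: {input.shape}', f'output shape: {output.shape}', sep='\n')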
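The difference between count_include_pad and divisor_override is easiest to see on a tiny input. The sketch below is a one-dimensional analogue on a plain Python list (floor mode only). The function `avg_pool1d` is a hypothetical helper written for this illustration, not the PyTorch operator.

```python
def avg_pool1d(values, kernel_size, stride, padding=0,
               count_include_pad=True, divisor_override=None):
    """1D analogue of average pooling on a plain list (illustrative only)."""
    n = len(values)
    out_len = (n + 2 * padding - kernel_size) // stride + 1
    result = []
    for i in range(out_len):
        start = i * stride - padding
        window = range(start, start + kernel_size)
        # Padded positions contribute 0 to the sum, so only real cells are summed.
        inside = [values[j] for j in window if 0 <= j < n]
        if divisor_override is not None:
            divisor = divisor_override
        elif count_include_pad:
            divisor = kernel_size
        else:
            divisor = len(inside)
        result.append(sum(inside) / divisor)
    return result


data = [1.0, 2.0, 3.0, 4.0]
print(avg_pool1d(data, 2, 2, padding=1))                           # [0.5, 2.5, 2.0]
print(avg_pool1d(data, 2, 2, padding=1, count_include_pad=False))  # [1.0, 2.5, 4.0]
print(avg_pool1d(data, 2, 2, padding=1, divisor_override=1))       # [1.0, 5.0, 4.0]
```

Only the border windows differ between the first two calls, because those are the only windows that overlap the padding.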

2.3 nn.AdaptiveMaxPool2d

(1) Prototype

    torch.nn.AdaptiveMaxPool2d(output_size, return_indices=False)

Applies 2D adaptive max pooling over an input signal composed of several input planes. For any input size, the output has size Hout × Wout, and the number of output features equals the number of input planes.

(2) Parameters

* output_size: the target output size of the image, of the form Hout × Wout. It can be a tuple (Hout, Wout) or a single Hout for a square image Hout × Hout. Hout and Wout can each be an int, or None, which means the size will be the same as that of the input
* return_indices: whether to return the indices of the maxima along with the output. Default: False

(3) Usage example

    # target output size of 5×7
    m = nn.AdaptiveMaxPool2d((5, 7))
    input = torch.randn(1, 64, 8, 9)
    output = m(input)  # torch.Size([1, 64, 5, 7])
    print(f'input shape: {input.shape}', f'output shape: {output.shape}', sep='\n')
    # target output size of 7×7 (square)
    m = nn.AdaptiveMaxPool2d(7)
    input = torch.randn(1, 64, 10, 9)
    output = m(input)  # torch.Size([1, 64, 7, 7])
    print(f'\ninput shape: {input.shape}', f'output shape: {output.shape}', sep='\n')
    # target output size of 10×7
    m = nn.AdaptiveMaxPool2d((None, 7))
    input = torch.randn(1, 64, 10, 9)
    output = m(input)  # torch.Size([1, 64, 10, 7])
    print(f'\ninput shape: {input.shape}', f'output shape: {output.shape}', sep='\n')
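Adaptive pooling picks the window boundaries for you so that the requested output size comes out exactly. The sketch below shows one common way to derive those windows along a single dimension: start at floor(i·in/out) and end at ceil((i+1)·in/out). This rule and both helper names are assumptions made for illustration; the text above does not spell out how PyTorch chooses the windows.

```python
def adaptive_windows(in_size, out_size):
    """[start, end) input ranges feeding each output position (assumed rule)."""
    windows = []
    for i in range(out_size):
        start = (i * in_size) // out_size                     # floor(i * in / out)
        end = ((i + 1) * in_size + out_size - 1) // out_size  # ceil((i + 1) * in / out)
        windows.append((start, end))
    return windows


def adaptive_max_pool1d(values, out_size):
    """Max over each adaptive window of a plain list."""
    return [max(values[s:e]) for s, e in adaptive_windows(len(values), out_size)]


print(adaptive_windows(8, 5))  # [(0, 2), (1, 4), (3, 5), (4, 7), (6, 8)]
print(adaptive_max_pool1d([3, 1, 4, 1, 5, 9, 2, 6], 5))  # [3, 4, 5, 9, 6]
```

The windows can overlap and can differ in length, which is how an input of length 8 maps onto exactly 5 outputs.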

2.4 nn.AdaptiveAvgPool2d

(1) Prototype

    torch.nn.AdaptiveAvgPool2d(output_size)

Applies 2D adaptive average pooling over an input signal composed of several input planes. For any input size, the output has size H × W, and the number of output features equals the number of input planes.

(2) Parameters

* output_size: the target output size of the image, of the form H × W. It can be a tuple (H, W) or a single H for a square image H × H. H and W can each be an int, or None, which means the size will be the same as that of the input

(3) Usage example

    # target output size of 5×7
    m = nn.AdaptiveAvgPool2d((5, 7))
    input = torch.randn(1, 64, 8, 9)
    output = m(input)  # torch.Size([1, 64, 5, 7])
    print(f'input shape: {input.shape}', f'output shape: {output.shape}', sep='\n')
    # target output size of 7×7 (square)
    m = nn.AdaptiveAvgPool2d(7)
    input = torch.randn(1, 64, 10, 9)
    output = m(input)  # torch.Size([1, 64, 7, 7])
    print(f'\ninput shape: {input.shape}', f'output shape: {output.shape}', sep='\n')
    # target output size of 10×7
    m = nn.AdaptiveAvgPool2d((None, 7))
    input = torch.randn(1, 64, 10, 9)
    output = m(input)  # torch.Size([1, 64, 10, 7])
    print(f'\ninput shape: {input.shape}', f'output shape: {output.shape}', sep='\n')

3. Non-linear Activations (weighted sum, nonlinearity)

3.1 nn.Sigmoid

(1) Prototype

    torch.nn.Sigmoid()

(2) Sigmoid function expression and graph

Applied element-wise:

![Sigmoid expression and graph, image 1 of 2](https://johngo-pic.oss-cn-beijing.aliyuncs.com/articles/20230602/1734568-20220428231407358-2069157741.png)

![Sigmoid expression and graph, image 2 of 2](https://johngo-pic.oss-cn-beijing.aliyuncs.com/articles/20230602/1734568-20220428231549134-308733377.png)

(3) Usage example

    m = nn.Sigmoid()
    input = torch.randn(2)
    print(input)
    output = m(input)
    print(output)
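For reference, the logistic function behind this layer, 1 / (1 + exp(-x)), can be evaluated by hand. The pure-Python sketch below uses the usual two-branch form so that very negative inputs do not overflow. It illustrates the formula only and makes no claim about how PyTorch implements it.

```python
import math


def sigmoid(x):
    """Logistic function 1 / (1 + exp(-x)), written to avoid overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)  # for negative x, exp(x) is small, so nothing overflows
    return z / (1.0 + z)


print(sigmoid(0.0))                  # 0.5
print(sigmoid(2.0) + sigmoid(-2.0))  # symmetry: sigmoid(x) + sigmoid(-x) is 1 up to rounding
print(sigmoid(-1000.0))              # 0.0, no OverflowError
```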

3.2 nn.Tanh

(1) Prototype

    torch.nn.Tanh()

(2) Tanh function expression and graph

Applied element-wise:

![Tanh expression and graph, image 1 of 2](https://johngo-pic.oss-cn-beijing.aliyuncs.com/articles/20230602/1734568-20220428233400206-198101038.png)

![Tanh expression and graph, image 2 of 2](https://johngo-pic.oss-cn-beijing.aliyuncs.com/articles/20230602/1734568-20220428233443482-34024356.png)

(3) Usage example

    m = nn.Tanh()
    input = torch.randn(2)
    print(input)
    output = m(input)
    print(output)
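Tanh is a rescaled and shifted sigmoid: tanh(x) = 2·sigmoid(2x) − 1. The short pure-Python check below compares the exponential definition, the sigmoid identity and `math.tanh`; the helper names are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def tanh_from_exp(x):
    """tanh(x) = (e^x - e^-x) / (e^x + e^-x)."""
    pos, neg = math.exp(x), math.exp(-x)
    return (pos - neg) / (pos + neg)


for x in (-1.5, 0.0, 0.7):
    # The three columns agree up to floating-point rounding.
    print(x, tanh_from_exp(x), 2.0 * sigmoid(2.0 * x) - 1.0, math.tanh(x))
```

So tanh has the same S shape as the sigmoid, but its output lies in (-1, 1) and is centred on zero.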

3.3 nn.ReLU

(1) Prototype

    torch.nn.ReLU(inplace=False)

(2) ReLU function expression and graph

Applied element-wise:

![ReLU expression and graph, image 1 of 2](https://johngo-pic.oss-cn-beijing.aliyuncs.com/articles/20230602/1734568-20220428233935344-607834307.png)

![ReLU expression and graph, image 2 of 2](https://johngo-pic.oss-cn-beijing.aliyuncs.com/articles/20230602/1734568-20220428234017093-1334997799.png)

(3) Parameters

* inplace: whether to perform the operation in place, i.e. on the input itself. If True, applying ReLU also overwrites the input, so that input equals output; if False, the input is left unchanged. Default: False

(4) Usage example

    m = nn.ReLU()
    input = torch.randn(2)
    output = m(input)

    # An implementation of CReLU - https://arxiv.org/abs/1603.05201
    m = nn.ReLU()
    input = torch.randn(2).unsqueeze(0)
    output = torch.cat((m(input), m(-input)))

3.4 nn.LeakyReLU

(1) Prototype

    torch.nn.LeakyReLU(negative_slope=0.01, inplace=False)

(2) LeakyReLU function expression and graph

Applied element-wise:

![LeakyReLU expression and graph, image 1 of 2](https://johngo-pic.oss-cn-beijing.aliyuncs.com/articles/20230602/1734568-20220428225559947-991799523.png)

![LeakyReLU expression and graph, image 2 of 2](https://johngo-pic.oss-cn-beijing.aliyuncs.com/articles/20230602/1734568-20220428225128922-887436370.png)

(3) Parameters

* negative_slope: slope applied to the negative part of the input. Default: 1e-2
* inplace: whether to perform the operation in place, i.e. on the input itself. If True, applying LeakyReLU also overwrites the input, so that input equals output; if False, the input is left unchanged. Default: False

(4) Usage example

    m = nn.LeakyReLU(0.1)
    input = torch.randn(2)
    print(input)
    output = m(input)
    print(output)

3.5 nn.ReLU6

(1) Prototype

    torch.nn.ReLU6(inplace=False)

(2) ReLU6 function expression and graph

Applied element-wise:

![ReLU6 expression and graph, image 1 of 2](https://johngo-pic.oss-cn-beijing.aliyuncs.com/articles/20230602/1734568-20220429001516289-692718984.png)

![ReLU6 expression and graph, image 2 of 2](https://johngo-pic.oss-cn-beijing.aliyuncs.com/articles/20230602/1734568-20220429001540519-503398367.png)

(3) Parameters

* inplace: whether to perform the operation in place, i.e. on the input itself. If True, applying ReLU6 also overwrites the input, so that input equals output; if False, the input is left unchanged. Default: False

(4) Usage example

    m = nn.ReLU6()
    input = torch.randn(2)
    output = m(input)
    print(f'input: {input}', f'output: {output}', sep='\n')
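ReLU, LeakyReLU and ReLU6 differ only in how they treat negative inputs and large positive inputs. The pure-Python sketch below puts the three piecewise definitions side by side for a single number. The function names are hypothetical stand-ins for the element-wise behaviour of the layers.

```python
def relu(x):
    """max(0, x)"""
    return max(0.0, x)


def leaky_relu(x, negative_slope=0.01):
    """x for x >= 0, negative_slope * x otherwise."""
    return x if x >= 0 else negative_slope * x


def relu6(x):
    """min(max(0, x), 6): ReLU with the output capped at 6."""
    return min(max(0.0, x), 6.0)


for x in (-2.0, 3.0, 8.0):
    print(x, relu(x), leaky_relu(x, 0.1), relu6(x))
# -2.0 0.0 -0.2 0.0
# 3.0 3.0 3.0 3.0
# 8.0 8.0 8.0 6.0
```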

3.6 nn.GELU

(1) Prototype

    torch.nn.GELU()

(2) GELU function expression and graph

Applied element-wise:

![GELU expression and graph, image 1 of 3](https://johngo-pic.oss-cn-beijing.aliyuncs.com/articles/20230602/1734568-20220430181755372-1137431789.png)

![GELU expression and graph, image 2 of 3](https://johngo-pic.oss-cn-beijing.aliyuncs.com/articles/20230602/1734568-20220430181950865-610589424.png)

![GELU expression and graph, image 3 of 3](https://johngo-pic.oss-cn-beijing.aliyuncs.com/articles/20230602/1734568-20220429002441006-290886587.png)

(3) Usage example

    # GELU
    m = nn.GELU()
    input = torch.randn(2)
    output = m(input)
    print('input: ', input, 'output: ', output, sep='\n')
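GELU is defined as x·Φ(x), where Φ is the standard normal cumulative distribution function, and it is often approximated with a tanh expression. The pure-Python sketch below evaluates both forms so the size of the approximation error is visible. The function names are hypothetical and the code only illustrates the formulas.

```python
import math


def gelu(x):
    """Exact GELU: x * Phi(x), with Phi expressed through the error function."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def gelu_tanh(x):
    """Common tanh approximation of GELU."""
    inner = math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)
    return 0.5 * x * (1.0 + math.tanh(inner))


for x in (-2.0, -0.5, 0.0, 1.0, 3.0):
    print(x, gelu(x), gelu_tanh(x))
```

Unlike ReLU, GELU is smooth around zero and lets small negative values through with a reduced magnitude.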

3.7 nn.SELU

(1) Prototype

    torch.nn.SELU(inplace=False)

Note: when initializing with kaiming_normal or kaiming_normal_, use nonlinearity='linear' instead of nonlinearity='selu' in order to obtain a self-normalizing neural network. See the paper [Self-Normalizing Neural Networks](https://arxiv.org/abs/1706.02515) for details.

(2) SELU function expression and graph

Applied element-wise:

![SELU expression](https://johngo-pic.oss-cn-beijing.aliyuncs.com/articles/20230602/1734568-20220430184425741-106593950.png)

where α = 1.6732632423543772848170429916717 and scale = 1.0507009873554804934193349852946.

![SELU graph](https://johngo-pic.oss-cn-beijing.aliyuncs.com/articles/20230602/1734568-20220429002153890-1864241940.png)

(3) Parameters

* inplace (bool, optional): whether to perform the operation in place, i.e. on the input itself. If True, applying SELU also overwrites the input, so that input equals output; if False, the input is left unchanged. Default: False

(4) Usage example

    # SELU
    m = nn.SELU()
    input = torch.randn(2)
    output = m(input)
    print('input: ', input, 'output: ', output, sep='\n')
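Using the two constants quoted above, SELU can be written as scale · (max(0, x) + min(0, α·(exp(x) − 1))). The sketch below evaluates that expression in pure Python. The constants are truncated to double precision and the function name is hypothetical.

```python
import math

ALPHA = 1.6732632423543772  # alpha from the text, truncated to a double
SCALE = 1.0507009873554805  # scale from the text, truncated to a double


def selu(x):
    """scale * (max(0, x) + min(0, alpha * (exp(x) - 1)))"""
    return SCALE * (max(0.0, x) + min(0.0, ALPHA * (math.exp(x) - 1.0)))


print(selu(1.0))     # positive inputs are simply multiplied by scale
print(selu(-1.0))    # negative inputs follow a scaled exponential curve
print(selu(-100.0))  # saturates near -scale * alpha, roughly -1.758
```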

4. Non-linear Activations (other)

4.1 nn.Softmax

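(1) Prototype

    torch.nn.Softmax(dim=None)

Applies the Softmax function to an n-dimensional input tensor, rescaling the elements so that the elements of the n-dimensional output tensor lie in the range [0, 1] and sum to 1 along dim.

When the input tensor is a sparse tensor, the unspecified values are treated as -inf.

Note that this module does not work directly with NLLLoss, which expects the log to be computed between the Softmax and itself. Use LogSoftmax instead (it is faster and has better numerical properties).

(2) Softmax function expression

![Softmax expression](https://johngo-pic.oss-cn-beijing.aliyuncs.com/articles/20230602/1734568-20220429003010517-184436339.png)

(3) Parameters

* dim (int): the dimension along which Softmax is computed (so every slice along dim sums to 1)

(4) Usage example

    # Softmax
    m = nn.Softmax(dim=1)
    input = torch.randn(2, 3)
    output = m(input)
    print('input: ', input, 'output: ', output, sep='\n')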
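The sketch below computes softmax and log-softmax for a single row in pure Python, to show why each slice sums to 1 and why taking the log directly is numerically safer than computing log(softmax(x)). Subtracting the row maximum before exponentiating is a standard stabilisation trick. Both function names are hypothetical and the code does not represent PyTorch's implementation.

```python
import math


def softmax(row):
    """exp(x_i) / sum_j exp(x_j), shifted by max(row) to avoid overflow."""
    shift = max(row)
    exps = [math.exp(v - shift) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def log_softmax(row):
    """log of softmax, computed without forming the probabilities first."""
    shift = max(row)
    log_total = math.log(sum(math.exp(v - shift) for v in row))
    return [v - shift - log_total for v in row]


probs = softmax([1.0, 2.0, 3.0])
print(probs, sum(probs))                  # the probabilities sum to 1 up to rounding
print(softmax([1000.0, 1001.0, 1002.0]))  # same probabilities, no overflow
print(log_softmax([1.0, 2.0, 3.0]))
```

Adding the same constant to every logit leaves the result unchanged, which is why the shifted version returns the same probabilities.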

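5. Normalization Layers

5.1 nn.BatchNorm2d

(1) Prototype

    torch.nn.BatchNorm2d(num_features, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True, device=None, dtype=None)

Applies batch normalization over a 4D input (a mini-batch of 2D inputs with an additional channel dimension). For details, see the paper [Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift](https://arxiv.org/abs/1502.03167).

(2) Parameters

* num_features: the number of features C. The input is normally laid out as (N, C, H, W), i.e. (batch_size, num_features, height, width), so C is also the number of channels
* eps: a value added to the denominator for numerical stability. Default: 1e-5
* momentum: the value used to update the running estimates of the mean and variance. It can be set to None to use a cumulative moving average (i.e. a simple average). Default: 0.1
* affine: whether this module has learnable affine parameters. Default: True
* track_running_stats: a boolean. When set to True, this module tracks the running mean and variance. When set to False, this module does not track such statistics and initializes the statistics buffers running_mean and running_var as None. When these buffers are None, this module always uses batch statistics, in both training and evaluation modes. Default: True

(3) Usage example

    # With Learnable Parameters
    m = nn.BatchNorm2d(100)
    # Without Learnable Parameters
    m = nn.BatchNorm2d(100, affine=False)
    input = torch.randn(20, 100, 35, 45)
    output = m(input)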
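To show what eps and momentum do, the sketch below normalizes the values of a single channel in pure Python and updates the running statistics for one training step. The flat list stands in for all N×H×W values of that channel. The function `batch_norm_channel` is a hypothetical helper, and using the unbiased variance for the running estimate is stated here as an assumption about the usual convention.

```python
import math


def batch_norm_channel(values, running_mean, running_var,
                       gamma=1.0, beta=0.0, eps=1e-5, momentum=0.1):
    """One training-mode step of batch normalization for one channel (sketch)."""
    n = len(values)
    mean = sum(values) / n
    biased_var = sum((v - mean) ** 2 for v in values) / n
    # Normalize with the batch statistics, then apply the affine parameters.
    normalized = [gamma * (v - mean) / math.sqrt(biased_var + eps) + beta
                  for v in values]
    # Running estimates move a fraction `momentum` towards the batch statistics.
    unbiased_var = biased_var * n / (n - 1)
    new_mean = (1 - momentum) * running_mean + momentum * mean
    new_var = (1 - momentum) * running_var + momentum * unbiased_var
    return normalized, new_mean, new_var


out, run_mean, run_var = batch_norm_channel([1.0, 2.0, 3.0, 4.0],
                                            running_mean=0.0, running_var=1.0)
print(out)                # zero mean, variance close to 1
print(run_mean, run_var)  # roughly 0.25 and 1.0667
```

In evaluation mode the running estimates would be used in place of the batch mean and variance.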

6. Linear Layers

6.1 nn.Linear

(1) Prototype

    torch.nn.Linear(in_features, out_features, bias=True, device=None, dtype=None)

Applies a linear transformation to the input data:

![Linear transformation formula](https://johngo-pic.oss-cn-beijing.aliyuncs.com/articles/20230602/1734568-20220430171836157-1989578789.png)

(2) Parameters

* in_features: size of each input sample
* out_features: size of each output sample
* bias: whether the layer learns an additive bias. Default: True
* device: the device the parameters are allocated on ('cpu' or 'cuda'; for cuda a device index can be given, e.g. 'cuda:0', 'cuda:1')
* dtype: the data type

(3) Attributes

* ~Linear.weight (torch.Tensor): the learnable weights of the module, of shape (out_features, in_features). They are initialized as:

![Linear weight initialization](https://johngo-pic.oss-cn-beijing.aliyuncs.com/articles/20230602/1734568-20220430180349181-911855095.png)

* ~Linear.bias: the learnable bias of the module, of shape (out_features). If bias is True, it is initialized as:

![Linear bias initialization](https://johngo-pic.oss-cn-beijing.aliyuncs.com/articles/20230602/1734568-20220430180632550-593364648.png)

(4) Usage example

    # Linear
    m = nn.Linear(20, 30)
    input = torch.randn(128, 20)
    output = m(input)
    # output size: torch.Size([128, 30]), weight size: torch.Size([30, 20]), bias size: torch.Size([30])
    print(f'output size: {output.size()}', f'weight size: {m.weight.size()}', f'bias size: {m.bias.size()}', sep='\n')
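The weight is stored as (out_features, in_features), so the layer computes y = x·Aᵀ + b. The pure-Python sketch below spells that product out with nested lists, which makes the shape bookkeeping explicit. The function `linear` is a hypothetical helper and the numbers are made up for the illustration.

```python
def linear(x, weight, bias=None):
    """y = x A^T + b for x: (batch, in), weight: (out, in), bias: (out,)."""
    out = []
    for row in x:
        # Each output feature is the dot product of the input row with one weight row.
        y = [sum(w * v for w, v in zip(w_row, row)) for w_row in weight]
        if bias is not None:
            y = [yi + bi for yi, bi in zip(y, bias)]
        out.append(y)
    return out


x = [[1.0, 2.0], [3.0, 4.0]]                   # batch of 2 samples, in_features = 2
weight = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # out_features = 3, in_features = 2
bias = [0.5, 0.5, 0.5]
print(linear(x, weight, bias))  # [[1.5, 2.5, 3.5], [3.5, 4.5, 7.5]]
```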

7. Dropout Layers

7.1 nn.Dropout

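(1) Prototype

    torch.nn.Dropout(p=0.5, inplace=False)

During training, randomly zeroes some of the elements of the input tensor with probability p, using samples from a Bernoulli distribution. The elements to zero are drawn independently on every forward call.

Dropout is an effective technique for regularization and for preventing the co-adaptation of neurons; see the paper [Improving neural networks by preventing co-adaptation of feature detectors](https://arxiv.org/abs/1207.0580) for a detailed introduction.

In addition, the outputs are scaled by a factor of 1/(1-p) during training. This means that during evaluation the module simply computes the identity function.

The input can be a tensor of any shape, and the output has the same shape as the input.

(2) Parameters

* p: probability of an element being zeroed. Default: 0.5
* inplace: whether to perform the operation in place, i.e. on the input itself. If True, applying Dropout also overwrites the input, so that input equals output; if False, the input is left unchanged. Default: False

(3) Usage example

    # Dropout
    m = nn.Dropout(p=0.2)
    input = torch.randn(4, 6)
    output = m(input)
    print('input:', input, 'output:', output, sep='\n')

Q: Apart from the elements that were randomly zeroed with probability p=0.2, why did the values of the other elements change as well?

A: During training, every element that is kept is scaled by a factor of 1/(1-p), which is 1/(1-0.2) = 1.25 here.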
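The scaling in the answer above can be reproduced with a few lines of pure Python. The sketch below implements "inverted" dropout on a plain list: each element is zeroed with probability p and the survivors are multiplied by 1/(1-p). The function `dropout` and its `rng` parameter are hypothetical, and p is assumed to be strictly less than 1.

```python
import random


def dropout(values, p=0.5, training=True, rng=random):
    """Inverted dropout on a list; identity in evaluation mode (sketch, p < 1)."""
    if not training or p == 0.0:
        return list(values)
    scale = 1.0 / (1.0 - p)
    # Each element is dropped independently with probability p.
    return [0.0 if rng.random() < p else v * scale for v in values]


rng = random.Random(0)  # fixed seed so the run is repeatable
train_out = dropout([1.0] * 10, p=0.2, rng=rng)
eval_out = dropout([1.0] * 10, p=0.2, training=False)
print(train_out)  # every entry is either 0.0 or 1.25
print(eval_out)   # unchanged: ten times 1.0
```

The scaling keeps the expected value of each element the same in training and evaluation, which is why no rescaling is needed at evaluation time.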

7.2 nn.Dropout2d

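(1) Prototype

    torch.nn.Dropout2d(p=0.5, inplace=False)

Unlike the Dropout of the previous section, Dropout2d randomly zeroes entire channels. A channel is usually a 2D feature map: for example, in a batched input, the j-th channel of the i-th sample is the 2D tensor input[i, j]. On every forward call, each channel is zeroed independently with probability p, using samples from a Bernoulli distribution.

As described in the paper [Efficient Object Localization Using Convolutional Networks](https://arxiv.org/abs/1411.4280), if adjacent pixels within feature maps are strongly correlated (as is normally the case in early convolution layers), then element-wise dropout will not regularize the activations and will otherwise only result in a lower effective learning rate. In this case, nn.Dropout2d() helps promote independence between feature maps and should be used instead.

The input can be (N, C, H, W) or (C, H, W); the output has the same shape as the input, i.e. (N, C, H, W) or (C, H, W).

(2) Parameters

* p: probability of a channel being zeroed. Default: 0.5
* inplace: whether to perform the operation in place, i.e. on the input itself. If True, applying Dropout2d also overwrites the input, so that input equals output; if False, the input is left unchanged. Default: False

(3) Usage example

    # Dropout2d
    m = nn.Dropout2d(p=0.2)
    input = torch.randn(20, 16, 32, 32)
    output = m(input)
    print('input:', input.shape, 'output:', output.shape, sep='\n')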

8. Loss Functions

8.1 nn.L1Loss

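(1) Prototype

    torch.nn.L1Loss(size_average=None, reduce=None, reduction='mean')

Computes the mean absolute error (MAE) between the input x and the target y. x can be a tensor of any shape; y must have the same shape as x.

(2) Formula

With reduction='none', the loss of each element is $l_n = |x_n - y_n|$. With 'mean' the result is $\frac{1}{N}\sum_{n=1}^{N} l_n$ and with 'sum' it is $\sum_{n=1}^{N} l_n$, where N is the number of elements.

(3) Parameters

* size_average (bool, optional): deprecated (see reduction). By default, the losses are averaged over each loss element in the batch. Note that for some losses there are multiple elements per sample. If size_average is set to False, the losses are instead summed for each minibatch. Ignored when reduce is False. Default: True
* reduce (bool, optional): deprecated (see reduction). By default, the losses are averaged or summed over the observations of each minibatch, depending on size_average. When reduce is False, a loss per batch element is returned instead and size_average is ignored. Default: True
* reduction (string, optional): specifies the reduction to apply to the output: 'none', 'mean' or 'sum'. 'none': no reduction is applied; 'mean': the sum of the output is divided by the number of elements in the output; 'sum': the output is summed. Note: size_average and reduce are being deprecated, and in the meantime specifying either of them overrides reduction. Default: 'mean'

(4) Usage example

    # L1Loss
    loss = nn.L1Loss()
    input = torch.randn(3, 5, requires_grad=True)
    target = torch.randn(3, 5)
    output = loss(input, target)
    output.backward()
    print(input, target, output, sep='\n')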
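The three reduction modes are easy to compare on a handful of numbers. The pure-Python sketch below computes the absolute error on flat lists; `l1_loss` is a hypothetical helper that mirrors the description of the reduction parameter above.

```python
def l1_loss(inputs, targets, reduction='mean'):
    """Absolute error |x - y| with 'none', 'sum' or 'mean' reduction (sketch)."""
    losses = [abs(x - y) for x, y in zip(inputs, targets)]
    if reduction == 'none':
        return losses
    if reduction == 'sum':
        return sum(losses)
    if reduction == 'mean':
        return sum(losses) / len(losses)
    raise ValueError(f'unknown reduction: {reduction}')


x = [0.5, -1.0, 2.0]
y = [1.0, 1.0, 1.0]
print(l1_loss(x, y, 'none'))  # [0.5, 2.0, 1.0]
print(l1_loss(x, y, 'sum'))   # 3.5
print(l1_loss(x, y, 'mean'))  # 3.5 / 3, about 1.1667
```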

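8.2 nn.MSELoss (L2Loss)

(1) Prototype

    torch.nn.MSELoss(size_average=None, reduce=None, reduction='mean')

Computes the mean squared error (squared L2 norm) between each element of the input x and the target y.

(2) Formula

With reduction='none', the loss of each element is $l_n = (x_n - y_n)^2$. With 'mean' the result is $\frac{1}{N}\sum_{n=1}^{N} l_n$ and with 'sum' it is $\sum_{n=1}^{N} l_n$, where N is the number of elements.

(3) Parameters

* size_average (bool, optional): deprecated (see reduction). By default, the losses are averaged over each loss element in the batch. Note that for some losses there are multiple elements per sample. If size_average is set to False, the losses are instead summed for each minibatch. Ignored when reduce is False. Default: True
* reduce (bool, optional): deprecated (see reduction). By default, the losses are averaged or summed over the observations of each minibatch, depending on size_average. When reduce is False, a loss per batch element is returned instead and size_average is ignored. Default: True
* reduction (string, optional): specifies the reduction to apply to the output: 'none', 'mean' or 'sum'. 'none': no reduction is applied; 'mean': the sum of the output is divided by the number of elements in the output; 'sum': the output is summed. Note: size_average and reduce are being deprecated, and in the meantime specifying either of them overrides reduction. Default: 'mean'

(4) Usage example

    # MSELoss (L2Loss)
    loss = nn.MSELoss()
    input = torch.randn(3, 5, requires_grad=True)
    target = torch.randn(3, 5)
    output = loss(input, target)
    output.backward()
    print(input, target, output, sep='\n')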
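The example above calls backward(), so it helps to see both the loss value and the gradient it produces. The pure-Python sketch below computes the mean squared error and its analytic gradient 2(x_n − y_n)/N for the 'mean' reduction. Both function names are hypothetical.

```python
def mse_loss(inputs, targets):
    """Mean of (x - y)^2 over all elements."""
    return sum((x - y) ** 2 for x, y in zip(inputs, targets)) / len(inputs)


def mse_grad(inputs, targets):
    """Gradient of the mean squared error with respect to each input element."""
    n = len(inputs)
    return [2.0 * (x - y) / n for x, y in zip(inputs, targets)]


x = [0.5, -1.0, 2.0]
y = [1.0, 1.0, 1.0]
print(mse_loss(x, y))  # (0.25 + 4.0 + 1.0) / 3 = 1.75
print(mse_grad(x, y))  # roughly [-0.333, -1.333, 0.667]
```

The gradient grows linearly with the error, so large errors dominate the update. L1Loss, by contrast, gives every element a gradient of the same magnitude.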

8.3 nn.CrossEntropyLoss

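(1) Prototype

    torch.nn.CrossEntropyLoss(weight=None, size_average=None, ignore_index=-100, reduce=None, reduction='mean', label_smoothing=0.0)

Computes the cross-entropy loss between the input x and the target y.

(2) Formula

For a target given as class indices and reduction='none', the loss of sample n is $l_n = -w_{y_n} \log \frac{\exp(x_{n,y_n})}{\sum_{c=1}^{C} \exp(x_{n,c})}$, where C is the number of classes and w is the optional per-class weight.

(3) Parameters

* weight (Tensor, optional): a manual rescaling weight for each class. If given, it must be a tensor of size C
* size_average (bool, optional): deprecated (see reduction). By default, the losses are averaged over each loss element in the batch. Note that for some losses there are multiple elements per sample. If size_average is set to False, the losses are instead summed for each minibatch. Ignored when reduce is False. Default: True
* ignore_index (int, optional): specifies a target value that is ignored and does not contribute to the input gradient. When size_average is True, the loss is averaged over the non-ignored targets. Note that ignore_index only applies when the target contains class indices
* reduce (bool, optional): deprecated (see reduction). By default, the losses are averaged or summed over the observations of each minibatch, depending on size_average. When reduce is False, a loss per batch element is returned instead and size_average is ignored. Default: True
* reduction (string, optional): specifies the reduction to apply to the output: 'none', 'mean' or 'sum'. 'none': no reduction is applied; 'mean': the weighted mean of the output is taken; 'sum': the output is summed. Note: size_average and reduce are being deprecated, and in the meantime specifying either of them overrides reduction. Default: 'mean'
* label_smoothing (float, optional): a float in [0.0, 1.0]. Specifies the amount of smoothing applied when computing the loss, where 0.0 means no smoothing. The target becomes a mixture of the original ground truth and a uniform distribution, as described in [Rethinking the Inception Architecture for Computer Vision](https://arxiv.org/abs/1512.00567). Default: 0.0

(4) Usage example

    # Example of target with class indices
    loss = nn.CrossEntropyLoss()
    input = torch.randn(3, 5, requires_grad=True)
    target = torch.empty(3, dtype=torch.long).random_(5)
    output = loss(input, target)
    output.backward()
    print(input, target, output, sep='\n')

    print('')

    # Example of target with class probabilities
    input = torch.randn(3, 5, requires_grad=True)
    target = torch.randn(3, 5).softmax(dim=1)
    output = loss(input, target)
    output.backward()
    print(input, target, output, sep='\n')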
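Cross-entropy on raw scores combines a log-softmax with a negative log-likelihood. The pure-Python sketch below computes the loss for single samples with a class-index target, and includes label smoothing as a mixture of the one-hot target with a uniform distribution, as described for label_smoothing above. The function `cross_entropy` is a hypothetical helper that ignores the weight and ignore_index options.

```python
import math


def cross_entropy(logits, target, label_smoothing=0.0):
    """Cross-entropy of one sample from raw scores and a class index (sketch)."""
    shift = max(logits)  # stabilises the exponentials
    log_total = math.log(sum(math.exp(v - shift) for v in logits))
    log_probs = [v - shift - log_total for v in logits]
    num_classes = len(logits)
    loss = 0.0
    for c, log_p in enumerate(log_probs):
        # Smoothed target: eps / C on every class plus (1 - eps) on the true class.
        q = label_smoothing / num_classes
        if c == target:
            q += 1.0 - label_smoothing
        loss -= q * log_p
    return loss


batch = [[2.0, 1.0, 0.1], [0.0, 0.0, 0.0]]
targets = [0, 2]
losses = [cross_entropy(row, t) for row, t in zip(batch, targets)]
print(losses)                     # second entry equals log(3) for uniform scores
print(sum(losses) / len(losses))  # what reduction='mean' would report
print(cross_entropy(batch[0], 0, label_smoothing=0.1))
```

With smoothing, part of the target mass moves to the wrong classes, so a confident and correct prediction is penalised slightly more than without it.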

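References

1. PyTorch official documentation

2. [PyTorch series] A detailed guide to using nn.BatchNorm2d

Original: https://www.cnblogs.com/shaoxx333/p/16199309.html
Author: 最菜程序员Sxx
Title: [PyTorch] A Summary of Commonly Used Neural Network Layers (Continuously Updated)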
