Understanding LSTM outputs in PyTorch, and how batch_first=True vs. False changes the output

June 16, 2023, 7:38 PM • Artificial Intelligence

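I remember wrestling with this question over winter break too. At the time I thought I had figured it out. When I picked it up again recently I was confused all over again, so I'm writing it down now before I forget.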

I searched a lot online but couldn't find a simple, easy-to-follow example.

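PyTorch's LSTM returns three things. Two of them are usually used: the output (lstm_out, which I'll call the "output layer" below) and the final hidden state (h_n, the "hidden layer"). I've rarely used the third, the cell state c_n, so I won't cover it.

There are generally two approaches. You either feed the output layer into a fully connected layer to get the result, or you feed the hidden layer into it. A senior student told me the hidden layer works a little better, but both approaches should work. For a single-layer, unidirectional LSTM, the last time step of the output layer and the hidden layer are identical. The difference only appears when the network has more than one layer (or is bidirectional).

An example makes it clear: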


import torch
import torch.nn as nn
import torchvision.transforms as transforms
import torchvision.datasets as datasets
import matplotlib.pyplot as plt

input_dim = 28
hidden_dim = 100
layer_dim = 1
output_dim = 10
BATCH_SIZE = 32
EPOCHS = 10

trainsets = datasets.MNIST(root="./data", train=True, download=True, transform=transforms.ToTensor())
testsets = datasets.MNIST(root="./data", train=False, transform=transforms.ToTensor())

train_loader = torch.utils.data.DataLoader(dataset=trainsets, batch_size=BATCH_SIZE, shuffle=True)
test_loader = torch.utils.data.DataLoader(dataset=testsets, batch_size=BATCH_SIZE, shuffle=True)

class LSTM_Model(nn.Module):
    def __init__(self, input_dim, hidden_dim, layer_dim, output_dim):
        super(LSTM_Model, self).__init__()
        self.hidden_dim = hidden_dim
        self.layer_dim = layer_dim
        self.lstm = nn.LSTM(input_dim, hidden_dim, layer_dim, batch_first=True)

        # in_features = hidden_dim matches feature_map (built in forward) only when layer_dim == 1
        self.fc = nn.Linear(hidden_dim, output_dim)

    def forward(self, x):
        # Initial hidden and cell states: [num_layers, batch_size, hidden_size].
        # They are created on the input's device and need no gradient.
        h0 = torch.zeros(self.layer_dim, x.size(0), self.hidden_dim, device=x.device)
        c0 = torch.zeros(self.layer_dim, x.size(0), self.hidden_dim, device=x.device)

        lstm_out, (h_n, c_n) = self.lstm(x, (h0, c0))
        print("-" * 10)
        print("lstm_out.shape", lstm_out.shape)
        print("lstm_out[:, -1].shape", lstm_out[:, -1].shape)
        print("-" * 10)
        print("h_n.shape", h_n.shape)
        print("-" * 10)
        # Concatenate the final hidden state of every layer: [batch_size, num_layers * hidden_dim]
        feature_map = torch.cat([h_n[i, :, :] for i in range(h_n.shape[0])], dim=-1)
        print("feature_map.shape", feature_map.shape)
        print("-" * 10)
        print("lstm_out[:, -1]", lstm_out[:, -1])
        print("-" * 10)
        print("feature_map", feature_map)
        print("-" * 10)
        out = self.fc(feature_map)
        print("out", out.shape)
        return out

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# Move the model to the same device as the data; otherwise training fails on GPU
model = LSTM_Model(input_dim, hidden_dim, layer_dim, output_dim).to(device)

criterion = nn.CrossEntropyLoss()

learning_rate = 0.01
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)

sequence_dim = 28  # seq_len: the 28 rows of each image are the time steps
loss_list = []
accuracy_list = []
iteration_list = []

iteration = 0  # avoid shadowing the built-in iter()
for epoch in range(EPOCHS):
    for i, (images, labels) in enumerate(train_loader):
        model.train()
        # [batch, 1, 28, 28] -> [batch, seq_len=28, input_size=28]; inputs need no gradient
        images = images.view(-1, sequence_dim, input_dim).to(device)
        labels = labels.to(device)
        optimizer.zero_grad()

        outputs = model(images)

        loss = criterion(outputs, labels)

        loss.backward()

        optimizer.step()

        iteration += 1

        if iteration % 500 == 0:
            model.eval()

            correct = 0
            total = 0

            # No gradients are needed during evaluation
            with torch.no_grad():
                for images, labels in test_loader:
                    images = images.view(-1, sequence_dim, input_dim).to(device)
                    labels = labels.to(device)

                    outputs = model(images)

                    # Index of the largest logit = predicted class
                    predict = torch.max(outputs, 1)[1]

                    total += labels.size(0)
                    # predict and labels are on the same device, so compare directly
                    # (the original called .gpu(), which does not exist on tensors)
                    correct += (predict == labels).sum().item()

            accuracy = correct / total * 100
            # Store plain Python floats so matplotlib can plot them even when training on GPU
            loss_list.append(loss.item())
            accuracy_list.append(accuracy)
            iteration_list.append(iteration)

            print("Iteration:{}, Loss:{}, Accuracy:{}".format(iteration, loss.item(), accuracy))

plt.plot(iteration_list, loss_list)
plt.xlabel("Number of Iteration")
plt.ylabel("Loss")
plt.title("LSTM")
plt.show()

plt.plot(iteration_list, accuracy_list, color='r')
plt.xlabel("Number of iteration")
plt.ylabel("Accuracy")
plt.title("LSTM")
plt.show()

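Analysis:
input_dim = 28 # features per time step (input_size): one row of 28 pixels, NOT seq_len
sequence_dim = 28 # number of time steps (seq_len): the 28 rows of each image
hidden_dim = 100 # dimension of the hidden state (hidden_size)
layer_dim = 1 # a 1-layer LSTM (num_layers)
output_dim = 10 # output dimension
BATCH_SIZE = 32 # samples read per batch (batch_size)

With batch_first=True, as in this model:
lstm_out: [batch_size, seq_len, hidden_size * num_directions]
lstm_hn: [num_directions * num_layers, batch_size, hidden_size]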
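You can check these shapes without downloading MNIST. Below is a minimal sketch that pushes a random tensor through an LSTM with the same hyperparameters; the random input is my stand-in for a batch of images. When (h0, c0) is omitted, PyTorch starts from zero states, which is what the model above builds by hand.

```python
import torch
import torch.nn as nn

# Same hyperparameters as the MNIST example; a random batch stands in for the images
input_dim, hidden_dim, layer_dim = 28, 100, 1
batch_size, seq_len = 32, 28

lstm = nn.LSTM(input_dim, hidden_dim, layer_dim, batch_first=True)
x = torch.randn(batch_size, seq_len, input_dim)  # [batch_size, seq_len, input_size]

lstm_out, (h_n, c_n) = lstm(x)  # h0/c0 default to zeros when omitted

print(lstm_out.shape)  # torch.Size([32, 28, 100])
print(h_n.shape)       # torch.Size([1, 32, 100])
print(c_n.shape)       # torch.Size([1, 32, 100])
```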


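The last time step of the output layer (lstm_out[:, -1], shape [32, 100]) has the same shape as the concatenated hidden layer (feature_map, also [32, 100]). We can print the actual values to compare them.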

The two hold the same values. So for a one-layer (unidirectional) network, using the output layer or the hidden layer makes no difference.
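Here is a minimal sketch of that comparison. It assumes a random input instead of MNIST and uses torch.allclose rather than eyeballing the printed values:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
lstm = nn.LSTM(input_size=28, hidden_size=100, num_layers=1, batch_first=True)
x = torch.randn(32, 28, 28)  # [batch_size, seq_len, input_size]

lstm_out, (h_n, _) = lstm(x)

last_step = lstm_out[:, -1]  # [32, 100]: top layer, last time step
# Same concatenation as in the model above; with one layer it is just h_n[0]
feature_map = torch.cat([h_n[i] for i in range(h_n.shape[0])], dim=-1)  # [32, 100]

print(torch.allclose(last_step, feature_map))  # True
```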

The final output is [batch_size, num_classes], i.e. [32, 10]. This is expected, since it is a 10-class classification task.

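What about more than one layer? If you simply change the number of layers in the code above to 2, it will most likely raise an error. feature_map now concatenates two layers and has num_layers * hidden_size = 200 columns, but self.fc was built with in_features = hidden_dim = 100.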
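One possible fix is to size the linear layer as hidden_dim * layer_dim so it accepts the concatenated h_n. This is my own choice, not something from the original code. Here is a sketch with a random input:

```python
import torch
import torch.nn as nn

hidden_dim, layer_dim, output_dim = 100, 2, 10

lstm = nn.LSTM(28, hidden_dim, layer_dim, batch_first=True)
# Size the classifier from the concatenated h_n: one hidden_dim block per layer
fc = nn.Linear(hidden_dim * layer_dim, output_dim)

x = torch.randn(32, 28, 28)
_, (h_n, _) = lstm(x)  # h_n: [2, 32, 100]
feature_map = torch.cat([h_n[i] for i in range(h_n.shape[0])], dim=-1)  # [32, 200]

# nn.Linear(hidden_dim, output_dim), as in the original model, would reject this input
out = fc(feature_map)
print(feature_map.shape, out.shape)  # torch.Size([32, 200]) torch.Size([32, 10])
```

Another option is to feed only h_n[-1], the top layer's final state, and keep nn.Linear(hidden_dim, output_dim).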

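The results show that the hidden layer has one more layer than the output layer. The printed values show that h_n[-1], the last layer's final state, equals lstm_out[:, -1], while h_n also holds the final state of the previous (lower) layer. The concatenated hidden layer therefore contains everything in the output layer's last time step plus the lower layer's final state. That is a plausible explanation for the claim that the hidden layer works a bit better, although more features do not guarantee higher accuracy.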
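To see what the extra row of h_n actually is, here is a sketch using a random input and my own setup. It checks h_n[-1] against lstm_out[:, -1]. It then rebuilds the first layer alone by copying its parameters (the state_dict keys ending in _l0) into a one-layer LSTM:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(4, 28, 28)  # [batch, seq_len, input_size]

two_layer = nn.LSTM(28, 100, num_layers=2, batch_first=True)
lstm_out, (h_n, _) = two_layer(x)  # lstm_out: [4, 28, 100], h_n: [2, 4, 100]

# 1) The last layer's final state is exactly the last step of lstm_out
print(torch.allclose(h_n[-1], lstm_out[:, -1]))  # True

# 2) h_n[0] is the first layer's final state. Rebuild layer 0 on its own by
#    copying its weights (weight_ih_l0, weight_hh_l0, bias_ih_l0, bias_hh_l0)
one_layer = nn.LSTM(28, 100, num_layers=1, batch_first=True)
layer0_weights = {k: v for k, v in two_layer.state_dict().items() if k.endswith("_l0")}
one_layer.load_state_dict(layer0_weights)
_, (h_first, _) = one_layer(x)
print(torch.allclose(h_n[0], h_first[0]))  # True
```

The extra information is the lower layer's final state, which never appears in lstm_out. Whether it helps is task-dependent.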

What difference does batch_first=True or False make to the output?

LSTM defaults to batch_first=False, which means batch_size is the middle dimension. The input is fed as [seq_len, batch_size, input_size]. In this case:
lstm_out: [seq_len, batch_size, hidden_size * num_directions]
lstm_hn: [num_directions * num_layers, batch_size, hidden_size]

With batch_first=True, the input is fed as [batch_size, seq_len, input_size]. In this case:
lstm_out: [batch_size, seq_len, hidden_size * num_directions]
lstm_hn: [num_directions * num_layers, batch_size, hidden_size]
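The first dimension of lstm_hn packs layers and directions together. As far as I know, the nn.LSTM documentation describes separating them with h_n.view(num_layers, num_directions, batch, hidden_size). Here is a sketch using the sizes from the examples below, with a random input, bidirectional=True and batch_first=True:

```python
import torch
import torch.nn as nn

num_layers, num_directions, batch_size, H = 2, 2, 64, 128
lstm = nn.LSTM(256, H, num_layers=num_layers, bidirectional=True, batch_first=True)
x = torch.randn(batch_size, 20, 256)  # [batch_size, seq_len, embedding_size]

_, (h_n, _) = lstm(x)  # [num_layers * num_directions, batch_size, H], whatever batch_first is

# Separate layers and directions: index [layer, direction]
h = h_n.view(num_layers, num_directions, batch_size, H)
last_fwd, last_bwd = h[-1, 0], h[-1, 1]  # top layer's final states, [batch_size, H] each

# A common alternative feature: only the top layer, both directions -> [64, 256]
top_layer_feature = torch.cat([last_fwd, last_bwd], dim=-1)
print(top_layer_feature.shape)  # torch.Size([64, 256])
```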
An example makes it clear.
The batch_first=False case:

import torch
import torch.nn as nn

class my_config():
    max_length = 20
    batch_size = 64
    embedding_size = 256
    hidden_size = 128
    num_layers = 2
    dropout = 0.5
    output_size = 2
    lr = 0.001
    epoch = 5

class myLSTM(nn.Module):
    def __init__(self, vocab_size, config: my_config):
        super(myLSTM, self).__init__()
        self.vocab_size = vocab_size
        self.config = config

        self.embeddings = nn.Embedding(vocab_size, self.config.embedding_size)
        self.lstm = nn.LSTM(
            input_size=self.config.embedding_size,
            hidden_size=self.config.hidden_size,
            num_layers=self.config.num_layers,
            dropout=self.config.dropout,
            bidirectional=True
        )

        self.dropout = nn.Dropout(self.config.dropout)
        self.fc = nn.Linear(
            self.config.num_layers * self.config.hidden_size * 2,
            self.config.output_size
        )

        self.softmax = nn.Softmax(dim=-1)

    def forward(self, x):
        embedded = self.embeddings(x)
        lstm_out, (h_n, c_n) = self.lstm(embedded)
        print("-" * 10)
        print("lstm_out.shape", lstm_out.shape)
        print("lstm_out[-1].shape", lstm_out[-1].shape)
        print("-" * 10)
        print("h_n.shape", h_n.shape)
        print("-" * 10)
        feature = self.dropout(h_n)
        print("feature.shape", feature.shape)
        print("-" * 10)
        feature_map = torch.cat([feature[i, :, :] for i in range(feature.shape[0])], dim=-1)
        print("feature_map.shape", feature_map.shape)
        print("-" * 10)
        out = self.fc(feature_map)
        return self.softmax(out)

if __name__ == '__main__':
    config = my_config()
    model = myLSTM(300, config)
    x = torch.randint(0, 100, [20, 64])  # token ids: [seq_len=20, batch_size=64] because batch_first=False
    out = model(x)
    print("out.shape", out.shape)

lstm_out: [seq_len, batch_size, hidden_size * num_directions], i.e. [20, 64, 256] here
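lstm_out[-1] needs care here because this model is bidirectional. Here is a sketch with a random input and no dropout (both my simplifications) showing where each direction's final state sits in lstm_out:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
H = 128
lstm = nn.LSTM(input_size=256, hidden_size=H, num_layers=2, bidirectional=True)  # batch_first=False
x = torch.randn(20, 64, 256)  # [seq_len, batch_size, embedding_size]

lstm_out, (h_n, _) = lstm(x)  # lstm_out: [20, 64, 256], h_n: [4, 64, 128]

# Forward direction of the last layer: its final state sits at the LAST time step
print(torch.allclose(lstm_out[-1, :, :H], h_n[-2]))  # True
# Backward direction of the last layer reads the sequence in reverse,
# so its final state sits at the FIRST time step, not in lstm_out[-1]
print(torch.allclose(lstm_out[0, :, H:], h_n[-1]))  # True
```

For a bidirectional LSTM, lstm_out[-1] is therefore not the same as the final states, while h_n always is.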

The batch_first=True case:

import torch
import torch.nn as nn

class my_config():
    max_length = 20
    batch_size = 64
    embedding_size = 256
    hidden_size = 128
    num_layers = 2
    dropout = 0.5
    output_size = 2
    lr = 0.001
    epoch = 5

class myLSTM(nn.Module):
    def __init__(self, vocab_size, config: my_config):
        super(myLSTM, self).__init__()
        self.vocab_size = vocab_size
        self.config = config

        self.embeddings = nn.Embedding(vocab_size, self.config.embedding_size)
        self.lstm = nn.LSTM(
            batch_first=True,
            input_size=self.config.embedding_size,
            hidden_size=self.config.hidden_size,
            num_layers=self.config.num_layers,
            dropout=self.config.dropout,
            bidirectional=True
        )

        self.dropout = nn.Dropout(self.config.dropout)
        self.fc = nn.Linear(
            self.config.num_layers * self.config.hidden_size * 2,
            self.config.output_size
        )

        self.softmax = nn.Softmax(dim=-1)

    def forward(self, x):
        embedded = self.embeddings(x)
        lstm_out, (h_n, c_n) = self.lstm(embedded)
        print("-" * 10)
        print("lstm_out.shape", lstm_out.shape)
        print("lstm_out[:, -1].shape", lstm_out[:, -1].shape)
        print("-" * 10)
        print("h_n.shape", h_n.shape)
        print("-" * 10)
        feature = self.dropout(h_n)
        print("feature.shape", feature.shape)
        print("-" * 10)
        feature_map = torch.cat([feature[i, :, :] for i in range(feature.shape[0])], dim=-1)
        print("feature_map.shape", feature_map.shape)
        print("-" * 10)
        out = self.fc(feature_map)
        return self.softmax(out)

if __name__ == '__main__':
    config = my_config()
    model = myLSTM(300, config)
    x = torch.randint(0, 100, [64, 20])  # token ids: [batch_size=64, seq_len=20] because batch_first=True
    out = model(x)
    print("out.shape", out.shape)

lstm_out: [batch_size, seq_len, hidden_size * num_directions], i.e. [64, 20, 256] here
lstm_hn: [num_directions * num_layers, batch_size, hidden_size], i.e. [4, 64, 128], the same as with batch_first=False
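The shapes alone do not show that the numbers agree. Here is a sketch with two LSTMs that share the same weights, fed the same data in the two layouts. The weights are copied with load_state_dict; batch_first is not a parameter, so the keys match. The setup is my own:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
seq_len, batch_size, input_size, hidden_size = 20, 64, 256, 128

lstm_tf = nn.LSTM(input_size, hidden_size, num_layers=2, bidirectional=True, batch_first=False)
lstm_bf = nn.LSTM(input_size, hidden_size, num_layers=2, bidirectional=True, batch_first=True)
lstm_bf.load_state_dict(lstm_tf.state_dict())  # identical weights

x = torch.randn(seq_len, batch_size, input_size)  # time-major input for batch_first=False

out_tf, (h_tf, c_tf) = lstm_tf(x)
out_bf, (h_bf, c_bf) = lstm_bf(x.transpose(0, 1))  # same data, batch-major

print(out_tf.shape)  # torch.Size([20, 64, 256])
print(out_bf.shape)  # torch.Size([64, 20, 256])
print(h_tf.shape, h_bf.shape)  # torch.Size([4, 64, 128]) torch.Size([4, 64, 128])

# lstm_out differs only by a transpose; h_n and c_n are the same tensors
print(torch.allclose(out_tf, out_bf.transpose(0, 1), atol=1e-5))  # True
print(torch.allclose(h_tf, h_bf, atol=1e-5))  # True
print(torch.allclose(c_tf, c_bf, atol=1e-5))  # True
```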

Conclusion

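The examples above suggest that feeding the hidden layer (h_n) into the fully connected layer, instead of the output layer, may in principle give somewhat better results, because for a multi-layer network it also carries the lower layers' final states. Also, batch_first=True or False only changes the layout of the output layer (lstm_out). It has no effect on the hidden layer (h_n, and likewise c_n). So from now on I'll feed the hidden layer into the fully connected layer; it is also harder to get wrong.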
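Because h_n's layout does not depend on batch_first, the feature construction can be written once. The sketch below uses a hypothetical helper, hidden_features, that does the same thing as the list-comprehension torch.cat in the examples, using permute and reshape:

```python
import torch
import torch.nn as nn


def hidden_features(h_n: torch.Tensor) -> torch.Tensor:
    """Flatten h_n [layers*directions, batch, hidden] to [batch, layers*directions*hidden].

    Same result as torch.cat([h_n[i] for i in range(h_n.shape[0])], dim=-1).
    """
    return h_n.permute(1, 0, 2).reshape(h_n.shape[1], -1)


torch.manual_seed(0)
for batch_first in (False, True):
    lstm = nn.LSTM(256, 128, num_layers=2, bidirectional=True, batch_first=batch_first)
    # 64 sequences of length 20, laid out according to batch_first
    x = torch.randn(64, 20, 256) if batch_first else torch.randn(20, 64, 256)
    _, (h_n, _) = lstm(x)
    feats = hidden_features(h_n)
    print(batch_first, feats.shape)  # torch.Size([64, 512]) for both layouts
    assert torch.equal(feats, torch.cat([h_n[i] for i in range(h_n.shape[0])], dim=-1))
```

The feature width is num_layers * num_directions * hidden_size = 2 * 2 * 128 = 512. This matches the in_features of self.fc in the two examples above.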

Original: https://blog.csdn.net/qq_52785473/article/details/124368762
Author: Icy Hunter
Title: Understanding LSTM outputs in PyTorch, and how batch_first=True vs. False changes the output

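This article is protected by copyright. When reposting, please cite the source: https://www.johngo689.com/626304/

Reposted articles are protected by the original author's copyright. When reposting, please credit the original author.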
