BERT's output format explained in detail

June 24, 2023, 12:43 PM • Artificial Intelligence

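The output is a tuple (in transformers v4 and later, a ModelOutput object that can still be indexed like a tuple) with four parts. The last two are only returned when explicitly requested: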

last_hidden_state: shape (batch_size, sequence_length, hidden_size), with hidden_size = 768 for this base-size model. It is the hidden state output by the model's last layer.

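pooler_output: shape (batch_size, hidden_size). This is the last-layer hidden state of the first token of the sequence ([CLS]), further processed by a linear layer and a Tanh activation. It is usually not a good summary of the semantic content of the input; averaging or pooling the hidden states of the whole input sequence often gives a better sentence representation.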

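hidden_states: optional; it is only returned if config.output_hidden_states=True. It is a tuple of 13 elements: the first can be treated as the embedding-layer output, and the other 12 are the outputs of the 12 encoder layers. Each element has shape (batch_size, sequence_length, hidden_size).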

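attentions: also optional; it is only returned if config.output_attentions=True. It is also a tuple, with 12 elements (one per layer), holding each layer's attention weights after the softmax, which are used to compute the weighted average in the self-attention heads. Each element has shape (batch_size, num_heads, sequence_length, sequence_length).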
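The pooler_output paragraph recommends averaging the token hidden states instead of relying on [CLS]. A minimal sketch of mask-aware mean pooling, assuming last_hidden_state and attention_mask come from the model and tokenizer; the function name mean_pool is hypothetical, not a transformers API. Padding positions (mask 0) are excluded so they do not dilute the average.

```python
import torch


def mean_pool(last_hidden_state: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    """Average token vectors per sequence, ignoring padding (hypothetical helper).

    last_hidden_state: (batch_size, sequence_length, hidden_size)
    attention_mask:    (batch_size, sequence_length), 1 = real token, 0 = padding
    returns:           (batch_size, hidden_size)
    """
    mask = attention_mask.unsqueeze(-1).to(last_hidden_state.dtype)  # (batch, seq, 1)
    summed = (last_hidden_state * mask).sum(dim=1)                   # sum over real tokens only
    counts = mask.sum(dim=1).clamp(min=1e-9)                         # number of real tokens, avoids /0
    return summed / counts


# Toy check: batch of 2, sequence length 3, hidden size 2; the second sequence ends with one padding token.
hidden = torch.tensor([[[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]],
                       [[1.0, 1.0], [3.0, 3.0], [100.0, 100.0]]])
mask = torch.tensor([[1, 1, 1],
                     [1, 1, 0]])
sentence_vectors = mean_pool(hidden, mask)
print(sentence_vectors)
# tensor([[3., 4.],
#         [2., 2.]])
```

With the real model you would call mean_pool(output[0], attention_mask); the padded 100.0 values in the toy input show that masked positions are ignored.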

import torch
from transformers import BertConfig, BertTokenizer, BertModel

model_path = 'model/chinese-roberta-wwm-ext/'  # path to the already-downloaded pretrained model files
config = BertConfig.from_pretrained(model_path, output_hidden_states=True, output_attentions=True)
assert config.output_hidden_states
assert config.output_attentions
model = BertModel.from_pretrained(model_path, config=config)
tokenizer = BertTokenizer.from_pretrained(model_path)

text = '我热爱这个世界'

input = tokenizer(text)
# {'input_ids': [101, 2769, 4178, 4263, 6821, 702, 686, 4518, 102],
#  'token_type_ids': [0, 0, 0, 0, 0, 0, 0, 0, 0],
#  'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1]}
# 101 is [CLS], 102 is [SEP]; each of the 7 Chinese characters becomes one token

input = tokenizer.encode(text)  # returns only the input_ids list
# [101, 2769, 4178, 4263, 6821, 702, 686, 4518, 102]

input = tokenizer.encode_plus(text)  # returns the same dict as tokenizer(text)
# {'input_ids': [101, 2769, 4178, 4263, 6821, 702, 686, 4518, 102],
#  'token_type_ids': [0, 0, 0, 0, 0, 0, 0, 0, 0],
#  'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1]}
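With a single sentence, attention_mask is all 1s; it only becomes informative when sentences of different lengths are batched and padded. A self-contained sketch, assuming a tiny made-up vocabulary file written to a temporary directory (it is NOT the chinese-roberta-wwm-ext vocabulary, so the IDs differ from the ones above):

```python
import os
import tempfile

from transformers import BertTokenizer

# Hypothetical toy vocabulary: special tokens first, then a few Chinese characters.
# A token's ID is its line number in vocab.txt, so these IDs will not match the real model's.
vocab = ["[PAD]", "[UNK]", "[CLS]", "[SEP]", "[MASK]", "我", "热", "爱", "这", "个", "世", "界"]

with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "vocab.txt"), "w", encoding="utf-8") as f:
        f.write("\n".join(vocab) + "\n")
    toy_tokenizer = BertTokenizer.from_pretrained(tmp)  # loads vocab.txt from the directory

    # Two sentences of different lengths; padding=True pads the shorter one with [PAD] (ID 0).
    batch = toy_tokenizer(["我热爱", "世界"], padding=True)

print(batch["input_ids"])       # [[2, 5, 6, 7, 3], [2, 10, 11, 3, 0]]
print(batch["attention_mask"])  # [[1, 1, 1, 1, 1], [1, 1, 1, 1, 0]]
print(batch["token_type_ids"])  # [[0, 0, 0, 0, 0], [0, 0, 0, 0, 0]]
```

The trailing 0 in the second attention_mask row tells the model to ignore the [PAD] position.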

input_ids = torch.tensor([tokenizer.encode(text)], dtype=torch.long)  # even a single input must be wrapped into a batch
print(input_ids.shape)
# torch.Size([1, 9])
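Wrapping the ID list in an extra [...] is what creates the batch dimension. An equivalent sketch using unsqueeze(0), which adds a leading dimension of size 1 to an existing 1-D tensor (the IDs are the ones printed by the tokenizer):

```python
import torch

ids = [101, 2769, 4178, 4263, 6821, 702, 686, 4518, 102]  # token IDs of '我热爱这个世界'

batched_by_list = torch.tensor([ids], dtype=torch.long)                   # list of lists -> (1, 9)
batched_by_unsqueeze = torch.tensor(ids, dtype=torch.long).unsqueeze(0)  # (9,) -> (1, 9)

print(batched_by_list.shape)                                # torch.Size([1, 9])
print(torch.equal(batched_by_list, batched_by_unsqueeze))  # True
```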

model.eval()
with torch.no_grad():  # inference only, no gradients needed
    output = model(input_ids)
print(len(output))
print(output[0].shape)  # last-layer hidden states (batch_size, sequence_length, hidden_size)
print(output[1].shape)  # pooler_output: [CLS] last-layer hidden state after Linear + Tanh (batch_size, hidden_size)
print(len(output[2]))   # requires output_hidden_states=True; embeddings + one entry per layer, each (batch_size, sequence_length, hidden_size)
print(len(output[3]))   # requires output_attentions=True; one entry per layer, each (batch_size, num_heads, sequence_length, sequence_length)
# 4
# torch.Size([1, 9, 768])
# torch.Size([1, 768])
# 13
# 12
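The printout above needs the downloaded checkpoint. To check the same structure offline, here is a sketch that builds a tiny, randomly initialised BertModel; all sizes are arbitrary assumptions, not chinese-roberta-wwm-ext's, and attn_implementation="eager" is set so attention weights can be returned. It checks that hidden_states has num_hidden_layers + 1 entries, that its last entry equals last_hidden_state, and that pooler_output is Tanh(Linear([CLS] hidden state)); model.pooler.dense is the attribute name in the transformers BERT implementation.

```python
import torch
from transformers import BertConfig, BertModel

# Tiny random model: 2 layers, 4 heads, hidden size 32 (assumed values for a quick offline test).
config = BertConfig(
    vocab_size=100,
    hidden_size=32,
    num_hidden_layers=2,
    num_attention_heads=4,
    intermediate_size=64,
    output_hidden_states=True,
    output_attentions=True,
    attn_implementation="eager",  # eager attention so attention weights are returned
)
model = BertModel(config)
model.eval()

input_ids = torch.tensor([[2, 5, 6, 7, 3]])  # batch_size=1, sequence_length=5
with torch.no_grad():
    output = model(input_ids)

print(len(output))                     # 4
print(output.last_hidden_state.shape)  # torch.Size([1, 5, 32])
print(output.pooler_output.shape)      # torch.Size([1, 32])
print(len(output.hidden_states))       # 3 = embeddings + 2 layers
print(len(output.attentions))          # 2 = one per layer
print(output.attentions[0].shape)      # torch.Size([1, 4, 5, 5]) = (batch, heads, seq, seq)

# The last entry of hidden_states holds the same values as last_hidden_state.
print(torch.equal(output.hidden_states[-1], output.last_hidden_state))  # True

# pooler_output = Tanh(Linear(hidden state of the first token [CLS])).
with torch.no_grad():
    manual_pooled = torch.tanh(model.pooler.dense(output.last_hidden_state[:, 0]))
print(torch.allclose(manual_pooled, output.pooler_output))  # True
```

Scaling num_hidden_layers to 12 and num_attention_heads to 12 reproduces the 13 and 12 counts seen with the real checkpoint.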

all_hidden_state = output[2]
print(all_hidden_state[0].shape)  # embedding-layer output
print(all_hidden_state[1].shape)  # output of encoder layer 1
print(all_hidden_state[2].shape)  # output of encoder layer 2
# torch.Size([1, 9, 768])
# torch.Size([1, 9, 768])
# torch.Size([1, 9, 768])
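Because every entry of hidden_states has the same shape, the tuple can be stacked into one tensor, which makes per-layer slicing easy (for example, the [CLS] vector from every layer). A sketch using random stand-in tensors shaped like the output above (13 entries of (1, 9, 768)); with the real model you would stack output[2] instead.

```python
import torch

# Stand-in for output[2]: embeddings + 12 layer outputs, each (batch_size, sequence_length, hidden_size).
all_hidden_state = tuple(torch.randn(1, 9, 768) for _ in range(13))

stacked = torch.stack(all_hidden_state)  # (num_layers + 1, batch_size, sequence_length, hidden_size)
print(stacked.shape)                     # torch.Size([13, 1, 9, 768])

embeddings = stacked[0]                  # index 0: embedding-layer output
layer_outputs = stacked[1:]              # indices 1..12: outputs of encoder layers 1..12
cls_per_layer = stacked[:, :, 0, :]      # [CLS] (position 0) vector from every entry
print(embeddings.shape)                  # torch.Size([1, 9, 768])
print(layer_outputs.shape)               # torch.Size([12, 1, 9, 768])
print(cls_per_layer.shape)               # torch.Size([13, 1, 768])
```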

attentions = output[3]
print(attentions[0].shape)  # layer 1: (batch_size, num_heads, sequence_length, sequence_length)
print(attentions[1].shape)  # layer 2
print(attentions[2].shape)  # layer 3
# torch.Size([1, 12, 9, 9])
# torch.Size([1, 12, 9, 9])
# torch.Size([1, 12, 9, 9])
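Each attentions[i] holds softmax-normalised weights, so every row (one query token attending over all key tokens) sums to 1, and multiplying by the per-head value vectors gives the weighted average mentioned in the attentions description. A toy sketch with random scores; the value tensor and the head size of 64 (768 / 12) are stand-ins for illustration, not read from the model.

```python
import torch

batch_size, num_heads, seq_len, head_dim = 1, 12, 9, 64  # 12 heads x 64 = 768, as in BERT-base

# Random scores turned into weights with softmax over the key dimension,
# giving the same shape as attentions[i]: (batch_size, num_heads, seq_len, seq_len).
scores = torch.randn(batch_size, num_heads, seq_len, seq_len)
weights = torch.softmax(scores, dim=-1)
print(weights.shape)  # torch.Size([1, 12, 9, 9])

# Each query row is a probability distribution over the 9 key positions.
print(torch.allclose(weights.sum(dim=-1), torch.ones(batch_size, num_heads, seq_len)))  # True

# Weighted average of per-head value vectors: (..., seq, seq) @ (..., seq, head_dim) -> (..., seq, head_dim).
values = torch.randn(batch_size, num_heads, seq_len, head_dim)
context = weights @ values
print(context.shape)  # torch.Size([1, 12, 9, 64])

# Averaging over heads gives one seq_len x seq_len map per sentence, handy for visualisation.
head_avg = weights.mean(dim=1)
print(head_avg.shape)  # torch.Size([1, 9, 9])
```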

Follow-up addition:

text = '我热爱这个世界'

input = tokenizer(text)
# after tokenization, input is a dict:
# {'input_ids': [101, 2769, 4178, 4263, 6821, 702, 686, 4518, 102],
#  'token_type_ids': [0, 0, 0, 0, 0, 0, 0, 0, 0],
#  'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1]}

# input_ids = torch.tensor([tokenizer.encode(text)], dtype=torch.long)
# even a single input must be wrapped into a batch, hence the extra [...] below

input_ids = torch.tensor([input["input_ids"]])
token_type_ids = torch.tensor([input["token_type_ids"]])
attention_mask = torch.tensor([input["attention_mask"]])

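output = model(input_ids=input_ids, token_type_ids=token_type_ids, attention_mask=attention_mask)

You can pass input_ids, token_type_ids and attention_mask together to get the output. Pass them by keyword: the positional order of BertModel's forward() is input_ids, attention_mask, token_type_ids, so the positional call model(input_ids, token_type_ids, attention_mask) would feed the segment IDs in as the attention mask.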
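A quick way to confirm the positional order of BertModel.forward (and hence why model(input_ids, token_type_ids, attention_mask) would pass the segment IDs as the attention mask) is to inspect its signature. A small check, assuming transformers is installed; it loads no weights.

```python
import inspect

from transformers import BertModel

# Parameter names of forward() in positional order; index 0 is `self`.
params = list(inspect.signature(BertModel.forward).parameters)
print(params[1:4])  # ['input_ids', 'attention_mask', 'token_type_ids']
```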

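# Alternatively, return tensors directly during tokenization (pass the raw text, not the dict from above)
input_tensor = tokenizer(text, return_tensors="pt")
output = model(**input_tensor)  # unpacks input_ids, token_type_ids and attention_mask as keyword arguments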

Original: https://blog.csdn.net/qq_41971355/article/details/121124868
Author: uan_cs
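Title: BERT's output format explained in detail

Original articles are protected by copyright. When reposting, please cite the source: https://www.johngo689.com/649073/

Reposted articles remain under the original author's copyright. When reposting, please credit the original author!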
