Neural Network Language Model
1. Principles of the NNLM
1.1 Language model
- Let S denote a meaningful sentence composed of a sequence of words w_1, w_2, ..., w_n in a specific order, where n is the sentence length. Goal: compute P(S), the probability that S appears in the text (corpus). By the chain rule this factors as P(S) = P(w_1) P(w_2 | w_1) ... P(w_n | w_1, ..., w_{n-1}).
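The factorization above can be illustrated with a toy calculation (the probability values below are made up for demonstration and are not from any corpus):

```python
# The chain rule factors P(S) into conditional probabilities, which is
# exactly what a language model estimates:
# P(S) = P(w_1) * P(w_2 | w_1) * ... * P(w_n | w_1, ..., w_{n-1})
def sentence_probability(conditionals):
    """Multiply the P(w_i | w_1..w_{i-1}) terms to get P(S)."""
    p = 1.0
    for term in conditionals:
        p *= term
    return p

# P("i like dog") = P(i) * P(like | i) * P(dog | i, like), made-up values
p_s = sentence_probability([0.2, 0.1, 0.05])
print(p_s)  # roughly 0.001
```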
1.2 Neural network language model
- Start directly from the language model and turn the model-optimization process into the process of learning word-vector representations.
2. Network structure of the NNLM
2.1 Structure diagram
- The NNLM consists of an input layer, a projection layer, a hidden layer, and an output layer.
2.2 Computation process
- Given the preceding n-1 words, predict the probability of the n-th word.
2.3 Environment
python3.7
torch==1.8.0
2.4 Steps
Step 1: read the data

```python
import torch
import torch.nn as nn
import torch.optim as optim
import torch.utils.data as Data

def load_data():
    sentences = ['i like dog', 'i love coffee', 'i hate milk']
    word_list = " ".join(sentences).split()
    word_list = list(set(word_list))
    word_dict = {w: i for i, w in enumerate(word_list)}    # word -> index
    number_dict = {i: w for i, w in enumerate(word_list)}  # index -> word
    return word_dict, number_dict, sentences

word_dict, number_dict, sentences = load_data()
```
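As a quick sanity check, the vocabulary-building logic of load_data can be sketched standalone: the three sentences contain 7 unique words, and the two dicts invert each other.

```python
# Self-contained sketch mirroring load_data's vocabulary construction.
sentences = ['i like dog', 'i love coffee', 'i hate milk']
word_list = list(set(" ".join(sentences).split()))
word_dict = {w: i for i, w in enumerate(word_list)}    # word -> index
number_dict = {i: w for i, w in enumerate(word_list)}  # index -> word

print(len(word_dict))  # 7
print(all(number_dict[i] == w for w, i in word_dict.items()))  # True
```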
Step 2: implement the mini-batch builder

```python
def make_batch(sentences):
    input_batch = []
    target_batch = []
    for sen in sentences:
        word = sen.split()
        input = [word_dict[n] for n in word[:-1]]  # indices of the first n-1 words
        target = word_dict[word[-1]]               # index of the n-th word
        input_batch.append(input)
        target_batch.append(target)
    return input_batch, target_batch
```
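The structure make_batch produces can be checked with a self-contained sketch (the exact indices depend on set() ordering, so only the shapes are verified):

```python
# Each sentence yields one (context, target) pair: the first n-1 word
# indices as input and the last word's index as the label.
sentences = ['i like dog', 'i love coffee', 'i hate milk']
word_dict = {w: i for i, w in enumerate(set(" ".join(sentences).split()))}

input_batch, target_batch = [], []
for sen in sentences:
    words = sen.split()
    input_batch.append([word_dict[w] for w in words[:-1]])  # first n-1 words
    target_batch.append(word_dict[words[-1]])               # the n-th word

print([len(x) for x in input_batch])  # [2, 2, 2]
print(len(target_batch))              # 3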
Step 3: set hyperparameters and assemble the mini-batches

```python
dtype = torch.FloatTensor
n_class = len(word_dict)                # vocabulary size |V|
n_step = len(sentences[0].split()) - 1  # number of context words (n-1)
n_hidden = 2                            # hidden-layer size
m = 2                                   # word-vector dimension

input_batch, target_batch = make_batch(sentences)
input_batch = torch.LongTensor(input_batch)
target_batch = torch.LongTensor(target_batch)

dataset = Data.TensorDataset(input_batch, target_batch)
loader = Data.DataLoader(dataset=dataset, batch_size=16, shuffle=True)
```
Step 4: build the model

```python
class NNLM(nn.Module):
    def __init__(self):
        """
        C: word-embedding matrix of size |V| x m
        H: hidden-layer weight
        W: direct input-to-output weight
        d: hidden-layer bias
        U: output-layer weight
        b: output-layer bias
        1. Look up the n-1 input word indices in the embedding table and
           concatenate the vectors into a single (n-1)*m vector X.
        2. Feed X to the hidden layer: hidden = tanh(d + X * H),
           with shapes [3, 4] x [4, 2] in this example.
        3. The output layer has |V| nodes; node y_i gives the score that the
           next word is word i: y = b + X * W + hidden * U
        n_step: number of context words used to predict the next word (2 here)
        n_hidden: number of hidden-layer neurons
        m: word-vector dimension
        """
        super(NNLM, self).__init__()
        self.C = nn.Embedding(n_class, m)
        self.H = nn.Parameter(torch.randn(n_step * m, n_hidden).type(dtype))
        self.W = nn.Parameter(torch.randn(n_step * m, n_class).type(dtype))
        self.d = nn.Parameter(torch.randn(n_hidden).type(dtype))
        self.U = nn.Parameter(torch.randn(n_hidden, n_class).type(dtype))
        self.b = nn.Parameter(torch.randn(n_class).type(dtype))

    def forward(self, X):
        """
        X: [batch_size, n_step]
        """
        X = self.C(X)               # [batch_size, n_step, m]
        X = X.view(-1, n_step * m)  # [batch_size, n_step * m]
        hidden_out = torch.tanh(self.d + torch.mm(X, self.H))
        output = self.b + torch.mm(X, self.W) + torch.mm(hidden_out, self.U)
        return output
```
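The forward pass is just the Bengio-style formula y = b + XW + tanh(d + XH)U. A pure-NumPy sketch (an illustration with random weights, not the PyTorch model above) makes the shapes explicit:

```python
import numpy as np

# Dimensions matching the tutorial: |V| = 7, n-1 = 2 context words,
# embedding size m = 2, hidden size 2, batch of 3 sentences.
n_class, n_step, m, n_hidden, batch = 7, 2, 2, 2, 3
rng = np.random.default_rng(0)

C = rng.standard_normal((n_class, m))            # embedding table |V| x m
H = rng.standard_normal((n_step * m, n_hidden))  # hidden-layer weight
W = rng.standard_normal((n_step * m, n_class))   # direct input-to-output weight
U = rng.standard_normal((n_hidden, n_class))     # output-layer weight
d = rng.standard_normal(n_hidden)                # hidden-layer bias
b = rng.standard_normal(n_class)                 # output-layer bias

idx = rng.integers(0, n_class, size=(batch, n_step))  # word indices
X = C[idx].reshape(batch, n_step * m)                 # concat embeddings
hidden = np.tanh(d + X @ H)                           # [batch, n_hidden]
y = b + X @ W + hidden @ U                            # unnormalized scores
print(y.shape)  # (3, 7)
```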
Step 5: instantiate, train, and predict

```python
model = NNLM()
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=1e-3)

for epoch in range(5000):
    for batch_x, batch_y in loader:
        optimizer.zero_grad()
        output = model(batch_x)
        loss = criterion(output, batch_y)
        if (epoch + 1) % 1000 == 0:
            print('Epoch:', '%04d' % (epoch + 1), 'cost =', '{:.6f}'.format(loss.item()))
        loss.backward()
        optimizer.step()

predict = model(input_batch).data.max(1, keepdim=True)[1]
print([sen.split()[:n_step] for sen in sentences], '->',
      [number_dict[n.item()] for n in predict.squeeze()])
```
2.5 Results
Original: https://blog.csdn.net/GUANGZHAN/article/details/121614095
Author: 智享AI
Title: 十二、神经网络语言模型