【NLP】Understanding RNNs (PyTorch Implementation)

[Reference: [Recurrent Neural Networks] Understand RNNs in 5 minutes with an intuitive 3D animation – bilibili]

[Reference: Implementing an RNN in Excel – bilibili]

Understanding the parameters

(Figure: diagram of how an RNN runs over time)
The figure above best illustrates how an RNN operates.

[Reference: Detailed explanation of the RNN parameters in PyTorch – lwgkzl's blog, CSDN]

My own notes: [Reference: [PyTorch Deep Learning Practice | Liu Er] Bilibili video tutorial notes – myaijarvis's blog, CSDN]

[Reference: PyTorch implementations of RNN, LSTM and GRU with detailed code and parameter analysis – adczsw's blog, CSDN]
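
To make the parameter discussion concrete, here is a small, self-contained illustration (my own example, not taken from the referenced posts; all sizes are made up) of nn.RNN's main arguments and the tensor shapes involved:

import torch
import torch.nn as nn

batch_size, seq_len, input_size, hidden_size = 2, 3, 4, 5

# batch_first=True means input and output are (batch, seq, feature)
rnn = nn.RNN(input_size=input_size, hidden_size=hidden_size,
             num_layers=1, batch_first=True)

x = torch.randn(batch_size, seq_len, input_size)  # (batch_size, seq_len, input_size)
h0 = torch.zeros(1, batch_size, hidden_size)      # (num_layers * num_directions, batch_size, hidden_size)

output, h_n = rnn(x, h0)
print(output.shape)            # torch.Size([2, 3, 5]) - hidden state of the last layer at every time step
print(h_n.shape)               # torch.Size([1, 2, 5]) - hidden state of every layer at the last time step
print(rnn.weight_ih_l0.shape)  # torch.Size([5, 4]) = (hidden_size, input_size)
print(rnn.weight_hh_l0.shape)  # torch.Size([5, 5]) = (hidden_size, hidden_size)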

Multi-layer and bidirectional RNN

[Reference: Improvements to recurrent neural networks: multi-layer RNN, bidirectional RNN and pre-training – DeepGeGe's blog, CSDN]

(Figures: multi-layer RNN and bidirectional RNN structure diagrams)
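
As a quick shape check (my own illustration, not from the referenced post), stacking layers and enabling the bidirectional flag changes the output shapes of nn.RNN as follows:

import torch
import torch.nn as nn

batch_size, seq_len, input_size, hidden_size = 2, 3, 4, 5
x = torch.randn(batch_size, seq_len, input_size)

rnn = nn.RNN(input_size, hidden_size, num_layers=2,
             batch_first=True, bidirectional=True)
output, h_n = rnn(x)  # h0 defaults to zeros

print(output.shape)  # torch.Size([2, 3, 10]) - last layer only, forward and backward concatenated
print(h_n.shape)     # torch.Size([4, 2, 5])  - num_layers * num_directions = 4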

Manual implementation

[Reference: Introductory ideas and implementation of an RNN in PyTorch – 瞳瞳瞳呀's blog, CSDN]

[Reference: [PyTorch official tutorial] Building an RNN from scratch, part 1 – character-level RNN for classification – iteapoy's blog, CSDN]

Official documentation: https://pytorch.org/docs/stable/generated/torch.nn.RNN.html#torch.nn.RNN
RNN source code: https://pytorch.org/docs/stable/_modules/torch/nn/modules/rnn.html#RNN
[Reference: RNN principles and a line-by-line implementation – bilibili]
[Reference: 29 – RNN principles, API explanation and line-by-line code implementation – 取个名字真难呐's blog, CSDN]


[Reference: Deep learning / Manual RNN implementation · myaijarvis/AI – Gitee]

Unidirectional, single layer


import torch


def custom_rnn_function(input, h_prev, w_ih, w_hh, b_ih, b_hh):
    """Single-layer, unidirectional RNN forward pass.

    h_t = tanh(w_ih * x_t + b_ih + w_hh * h_{t-1} + b_hh)
    where x_t is the input at time step t.

    :param input: input (batch_size, seq_len, input_size)
    :param h_prev: previous hidden state (batch_size, hidden_size)
        # strictly this should be (1, batch_size, hidden_size); a 2-D tensor is used
        # here to simplify passing arguments and the computation below
    :param w_ih: weight w_ih (hidden_size, input_size)
    :param w_hh: weight w_hh (hidden_size, hidden_size)
    :param b_ih: bias b_ih (hidden_size)
    :param b_hh: bias b_hh (hidden_size)
    :return: output, h_n
    """
    batch_size, seq_len, input_size = input.shape
    hidden_size, input_size = w_ih.shape

    # output collects the hidden state of every time step
    output = torch.zeros(batch_size, seq_len, hidden_size)

    for t in range(seq_len):
        x_t = input[:, t, :].unsqueeze(2)                      # (batch_size, input_size, 1)

        # replicate the weights along the batch dimension so bmm can be used
        w_ih_batch = w_ih.unsqueeze(0).tile(batch_size, 1, 1)  # (batch_size, hidden_size, input_size)
        w_hh_batch = w_hh.unsqueeze(0).tile(batch_size, 1, 1)  # (batch_size, hidden_size, hidden_size)

        w_ih_times_x = torch.bmm(w_ih_batch, x_t).squeeze(-1)                  # (batch_size, hidden_size)
        w_hh_times_h = torch.bmm(w_hh_batch, h_prev.unsqueeze(2)).squeeze(-1)  # (batch_size, hidden_size)

        h_prev = torch.tanh(w_ih_times_x + b_ih + w_hh_times_h + b_hh)
        output[:, t, :] = h_prev

    # h_n is returned with the leading (num_layers * num_directions) dimension, like nn.RNN
    return output, h_prev.unsqueeze(0)
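
A quick sanity check (my own test, not part of the original post): feed nn.RNN's own weights into custom_rnn_function and compare the two forward passes.

import torch
import torch.nn as nn

batch_size, seq_len, input_size, hidden_size = 2, 3, 4, 5
x = torch.randn(batch_size, seq_len, input_size)
h0 = torch.zeros(batch_size, hidden_size)

rnn = nn.RNN(input_size, hidden_size, batch_first=True)
output_api, h_n_api = rnn(x, h0.unsqueeze(0))

output_custom, h_n_custom = custom_rnn_function(
    x, h0,
    rnn.weight_ih_l0, rnn.weight_hh_l0,
    rnn.bias_ih_l0, rnn.bias_hh_l0)

print(torch.allclose(output_api, output_custom, atol=1e-5))  # True
print(torch.allclose(h_n_api, h_n_custom, atol=1e-5))        # True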

Bidirectional, single layer


def bi_custom_rnn_function(input,
                           h_prev,
                           w_ih, w_hh, b_ih, b_hh,
                           w_ih_reverse, w_hh_reverse, b_ih_reverse, b_hh_reverse):
    """Single-layer, bidirectional RNN forward pass.

    :param input: input (batch_size, seq_len, input_size)
    :param h_prev: initial hidden states (2, batch_size, hidden_size), forward then backward
    :return: output (batch_size, seq_len, 2 * hidden_size), h_n (2, batch_size, hidden_size)
    """
    batch_size, seq_len, input_size = input.shape
    hidden_size, input_size = w_ih.shape

    # forward and backward hidden states are concatenated along the feature dimension
    output = torch.zeros(batch_size, seq_len, hidden_size * 2)

    # forward direction: the sequence in its original order
    forward_output = custom_rnn_function(input,
                                         h_prev[0, :, :],
                                         w_ih, w_hh, b_ih, b_hh)[0]

    # backward direction: run the same kind of cell over the time-reversed sequence
    backward_output = custom_rnn_function(torch.flip(input, [1]),
                                          h_prev[1, :, :],
                                          w_ih_reverse, w_hh_reverse, b_ih_reverse, b_hh_reverse)[0]

    output[:, :, :hidden_size] = forward_output
    # flip the backward outputs back so both directions are aligned per time step
    output[:, :, hidden_size:] = torch.flip(backward_output, [1])

    # h_n stacks the final forward state and the final backward state, like nn.RNN
    return output, torch.cat([forward_output[:, -1, :].unsqueeze(0),
                              backward_output[:, -1, :].unsqueeze(0)], dim=0)
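
Again a sanity check of my own (not from the original post): reuse the weights of a bidirectional nn.RNN, including its `_reverse` parameters, and compare.

import torch
import torch.nn as nn

batch_size, seq_len, input_size, hidden_size = 2, 3, 4, 5
x = torch.randn(batch_size, seq_len, input_size)
h0 = torch.zeros(2, batch_size, hidden_size)  # 2 = forward + backward direction

rnn = nn.RNN(input_size, hidden_size, batch_first=True, bidirectional=True)
output_api, h_n_api = rnn(x, h0)

output_custom, h_n_custom = bi_custom_rnn_function(
    x, h0,
    rnn.weight_ih_l0, rnn.weight_hh_l0, rnn.bias_ih_l0, rnn.bias_hh_l0,
    rnn.weight_ih_l0_reverse, rnn.weight_hh_l0_reverse,
    rnn.bias_ih_l0_reverse, rnn.bias_hh_l0_reverse)

print(torch.allclose(output_api, output_custom, atol=1e-5))  # True
print(torch.allclose(h_n_api, h_n_custom, atol=1e-5))        # True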

Unidirectional, multi-layer


def custom_numlayers_rnn_function(input, h_prev, num_layers,
                                  w_ih_0, w_hh_0,
                                  w_ih_i, w_hh_i,
                                  b_ih, b_hh):
    """Multi-layer, unidirectional RNN forward pass.

    :param h_prev: initial hidden states (num_layers, batch_size, hidden_size)
    :param w_ih_0, w_hh_0: layer-0 weights, (hidden_size, input_size) and (hidden_size, hidden_size)
    :param w_ih_i, w_hh_i: stacked weights of layers 1..num_layers-1, (num_layers-1, hidden_size, hidden_size)
    :param b_ih, b_hh: stacked biases, (num_layers, hidden_size)
    """
    h_prev_temp = h_prev.clone()

    # layer 0 consumes the raw input
    output, h_n_layer = custom_rnn_function(input,
                                            h_prev[0, :, :],
                                            w_ih_0, w_hh_0,
                                            b_ih[0, :], b_hh[0, :])
    h_prev_temp[0, :, :] = h_n_layer.squeeze(0)

    # every further layer consumes the output sequence of the layer below
    for i in range(1, num_layers):
        output, h_n_layer = custom_rnn_function(output,
                                                h_prev[i, :, :],
                                                w_ih_i[i - 1, :, :], w_hh_i[i - 1, :, :],
                                                b_ih[i, :], b_hh[i, :])
        h_prev_temp[i, :, :] = h_n_layer.squeeze(0)

    return output, h_prev_temp
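
One possible way to test this (my own sketch; the packing below simply matches the parameter layout this function expects): stack the per-layer weights of a multi-layer nn.RNN and compare.

import torch
import torch.nn as nn

batch_size, seq_len, input_size, hidden_size, num_layers = 2, 3, 4, 5, 3
x = torch.randn(batch_size, seq_len, input_size)
h0 = torch.zeros(num_layers, batch_size, hidden_size)

rnn = nn.RNN(input_size, hidden_size, num_layers=num_layers, batch_first=True)
output_api, h_n_api = rnn(x, h0)

# Layer 0 weights have shape (hidden_size, input_size); layers >= 1 have
# (hidden_size, hidden_size), so they are packed separately.
w_ih_i = torch.stack([getattr(rnn, f'weight_ih_l{k}') for k in range(1, num_layers)])
w_hh_i = torch.stack([getattr(rnn, f'weight_hh_l{k}') for k in range(1, num_layers)])
b_ih = torch.stack([getattr(rnn, f'bias_ih_l{k}') for k in range(num_layers)])
b_hh = torch.stack([getattr(rnn, f'bias_hh_l{k}') for k in range(num_layers)])

output_custom, h_n_custom = custom_numlayers_rnn_function(
    x, h0, num_layers,
    rnn.weight_ih_l0, rnn.weight_hh_l0,
    w_ih_i, w_hh_i, b_ih, b_hh)

print(torch.allclose(output_api, output_custom, atol=1e-5))  # True
print(torch.allclose(h_n_api, h_n_custom, atol=1e-5))        # True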

Bidirectional, multi-layer


def bi_custom_numlayers_rnn_function(input,
                                     h_prev,
                                     num_layers,
                                     w_ih_0, w_hh_0,
                                     w_ih_i, w_hh_i,
                                     b_ih, b_hh,
                                     w_ih_0_reverse, w_hh_0_reverse,
                                     w_ih_i_reverse, w_hh_i_reverse,
                                     b_ih_reverse, b_hh_reverse):
    """Multi-layer, bidirectional RNN forward pass.

    :param h_prev: initial hidden states (num_layers * 2, batch_size, hidden_size),
        ordered layer by layer, forward direction before backward direction
    """
    h_prev_temp = h_prev.clone()

    # layer 0 consumes the raw input
    output, h_prev_temp[0:2, :, :] = bi_custom_rnn_function(input,
                                                            h_prev[0:2, :, :],
                                                            w_ih_0, w_hh_0, b_ih[0, :], b_hh[0, :],
                                                            w_ih_0_reverse, w_hh_0_reverse,
                                                            b_ih_reverse[0, :], b_hh_reverse[0, :])

    # every further layer consumes the concatenated forward/backward outputs of the layer below;
    # note the hidden-state slice is i*2:i*2+2, matching the output indexing
    for i in range(1, num_layers):
        output, h_prev_temp[i * 2:i * 2 + 2, :, :] = bi_custom_rnn_function(output,
                                                                            h_prev[i * 2:i * 2 + 2, :, :],
                                                                            w_ih_i[i - 1, :, :], w_hh_i[i - 1, :, :],
                                                                            b_ih[i, :], b_hh[i, :],
                                                                            w_ih_i_reverse[i - 1, :, :],
                                                                            w_hh_i_reverse[i - 1, :, :],
                                                                            b_ih_reverse[i, :],
                                                                            b_hh_reverse[i, :])

    return output, h_prev_temp
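
The same kind of check works for the bidirectional multi-layer case (again my own sketch, not from the original post); layer-0 weights are kept separate because their input dimension differs from the higher layers.

import torch
import torch.nn as nn

batch_size, seq_len, input_size, hidden_size, num_layers = 2, 3, 4, 5, 2
x = torch.randn(batch_size, seq_len, input_size)
h0 = torch.zeros(num_layers * 2, batch_size, hidden_size)

rnn = nn.RNN(input_size, hidden_size, num_layers=num_layers,
             batch_first=True, bidirectional=True)
output_api, h_n_api = rnn(x, h0)

# Stack the weights of layers >= 1 for both directions.
w_ih_i = torch.stack([getattr(rnn, f'weight_ih_l{k}') for k in range(1, num_layers)])
w_hh_i = torch.stack([getattr(rnn, f'weight_hh_l{k}') for k in range(1, num_layers)])
b_ih = torch.stack([getattr(rnn, f'bias_ih_l{k}') for k in range(num_layers)])
b_hh = torch.stack([getattr(rnn, f'bias_hh_l{k}') for k in range(num_layers)])
w_ih_i_rev = torch.stack([getattr(rnn, f'weight_ih_l{k}_reverse') for k in range(1, num_layers)])
w_hh_i_rev = torch.stack([getattr(rnn, f'weight_hh_l{k}_reverse') for k in range(1, num_layers)])
b_ih_rev = torch.stack([getattr(rnn, f'bias_ih_l{k}_reverse') for k in range(num_layers)])
b_hh_rev = torch.stack([getattr(rnn, f'bias_hh_l{k}_reverse') for k in range(num_layers)])

output_custom, h_n_custom = bi_custom_numlayers_rnn_function(
    x, h0, num_layers,
    rnn.weight_ih_l0, rnn.weight_hh_l0, w_ih_i, w_hh_i, b_ih, b_hh,
    rnn.weight_ih_l0_reverse, rnn.weight_hh_l0_reverse,
    w_ih_i_rev, w_hh_i_rev, b_ih_rev, b_hh_rev)

print(torch.allclose(output_api, output_custom, atol=1e-5))  # True
print(torch.allclose(h_n_api, h_n_custom, atol=1e-5))        # True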

Custom model weight parameters

The implementations above all reuse the weight parameters taken from the nn.RNN API model. I would like to initialize the weights myself and see how to reproduce exactly the same results as PyTorch.

To be updated.
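
Pending that update, here is a minimal sketch of one way to do it (my own assumption, not the original author's finished code): initialize the weights by hand, copy them into nn.RNN's parameters, and feed the same tensors to custom_rnn_function so both paths should agree.

import torch
import torch.nn as nn

torch.manual_seed(0)
input_size, hidden_size = 4, 5

# Hand-initialized parameters with the shapes nn.RNN expects.
w_ih = torch.randn(hidden_size, input_size) * 0.1
w_hh = torch.randn(hidden_size, hidden_size) * 0.1
b_ih = torch.zeros(hidden_size)
b_hh = torch.zeros(hidden_size)

rnn = nn.RNN(input_size, hidden_size, batch_first=True)
with torch.no_grad():
    rnn.weight_ih_l0.copy_(w_ih)
    rnn.weight_hh_l0.copy_(w_hh)
    rnn.bias_ih_l0.copy_(b_ih)
    rnn.bias_hh_l0.copy_(b_hh)

x = torch.randn(2, 3, input_size)
h0 = torch.zeros(2, hidden_size)

output_api, _ = rnn(x, h0.unsqueeze(0))
output_custom, _ = custom_rnn_function(x, h0, w_ih, w_hh, b_ih, b_hh)
print(torch.allclose(output_api, output_custom, atol=1e-5))  # True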

Original: https://blog.csdn.net/Jruo911/article/details/124484921
Author: myaijarvis
Title: 【NLP】Understanding RNNs (PyTorch Implementation)

