GAN-Based Time-Series Missing-Data Imputation, Prologue (1): An Introduction to RNNs and a PyTorch Implementation

This column focuses on GAN-based imputation of missing values in time-series data. Any discussion of time-series data inevitably involves one particular neural network: the Recurrent Neural Network (RNN). RNNs are a class of neural networks designed for sequential data; they are very effective on data with sequential structure because they can mine the temporal information in it. Since RNNs appear throughout time-series missing-data imputation, this article introduces the RNN and, building on that introduction, walks through simple PyTorch implementations to build a deeper understanding of how RNNs work.

Contents

* 1. Introduction to RNNs
* 2. Relevant PyTorch APIs
  + 2.1 RNNCell
  + 2.2 RNN
* 3. Code in Practice (1)
  + 3.1 Task description
  + 3.2 Implementation with RNNCell
  + 3.3 Implementation with RNN
* 4. Code in Practice (2)
  + 4.1 Task description
  + 4.2 Implementation with RNN
  + 4.3 Complete code
  + 4.4 Results

1. Introduction to RNNs

For an introduction to recurrent neural networks themselves, refer to the companion article 循环神经网络RNN入门介绍 (An Introductory Guide to RNNs); the basics are not repeated here.

2. Relevant PyTorch APIs

2.1 RNNCell

The structure of an RNNCell is shown below:

[Figure: an RNNCell unrolled over time, producing hidden states h1, h2, … from inputs x1, x2, … and the prior h0]
h0 is the prior hidden state; if no prior is available, h0 is set to an all-zero vector with the same dimensions as h1 and h2.
The constructor signature is:
torch.nn.RNNCell(input_size, hidden_size, bias=True, nonlinearity='tanh', device=None, dtype=None)

An RNN cell with a tanh or ReLU nonlinearity. With the default tanh, the cell computes

$h' = \tanh(W_{ih} x + b_{ih} + W_{hh} h + b_{hh})$
  • input_size: dimensionality of the input;
  • hidden_size: dimensionality of the hidden state;
  • bias: if False, the layer does not use the bias weights b_ih and b_hh; default True;
  • nonlinearity: the nonlinearity to use, either 'tanh' or 'relu'; default 'tanh'.

That is, $h_1 = \mathrm{RNNCell}(x_1, h_0)$.
Input shape: (batch_size, input_size)
Hidden-state shape: (batch_size, hidden_size)
Output shape: (batch_size, hidden_size)

Hence a whole dataset has shape (seq_len, batch_size, input_size).

Code example:

import torch
import torch.nn as nn

batch_size=1
seq_len=3
input_size=4
hidden_size=2

cell=nn.RNNCell(input_size=input_size,hidden_size=hidden_size)

# dataset: (seq_len, batch_size, input_size); initial hidden: (batch_size, hidden_size)
dataset=torch.randn(seq_len,batch_size,input_size)
hidden=torch.zeros(batch_size,hidden_size)

# feed the sequence one step at a time; the cell returns the next hidden state
for idx,x in enumerate(dataset):
    print('='*20,idx,'='*20)
    print('Input size:',x.shape)
    hidden=cell(x,hidden)
    print('Output size:',hidden.shape)
    print(hidden)

Running this prints, at each of the three time steps, the input shape torch.Size([1, 4]), the output (hidden-state) shape torch.Size([1, 2]), and the hidden-state values.

2.2 RNN

nn.RNN defines a multi-layer recurrent neural network that processes a whole sequence in one call.

[Figure: an RNN with num_layers=3 unrolled over time]

The constructor signature is:

torch.nn.RNN(input_size, hidden_size, num_layers=1, nonlinearity='tanh', bias=True, batch_first=False, dropout=0.0, bidirectional=False)

batch_first: if True, the input and output tensors are laid out as (batch, seq, feature) instead of (seq, batch, feature); default False.
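
A minimal sketch of what batch_first changes, using throwaway sizes:

import torch
import torch.nn as nn

# default layout: input and output are (seq_len, batch, feature)
rnn_seq_first = nn.RNN(input_size=4, hidden_size=2)
out, _ = rnn_seq_first(torch.randn(3, 1, 4))
print(out.shape)  # torch.Size([3, 1, 2])

# batch_first=True: input and output are (batch, seq_len, feature)
rnn_batch_first = nn.RNN(input_size=4, hidden_size=2, batch_first=True)
out, _ = rnn_batch_first(torch.randn(1, 3, 4))
print(out.shape)  # torch.Size([1, 3, 2])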

out, hidden = rnn(x, h0)
x = [x1, x2, …, xn]
out = [h1, h2, …, hn]
hidden = hn (the hidden state at the last time step)

Input shape: (seq_len, batch_size, input_size)
Initial hidden-state shape: (num_layers, batch_size, hidden_size)
Output shape: (seq_len, batch_size, hidden_size)
Final hidden-state shape: (num_layers, batch_size, hidden_size)
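
Since out stacks the top layer's hidden state at every step, for a single-layer RNN the last row of out coincides with the final hidden state; a quick check:

import torch
import torch.nn as nn

rnn = nn.RNN(input_size=4, hidden_size=2, num_layers=1)
x = torch.randn(3, 1, 4)                   # (seq_len, batch, input_size)
out, hidden = rnn(x)                       # h0 defaults to zeros when omitted
print(torch.allclose(out[-1], hidden[0]))  # True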

Code example:

import torch
import torch.nn as nn

batch_size=1
seq_len=3
input_size=4
hidden_size=2
num_layers=1

rnn=nn.RNN(input_size=input_size,hidden_size=hidden_size,num_layers=num_layers)

# inputs: (seq_len, batch_size, input_size); h0: (num_layers, batch_size, hidden_size)
inputs=torch.randn(seq_len,batch_size,input_size)
hidden=torch.zeros(num_layers,batch_size,hidden_size)

# out stacks the hidden state at every step; hidden is the final hidden state
out,hidden=rnn(inputs,hidden)

print('Output size:',out.shape)    # (seq_len, batch_size, hidden_size)
print(out)
print('Hidden size:',hidden.shape) # (num_layers, batch_size, hidden_size)
print(hidden)

Running this prints an output of shape torch.Size([3, 1, 2]) followed by a final hidden state of shape torch.Size([1, 1, 2]).

3. Code in Practice (1)

3.1 Task description

This example trains a network to transform the character sequence "hello" into the sequence "ohlol".

3.2 Implementation with RNNCell

First, convert the text to a one-hot encoding:

idx2char=['e','h','l','o']
x_data=[1,0,2,2,3]   # indices of 'hello'
y_data=[3,1,2,3,2]   # indices of 'ohlol'

# row i is the one-hot vector for character index i
one_hot_lookup=[[1,0,0,0],
                [0,1,0,0],
                [0,0,1,0],
                [0,0,0,1]]
x_one_hot=[one_hot_lookup[x] for x in x_data]
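
Equivalently, PyTorch's built-in torch.nn.functional.one_hot can build the same encoding without maintaining the lookup table by hand:

import torch
import torch.nn.functional as F

x_data = [1, 0, 2, 2, 3]
x_one_hot = F.one_hot(torch.tensor(x_data), num_classes=4).float()
print(x_one_hot.shape)  # torch.Size([5, 4])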

The relevant dimensions:
input_size = 4
hidden_size = 4
Input shape: (batch_size, input_size)
Hidden-state shape: (batch_size, hidden_size)
Output shape: (batch_size, hidden_size)

Hence the sequence tensor has shape (seq_len, batch_size, input_size).

The complete code is as follows:

import torch
import torch.nn as nn

input_size=4
hidden_size=4
batch_size=1

idx2char=['e','h','l','o']
x_data=[1,0,2,2,3]
y_data=[3,1,2,3,2]

one_hot_lookup=[[1,0,0,0],
               [0,1,0,0],
               [0,0,1,0],
               [0,0,0,1]]
x_one_hot=[one_hot_lookup[x] for x in x_data]

# inputs: (seq_len, batch_size, input_size); labels: (seq_len, 1), one class index per step
inputs=torch.Tensor(x_one_hot).view(-1,batch_size,input_size)
labels=torch.LongTensor(y_data).view(-1,1)

class RNN(nn.Module):
    def __init__(self, input_size, hidden_size, batch_size):
        super(RNN, self).__init__()
        self.input_size = input_size
        self.hidden_size = hidden_size
        self.batch_size = batch_size
        self.rnncell = nn.RNNCell(input_size=input_size, hidden_size=hidden_size)

    def forward(self, input, hidden):
        hidden = self.rnncell(input, hidden)
        return hidden

    def init_hidden(self):
        return torch.zeros(self.batch_size, self.hidden_size)

rnn = RNN(input_size, hidden_size, batch_size)

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(rnn.parameters(),lr=0.1)

for epoch in range(20):
    loss = 0
    optimizer.zero_grad()

    hidden = rnn.init_hidden()   # fresh zero prior at the start of every epoch
    print('Predicted string: ', end='')
    for x, label in zip(inputs, labels):
        hidden = rnn(x, hidden)
        loss += criterion(hidden, label)   # accumulate the loss over all steps
        _, idx = hidden.max(dim=1)
        print(idx2char[idx.item()], end='')
    loss.backward()
    optimizer.step()
    print(', Epoch [%d/20] loss=%.4f' % (epoch + 1, loss.item()))

Running this prints the predicted string and the loss at every epoch; over the 20 epochs the prediction converges to "ohlol".

3.3 Implementation with RNN

Code example:

import torch
import torch.nn as nn

input_size=4
hidden_size=4
batch_size=1
num_layers=1
seq_len=5

idx2char=['e','h','l','o']
x_data=[1,0,2,2,3]
y_data=[3,1,2,3,2]

one_hot_lookup=[[1,0,0,0],
               [0,1,0,0],
               [0,0,1,0],
               [0,0,0,1]]
x_one_hot=[one_hot_lookup[x] for x in x_data]

inputs=torch.Tensor(x_one_hot).view(seq_len,batch_size,input_size)
labels=torch.LongTensor(y_data)   # (seq_len,), one class index per step

class RNN(nn.Module):
    def __init__(self, input_size, hidden_size, batch_size,num_layers):
        super(RNN, self).__init__()
        self.num_layers=num_layers
        self.input_size = input_size
        self.hidden_size = hidden_size
        self.batch_size = batch_size
        self.rnn = nn.RNN(input_size=input_size, hidden_size=hidden_size,num_layers=num_layers)

    def forward(self, input):
        # zero prior hidden state: (num_layers, batch_size, hidden_size)
        hidden=torch.zeros(self.num_layers,self.batch_size,self.hidden_size)
        out,_ = self.rnn(input, hidden)
        # flatten to (seq_len*batch_size, hidden_size) so CrossEntropyLoss
        # sees one row of class scores per character
        return out.view(-1,self.hidden_size)

rnn = RNN(input_size, hidden_size, batch_size,num_layers)

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(rnn.parameters(),lr=0.1)

for epoch in range(20):
    optimizer.zero_grad()
    outputs = rnn(inputs)
    loss = criterion(outputs, labels)
    loss.backward()
    optimizer.step()
    _, idx = outputs.max(dim=1)
    idx=idx.data.numpy()
    print('Predicted: ',''.join([idx2char[x] for x in idx]), end='')
    print(', Epoch [%d/20] loss=%.4f' % (epoch + 1, loss.item()))

Running this prints the predicted string and the loss at every epoch; the prediction again converges to "ohlol".

4. Code in Practice (2)

4.1 Task description

This example trains an RNN to map an input sine (sin) curve to the corresponding cosine (cos) curve.

4.2 Implementation with RNN

First, define the parameters:

input_size=1
hidden_size=16
batch_size=1
num_layers=1
seq_len=50

Define the RNN model:

class RNN(nn.Module):
    def __init__(self):
        super(RNN, self).__init__()
        self.rnn = nn.RNN(input_size=input_size, hidden_size=hidden_size,num_layers=num_layers)
        self.linear=nn.Linear(hidden_size,1)   # map each hidden state to one scalar

    def forward(self,x,hidden):
        out,_= self.rnn(x, hidden)
        out=out.view(-1, hidden_size)   # (seq_len*batch_size, hidden_size)
        out=self.linear(out)            # (seq_len*batch_size, 1)
        out=out.unsqueeze(dim=1)        # (seq_len, batch_size=1, 1), matching y
        return out

Instantiate the model:

rnn = RNN()
print(rnn)

loss_fun = nn.MSELoss()
optimizer = torch.optim.Adam(rnn.parameters(),lr=0.01)

Train the model:

# zero prior hidden state; it is reused unchanged for every training window
hidden_prev = torch.zeros(num_layers,batch_size,hidden_size)

plt.figure(figsize=(20, 4))
plt.ion()
for it in range(1000+1):
    # each iteration trains on one half-period window: sin in, cos out
    steps = np.linspace(it * np.pi, (it + 1) * np.pi, seq_len, dtype=np.float32)
    x_np = np.sin(steps)
    y_np = np.cos(steps)
    # reshape to (seq_len, batch_size=1, input_size=1)
    x = torch.from_numpy(x_np[:, np.newaxis, np.newaxis])
    y = torch.from_numpy(y_np[:, np.newaxis, np.newaxis])
    y_pred = rnn(x, hidden_prev)
    loss = loss_fun(y_pred, y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    plt.plot(steps, y_np, 'b-')
    plt.plot(steps, y_pred.data.numpy().flatten(), 'r-')
    plt.draw()
    plt.pause(0.05)

    if it % 100 == 0:
        print("Iter:{} loss:{}".format(it, loss.item()))
plt.ioff()
plt.show()

Finally, test the model:


steps = np.linspace(10 * np.pi, (10 + 1) * np.pi, seq_len, dtype=np.float32)
x_np = np.sin(steps)
y_np = np.cos(steps)

plt.scatter(steps, y_np, c='b')
x = torch.from_numpy(x_np[:, np.newaxis, np.newaxis])
y_pred= rnn(x,hidden_prev)
plt.scatter(steps, y_pred.data.numpy().flatten(), c='r')
plt.show()

4.3 Complete code

The complete code is as follows:

import torch
import torch.nn as nn
import numpy as np
import matplotlib.pyplot as plt

input_size=1
hidden_size=16
batch_size=1
num_layers=1
seq_len=50

class RNN(nn.Module):
    def __init__(self):
        super(RNN, self).__init__()
        self.rnn = nn.RNN(input_size=input_size, hidden_size=hidden_size,num_layers=num_layers)
        self.linear=nn.Linear(hidden_size,1)

    def forward(self,x,hidden):
        out,_= self.rnn(x, hidden)
        out=out.view(-1, hidden_size)
        out=self.linear(out)
        out=out.unsqueeze(dim=1)
        return out

rnn = RNN()
print(rnn)

loss_fun = nn.MSELoss()
optimizer = torch.optim.Adam(rnn.parameters(),lr=0.01)

hidden_prev = torch.zeros(num_layers,batch_size,hidden_size)

plt.figure(figsize=(12, 3))
plt.ion()
for it in range(1000+1):
    steps = np.linspace(it * np.pi, (it + 1) * np.pi, seq_len, dtype=np.float32)
    x_np = np.sin(steps)
    y_np = np.cos(steps)
    x = torch.from_numpy(x_np[:, np.newaxis, np.newaxis])
    y = torch.from_numpy(y_np[:, np.newaxis, np.newaxis])
    y_pred = rnn(x, hidden_prev)
    loss = loss_fun(y_pred, y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    plt.plot(steps, y_np, 'b-')
    plt.plot(steps, y_pred.data.numpy().flatten(), 'r-')
    plt.draw()
    plt.pause(0.05)

    if it % 100 == 0:
        print("Iter:{} loss:{}".format(it, loss.item()))
plt.ioff()
plt.show()

steps = np.linspace(10 * np.pi, (10 + 1) * np.pi, seq_len, dtype=np.float32)
x_np = np.sin(steps)
y_np = np.cos(steps)

plt.scatter(steps, y_np, c='b')
x = torch.from_numpy(x_np[:, np.newaxis, np.newaxis])
y_pred= rnn(x,hidden_prev)
plt.scatter(steps, y_pred.data.numpy().flatten(), c='r')
plt.show()

4.4 Results

The results are as follows:

[Animation: the predicted curve (red) plotted against the target cos curve (blue) during training]
The loss is as follows:
[Console output: the loss printed every 100 iterations]
The test result is as follows:
[Figure: test scatter plot, ground truth (blue) vs. predictions (red)]

[Explanation]: The first 4-5 points deviate noticeably, likely because the earliest steps have no prior hidden state h to condition on. To reduce this deviation at test time, the test code can be modified slightly, here by simply dropping the first 5 points:


steps = np.linspace(10 * np.pi, (10 + 1) * np.pi, seq_len, dtype=np.float32)
x_np = np.sin(steps)
y_np = np.cos(steps)

# skip the first 5 points, which lack a warmed-up hidden state
plt.scatter(steps[5:], y_np[5:], c='b')
x = torch.from_numpy(x_np[:, np.newaxis, np.newaxis])
y_pred = rnn(x, hidden_prev)
plt.scatter(steps[5:], y_pred.data.numpy().flatten()[5:], c='r')
plt.show()

The resulting plot is as follows:

[Figure: test scatter plot with the first 5 points dropped]
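
Alternatively, instead of discarding the early points, the hidden state can be warmed up on the window immediately before the test window and then passed in as the prior. A minimal sketch, assuming direct access to the model's inner nn.RNN module (rnn.rnn), which is not part of the original test code:

# warm up on the preceding window [9*pi, 10*pi] to obtain a realistic prior
warm_steps = np.linspace(9 * np.pi, 10 * np.pi, seq_len, dtype=np.float32)
warm_x = torch.from_numpy(np.sin(warm_steps)[:, np.newaxis, np.newaxis])
with torch.no_grad():
    # run only the inner RNN to get the final hidden state of the warm-up window
    _, hidden_warm = rnn.rnn(warm_x, hidden_prev)
    # predict the test window starting from that prior instead of zeros
    y_pred = rnn(x, hidden_warm)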

References:

  1. https://blog.csdn.net/qq_41775769/article/details/121707309
  2. http://www.ichenhua.cn/read/302
  3. https://www.sohu.com/a/402391876_100286367

Original: https://blog.csdn.net/didi_ya/article/details/125635413
Author: wendy_ya
Title: GAN-Based Time-Series Missing-Data Imputation, Prologue (1): An Introduction to RNNs and a PyTorch Implementation
