Predicting Temperature with a DNN (PyTorch, Comparing Different Datasets)

1. Project data and source code

Both can be downloaded from GitHub:

https://github.com/chenshunpeng/Pytorch-framework-predicts-temperature

2. Dataset 1 (includes the friend column)

2.1. Data preprocessing

Import the libraries:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import torch
import torch.optim as optim
import warnings

warnings.filterwarnings("ignore")
%matplotlib inline

import os
os.environ["KMP_DUPLICATE_LIB_OK"]="TRUE"

Load the data:

features = pd.read_csv('data1.csv')

features.head()

Convert the year, month and day columns to datetime objects:

import datetime

years = features['year']
months = features['month']
days = features['day']

dates = [
    str(int(year)) + '-' + str(int(month)) + '-' + str(int(day))
    for year, month, day in zip(years, months, days)
]

dates = [datetime.datetime.strptime(date, '%Y-%m-%d') for date in dates]
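As a possible alternative to building date strings by hand, pandas can assemble datetimes directly from year, month and day columns. This is only a sketch: the small DataFrame below is made-up sample data standing in for data1.csv, and the names sample and sample_dates are hypothetical.

```python
import pandas as pd

# Hypothetical sample rows standing in for data1.csv
sample = pd.DataFrame({
    'year': [2016, 2016, 2016],
    'month': [1, 1, 12],
    'day': [1, 2, 31],
})

# pd.to_datetime accepts a DataFrame that has year, month and day columns
sample_dates = pd.to_datetime(sample[['year', 'month', 'day']])
print(sample_dates.tolist())
```

The resulting Series can be passed to matplotlib in the same way as the list of datetime objects above.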

2.2. Plotting the data


plt.style.use('fivethirtyeight')

fig, ((ax1, ax2), (ax3, ax4)) = plt.subplots(nrows=2,
                                             ncols=2,
                                             figsize=(10, 10))
fig.autofmt_xdate(rotation=45)

ax1.plot(dates, features['actual'])
ax1.set_xlabel('')
ax1.set_ylabel('Temperature')
ax1.set_title('Max Temp')

ax2.plot(dates, features['temp_1'])
ax2.set_xlabel('')
ax2.set_ylabel('Temperature')
ax2.set_title('Previous Max Temp')

ax3.plot(dates, features['temp_2'])
ax3.set_xlabel('Date')
ax3.set_ylabel('Temperature')
ax3.set_title('Two Days Prior Max Temp')

ax4.plot(dates, features['friend'])
ax4.set_xlabel('Date')
ax4.set_ylabel('Temperature')
ax4.set_title('Friend Estimate')

plt.tight_layout(pad=2)

Convert string columns to numeric form with one-hot encoding:

features = pd.get_dummies(features)
features.head(5)
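To illustrate what pd.get_dummies does, here is a minimal sketch on made-up data. The week column name and its values are assumptions chosen for illustration, not taken from the dataset shown above.

```python
import pandas as pd

# Made-up rows: one numeric column and one string column
toy = pd.DataFrame({
    'temp_1': [45, 44, 41],
    'week': ['Fri', 'Sat', 'Sun'],
})

# Numeric columns pass through; each distinct string becomes its own indicator column
encoded = pd.get_dummies(toy)
print(list(encoded.columns))
```

Each string column is replaced by one indicator column per distinct value, named with the original column name as a prefix.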

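Separate the label column actual from the features and convert the DataFrame to a NumPy array:

# The label is the value to predict
labels = np.array(features['actual'])

# Remove the label from the features
features = features.drop('actual', axis=1)

# Keep the column names for later lookup
feature_list = list(features.columns)

features = np.array(features)

Standardize the features; the training code below reads them from input_features:

from sklearn import preprocessing

input_features = preprocessing.StandardScaler().fit_transform(features)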

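2.3. Network architecture

Build a two-layer neural network (a single hidden layer): the first layer has 128 neurons and the second has 1. Its structure can be read as:

[348, 14] -> [14, 128] -> [128, 1] (input layer -> hidden layer -> output layer), where [348, 14] is the input data (348 samples, 14 features) and the other two are the weight matrices.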
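The shapes can be checked with a quick forward pass on random tensors. This is a sketch only: the tensors are random stand-ins for the real data and weights, ReLU is assumed as the activation, and all names ending in _demo are hypothetical.

```python
import torch

n_samples, n_features, n_hidden = 348, 14, 128

x_demo = torch.randn(n_samples, n_features)  # random stand-in for the input data
w1 = torch.randn(n_features, n_hidden)       # input -> hidden weights
b1 = torch.randn(n_hidden)
w2 = torch.randn(n_hidden, 1)                # hidden -> output weights
b2 = torch.randn(1)

hidden_demo = torch.relu(x_demo @ w1 + b1)   # (348, 14) @ (14, 128) -> (348, 128)
out_demo = hidden_demo @ w2 + b2             # (348, 128) @ (128, 1) -> (348, 1)
print(hidden_demo.shape, out_demo.shape)
```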

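2.4. Building the network model

2.4.1. Building the model by hand (CPU training)

  • weights and biases are the randomly initialized weights and biases from the input layer to the hidden layer
  • Because there is only one hidden layer, weights2 and biases2 are the weights and biases from the hidden layer to the output layer
  • Backpropagation takes the current loss and computes its partial derivatives from the output layer back toward the input layer; the goal is to adjust the parameters so that the error shrinks
  • Once backpropagation has produced the gradients, update weights, weights2, biases and biases2 by stepping against the gradient by learning rate * gradient, and zero the gradients after every update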
x = torch.tensor(input_features, dtype=float)

# Reshape the labels to (N, 1) so they match the shape of predictions;
# subtracting an (N,) tensor from an (N, 1) tensor would broadcast to (N, N)
y = torch.tensor(labels, dtype=float).reshape(-1, 1)

# Input layer -> hidden layer
weights = torch.randn((14, 128), dtype=float, requires_grad=True)
biases = torch.randn(128, dtype=float, requires_grad=True)

# Hidden layer -> output layer
weights2 = torch.randn((128, 1), dtype=float, requires_grad=True)
biases2 = torch.randn(1, dtype=float, requires_grad=True)

learning_rate = 0.001
losses = []

for i in range(1000):
    # Forward pass
    hidden = x.mm(weights) + biases
    hidden = torch.relu(hidden)
    predictions = hidden.mm(weights2) + biases2

    # Mean squared error
    loss = torch.mean((predictions - y)**2)
    losses.append(loss.item())

    if i % 100 == 0:
        print('loss:', loss.item())

    # Backward pass
    loss.backward()

    # Gradient descent step, kept out of the autograd graph
    with torch.no_grad():
        weights -= learning_rate * weights.grad
        biases -= learning_rate * biases.grad
        weights2 -= learning_rate * weights2.grad
        biases2 -= learning_rate * biases2.grad

    # Zero the gradients so they do not accumulate across iterations
    weights.grad.zero_()
    biases.grad.zero_()
    weights2.grad.zero_()
    biases2.grad.zero_()
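The hand-written update is plain gradient descent, so it should match what torch.optim.SGD does. The sketch below compares the two on a made-up single-layer problem; the data, the names ending in _toy, w_manual and w_optim are all hypothetical and only serve the comparison.

```python
import torch

torch.manual_seed(0)
x_toy = torch.randn(8, 3)
y_toy = torch.randn(8, 1)
lr = 0.01

# Two identical copies of one weight matrix
w_manual = torch.randn(3, 1, requires_grad=True)
w_optim = w_manual.detach().clone().requires_grad_(True)
optimizer_toy = torch.optim.SGD([w_optim], lr=lr)

for _ in range(5):
    # Manual update: step against the gradient, then zero it
    loss_manual = torch.mean((x_toy.mm(w_manual) - y_toy) ** 2)
    loss_manual.backward()
    with torch.no_grad():
        w_manual -= lr * w_manual.grad
    w_manual.grad.zero_()

    # The same update through the optimizer
    loss_optim = torch.mean((x_toy.mm(w_optim) - y_toy) ** 2)
    optimizer_toy.zero_grad()
    loss_optim.backward()
    optimizer_toy.step()

# Compare the two weight matrices after five steps
print(torch.allclose(w_manual, w_optim))
```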

2.4.2. Building the model concisely with torch.nn (CPU training)

Initialize the network:

input_size = input_features.shape[1]
hidden_size = 128
output_size = 1
batch_size = 16
my_nn = torch.nn.Sequential(
    torch.nn.Linear(input_size, hidden_size),
    torch.nn.Sigmoid(),
    torch.nn.Linear(hidden_size, output_size),
)

cost = torch.nn.MSELoss(reduction='mean')

optimizer = torch.optim.Adam(my_nn.parameters(), lr=0.001)

Train the network:


losses = []
for i in range(1000):
    batch_loss = []

    # Mini-batch training
    for start in range(0, len(input_features), batch_size):
        end = min(start + batch_size, len(input_features))
        xx = torch.tensor(input_features[start:end], dtype=torch.float)
        # Reshape to (batch, 1) so the labels match the network output
        yy = torch.tensor(labels[start:end],
                          dtype=torch.float).reshape(-1, 1)
        prediction = my_nn(xx)
        loss = cost(prediction, yy)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        batch_loss.append(loss.item())

    # Record and print the mean batch loss every 100 epochs
    if i % 100 == 0:
        losses.append(np.mean(batch_loss))
        print(i, np.mean(batch_loss))
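A shape pitfall worth checking in loops like this one: a network with one output unit returns a tensor of shape (batch, 1), while a raw label array has shape (batch,). Subtracting the two broadcasts to (batch, batch) and silently changes the loss. The sketch below uses made-up numbers to show the difference.

```python
import torch

pred = torch.tensor([[1.0], [2.0], [3.0], [4.0]])  # shape (4, 1), like a network output
target = torch.tensor([1.0, 2.0, 3.0, 4.0])        # shape (4,), like a raw label array

diff_wrong = pred - target                  # broadcasts to (4, 4)
diff_right = pred - target.reshape(-1, 1)   # stays (4, 1)
print(diff_wrong.shape, diff_right.shape)

# The predictions equal the targets, so the correct loss is zero
mse_wrong = torch.mean(diff_wrong ** 2)
mse_right = torch.mean(diff_right ** 2)
print(mse_wrong.item(), mse_right.item())
```

Reshaping the labels to (batch, 1) before computing the loss avoids the problem.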

Predict on the training data:

x = torch.tensor(input_features, dtype=torch.float)

predict = my_nn(x).data.numpy()

The following can be tuned to improve training:

  • Learning rate
  • Activation function (ReLU in most cases)
  • Loss function: mean squared error can be replaced with another loss
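As a rough sketch of how these three knobs might be varied, the snippet below rebuilds the same two-layer layout with a configurable activation, swaps in mean absolute error, and uses a different learning rate. build_model is a hypothetical helper, the values 0.01 and L1Loss are arbitrary choices for illustration, and the data is random.

```python
import torch

def build_model(input_size, hidden_size=128, activation=torch.nn.ReLU):
    """Hypothetical helper: two-layer network with a configurable activation."""
    return torch.nn.Sequential(
        torch.nn.Linear(input_size, hidden_size),
        activation(),
        torch.nn.Linear(hidden_size, 1),
    )

model = build_model(14, activation=torch.nn.ReLU)
cost_l1 = torch.nn.L1Loss()                           # mean absolute error instead of MSE
opt = torch.optim.Adam(model.parameters(), lr=0.01)   # a different learning rate

# One training step on random stand-in data
x_toy = torch.randn(16, 14)
y_toy = torch.randn(16, 1)
loss_toy = cost_l1(model(x_toy), y_toy)
opt.zero_grad()
loss_toy.backward()
opt.step()
print(loss_toy.item())
```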

2.5. Visualizing the results with matplotlib

Prepare the data:


dates = [
    str(int(year)) + '-' + str(int(month)) + '-' + str(int(day))
    for year, month, day in zip(years, months, days)
]
dates = [datetime.datetime.strptime(date, '%Y-%m-%d') for date in dates]

true_data = pd.DataFrame(data={'date': dates, 'actual': labels})

months = features[:, feature_list.index('month')]
days = features[:, feature_list.index('day')]
years = features[:, feature_list.index('year')]

test_dates = [
    str(int(year)) + '-' + str(int(month)) + '-' + str(int(day))
    for year, month, day in zip(years, months, days)
]

test_dates = [
    datetime.datetime.strptime(date, '%Y-%m-%d') for date in test_dates
]

predictions_data = pd.DataFrame(data={
    'date': test_dates,
    'prediction': predict.reshape(-1)
})

Draw the plot:

plt.figure(figsize=(20,5))

plt.plot(true_data['date'], true_data['actual'], 'b-', label='actual')

plt.plot(predictions_data['date'],
         predictions_data['prediction'],
         'ro',
         label='prediction')
plt.xticks(rotation=60)
plt.legend()

plt.xlabel('Date')
plt.ylabel('Maximum Temperature (F)')
plt.title('Actual and Predicted Values')

2.6. Notes on running the code

The first run failed with an error.

The cause turned out to be two installed copies of numpy with conflicting versions. First run pip uninstall numpy twice (or more, until no copy remains) to remove them.

Then reinstall numpy with pip3 install numpy -i http://pypi.douban.com/simple --trusted-host pypi.douban.com.

After that, a warning appeared.

It said that numexpr needed to be updated to at least version 2.7.0. Running pip install --user numexpr==2.7.0 -i http://pypi.douban.com/simple --trusted-host pypi.douban.com resolves it.
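Before reinstalling anything, it can help to see which versions are actually installed. A minimal sketch using the standard library's importlib.metadata (Python 3.8+); installed_version is a hypothetical helper name.

```python
from importlib import metadata

def installed_version(package):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None

for name in ('numpy', 'numexpr'):
    print(name, installed_version(name))
```

This reports one version per package for the current interpreter; it does not by itself detect duplicate installations in other environments.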

3. Dataset 2 (without the friend column)

3.1. Data cleaning


import pandas as pd

df = pd.read_csv("data1.csv")
df.head(2)

df=df.drop(['friend'],axis=1)

df.to_csv("data2.csv",index=False,encoding="utf-8")

Load the data:

df = pd.read_csv("data2.csv")
df.head(2)

3.2. Plotting the data

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import torch
import torch.optim as optim
import warnings

warnings.filterwarnings("ignore")
%matplotlib inline

import os
os.environ["KMP_DUPLICATE_LIB_OK"]="TRUE"

features = pd.read_csv('data2.csv')

import datetime

years = features['year']
months = features['month']
days = features['day']

dates = [
    str(int(year)) + '-' + str(int(month)) + '-' + str(int(day))
    for year, month, day in zip(years, months, days)
]

dates = [datetime.datetime.strptime(date, '%Y-%m-%d') for date in dates]

plt.style.use('fivethirtyeight')

fig, ((ax1, ax2), (ax3, ax4)) = plt.subplots(nrows=2,
                                             ncols=2,
                                             figsize=(10, 10))
fig.autofmt_xdate(rotation=45)

ax1.plot(dates, features['actual'])
ax1.set_xlabel('')
ax1.set_ylabel('Temperature')
ax1.set_title('Max Temp')

ax2.plot(dates, features['temp_1'])
ax2.set_xlabel('')
ax2.set_ylabel('Temperature')
ax2.set_title('Previous Max Temp')

ax3.plot(dates, features['temp_2'])
ax3.set_xlabel('Date')
ax3.set_ylabel('Temperature')
ax3.set_title('Two Days Prior Max Temp')

plt.tight_layout(pad=2)

Data processing:

features = pd.get_dummies(features)

labels = np.array(features['actual'])

features = features.drop('actual', axis=1)

feature_list = list(features.columns)

features = np.array(features)

from sklearn import preprocessing

input_features = preprocessing.StandardScaler().fit_transform(features)
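To make the standardization step concrete, the sketch below checks on a made-up array that StandardScaler subtracts each column's mean and divides by its (population) standard deviation. The toy values are arbitrary.

```python
import numpy as np
from sklearn import preprocessing

# Made-up data: two columns on very different scales
toy = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 60.0]])

scaled = preprocessing.StandardScaler().fit_transform(toy)

# The same transformation written out by hand, column by column
manual = (toy - toy.mean(axis=0)) / toy.std(axis=0)

print(np.allclose(scaled, manual))
print(scaled.mean(axis=0), scaled.std(axis=0))
```

After scaling, every column has mean 0 and standard deviation 1, so features measured on different scales contribute comparably to the first layer.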

3.3. Building the network model (GPU training)

Fixing an error in the code:

Training the network raised this error: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.

Corrected code:

input_size = input_features.shape[1]
hidden_size = 128
output_size = 1
batch_size = 16
my_nn = torch.nn.Sequential(
    torch.nn.Linear(input_size, hidden_size),
    torch.nn.Sigmoid(),
    torch.nn.Linear(hidden_size, output_size),
)

# Use the GPU when one is available, otherwise fall back to the CPU
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print(device)

# Move the model to the device before creating the optimizer
my_nn.to(device)

cost = torch.nn.MSELoss(reduction='mean')

optimizer = torch.optim.Adam(my_nn.parameters(), lr=0.001)

losses = []
for i in range(1000):
    batch_loss = []

    for start in range(0, len(input_features), batch_size):
        end = min(start + batch_size, len(input_features))

        # input_features and labels are NumPy arrays and have no .to();
        # create each batch tensor directly on the target device instead
        xx = torch.tensor(input_features[start:end],
                          dtype=torch.float,
                          device=device)
        yy = torch.tensor(labels[start:end],
                          dtype=torch.float,
                          device=device).reshape(-1, 1)

        prediction = my_nn(xx)
        loss = cost(prediction, yy)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        # Copy the loss back to host memory before converting to NumPy
        batch_loss.append(loss.detach().cpu().numpy())

    if i % 100 == 0:
        losses.append(np.mean(batch_loss))
        print(i, np.mean(batch_loss))
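The error message itself points at the general pattern: a tensor on the GPU has to be copied to host memory before it can become a NumPy array. The sketch below shows the round trip on random data; it also runs on a machine without a GPU, where device is simply the CPU. The names batch_np, layer and out_np are hypothetical.

```python
import numpy as np
import torch

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# NumPy -> tensor on the chosen device
batch_np = np.random.rand(4, 3).astype(np.float32)
batch = torch.from_numpy(batch_np).to(device)

# The model must live on the same device as its input
layer = torch.nn.Linear(3, 1).to(device)
out = layer(batch)

# Tensor -> NumPy: detach from autograd, copy to the CPU, then convert
out_np = out.detach().cpu().numpy()
print(type(out_np), out_np.shape)
```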

3.4. Visualizing the results with matplotlib

x = torch.tensor(input_features, dtype=torch.float, device=device)

# Move the predictions back to the CPU before converting to NumPy
predict = my_nn(x).detach().cpu().numpy()

dates = [
    str(int(year)) + '-' + str(int(month)) + '-' + str(int(day))
    for year, month, day in zip(years, months, days)
]
dates = [datetime.datetime.strptime(date, '%Y-%m-%d') for date in dates]

true_data = pd.DataFrame(data={'date': dates, 'actual': labels})

months = features[:, feature_list.index('month')]
days = features[:, feature_list.index('day')]
years = features[:, feature_list.index('year')]

test_dates = [
    str(int(year)) + '-' + str(int(month)) + '-' + str(int(day))
    for year, month, day in zip(years, months, days)
]

test_dates = [
    datetime.datetime.strptime(date, '%Y-%m-%d') for date in test_dates
]

predictions_data = pd.DataFrame(data={
    'date': test_dates,
    'prediction': predict.reshape(-1)
})

plt.figure(figsize=(20,5))

plt.plot(true_data['date'], true_data['actual'], 'b-', label='actual')

plt.plot(predictions_data['date'],
         predictions_data['prediction'],
         'ro',
         label='prediction')
plt.xticks(rotation=60)
plt.legend()

plt.xlabel('Date')
plt.ylabel('Maximum Temperature (F)')
plt.title('Actual and Predicted Values')

Training results:

0 3870.6526
100 37.636475
200 35.619022
300 35.33533
400 35.18585
500 35.07154
600 34.970573
700 34.8705
800 34.767883
900 34.66196

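4. Conclusions

Changing the dataset had little effect on the results, so improving the network architecture is the key to better performance.

When the network is small, transferring data between the CPU and the GPU costs more time than the GPU saves, so training on the CPU alone is faster.

When the network is large, the GPU speedup becomes noticeable.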
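One way to check this on a given machine is to time the forward pass, including the transfers, for a small and a large hidden layer. This is a rough sketch, not a rigorous benchmark: time_forward is a hypothetical helper, the sizes 128 and 4096 are arbitrary, and the GPU rows only appear when CUDA is available.

```python
import time
import torch

def time_forward(device, hidden_size, repeats=20):
    """Hypothetical helper: average seconds per forward pass, transfers included."""
    model = torch.nn.Sequential(
        torch.nn.Linear(14, hidden_size),
        torch.nn.Sigmoid(),
        torch.nn.Linear(hidden_size, 1),
    ).to(device)
    x = torch.randn(16, 14)
    start = time.perf_counter()
    with torch.no_grad():
        for _ in range(repeats):
            out = model(x.to(device))  # includes the host-to-device copy
            out = out.cpu()            # copy back; also waits for GPU work to finish
    return (time.perf_counter() - start) / repeats

devices = ['cpu'] + (['cuda:0'] if torch.cuda.is_available() else [])
for dev in devices:
    for hidden in (128, 4096):
        print(dev, hidden, time_forward(torch.device(dev), hidden))
```

Timings vary with hardware and with warm-up effects on the first GPU call, so the numbers are only indicative.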

Original: https://blog.csdn.net/qq_45550375/article/details/126110032
Author: csp_
Title: DNN预测气温(PyTorch,不同数据集对比)
