Contents
- 1. Project data and source code
- 2. Dataset 1 (with the friend field)
  - 2.1. Data preprocessing
  - 2.2. Plotting the data
  - 2.3. Network structure diagram
  - 2.4. Building the network
  - 2.5. Visualizing results with matplotlib
  - 2.6. Notes on running the code
- 3. Dataset 2 (without the friend field)
  - 3.1. Data cleaning
  - 3.2. Plotting the data
  - 3.3. Building the network (GPU training)
  - 3.4. Visualizing results with matplotlib
- 4. Conclusions
1. Project data and source code
Available on GitHub:
https://github.com/chenshunpeng/Pytorch-framework-predicts-temperature
2. Dataset 1 (with the friend field)
2.1. Data preprocessing
Import the libraries:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import torch
import torch.optim as optim
import warnings
warnings.filterwarnings("ignore")
%matplotlib inline
import os
os.environ["KMP_DUPLICATE_LIB_OK"]="TRUE"
Read the data:
features = pd.read_csv('data1.csv')
features.head()
Result:
Parse the date columns:
import datetime
years = features['year']
months = features['month']
days = features['day']
dates = [
str(int(year)) + '-' + str(int(month)) + '-' + str(int(day))
for year, month, day in zip(years, months, days)
]
dates = [datetime.datetime.strptime(date, '%Y-%m-%d') for date in dates]
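As an aside, pandas can do the same year/month/day-to-datetime conversion in a single call. A standalone sketch with made-up rows standing in for the columns of data1.csv:

```python
import pandas as pd

# Hypothetical three-row frame mimicking the year/month/day columns
df = pd.DataFrame({'year': [2016, 2016, 2016],
                   'month': [1, 1, 1],
                   'day': [1, 2, 3]})

# pd.to_datetime accepts a frame with year/month/day columns directly
dates = pd.to_datetime(df[['year', 'month', 'day']])
print(dates.dt.strftime('%Y-%m-%d').tolist())
```

This avoids the manual string concatenation plus strptime round-trip.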
2.2. Plotting the data
plt.style.use('fivethirtyeight')
fig, ((ax1, ax2), (ax3, ax4)) = plt.subplots(nrows=2, ncols=2, figsize=(10, 10))
fig.autofmt_xdate(rotation=45)
ax1.plot(dates, features['actual'])
ax1.set_xlabel('')
ax1.set_ylabel('Temperature')
ax1.set_title('Max Temp')
ax2.plot(dates, features['temp_1'])
ax2.set_xlabel('')
ax2.set_ylabel('Temperature')
ax2.set_title('Previous Max Temp')
ax3.plot(dates, features['temp_2'])
ax3.set_xlabel('Date')
ax3.set_ylabel('Temperature')
ax3.set_title('Two Days Prior Max Temp')
ax4.plot(dates, features['friend'])
ax4.set_xlabel('Date')
ax4.set_ylabel('Temperature')
ax4.set_title('Friend Estimate')
plt.tight_layout(pad=2)
Result:
Convert string columns to numeric form:
features = pd.get_dummies(features)
features.head(5)
Result:
Remove the label column actual from the features, then convert the DataFrame to arrays:
labels = np.array(features['actual'])
features = features.drop('actual', axis=1)
feature_list = list(features.columns)
features = np.array(features)
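To illustrate what pd.get_dummies does above, here is a minimal sketch on a toy string column (the made-up 'week' column is only for demonstration):

```python
import pandas as pd

# A toy string column of the kind one-hot encoded above
df = pd.DataFrame({'week': ['Mon', 'Tues', 'Mon']})

# Each distinct string value becomes its own indicator column
encoded = pd.get_dummies(df)
print(list(encoded.columns))
```

Each category value becomes a separate column named `<original column>_<value>`.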
2.3. Network structure diagram
Build a two-layer network (a single hidden layer): the first layer has 128 neurons and the second has 1. The shapes can be read as:
[348, 14] -> [14, 128] -> [128, 1] (input -> hidden layer -> output)
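The shape chain above can be checked with random data standing in for the 348-sample, 14-feature matrix; a quick sketch:

```python
import torch

# Random stand-in for the [348, 14] standardized feature matrix
x = torch.randn(348, 14)

# Hidden layer: [348, 14] @ [14, 128] + [128] -> [348, 128], then ReLU
hidden = torch.relu(x @ torch.randn(14, 128) + torch.randn(128))

# Output layer: [348, 128] @ [128, 1] + [1] -> [348, 1]
out = hidden @ torch.randn(128, 1) + torch.randn(1)
print(hidden.shape, out.shape)
```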
2.4. Building the network
2.4.1. Building the network by hand (CPU training)
- weights and biases are the randomly initialized input-to-hidden weights and offsets
- since there is only one hidden layer, weights2 and biases2 are the hidden-to-output weights and offsets
- backpropagation takes the current loss and differentiates it layer by layer from the output back toward the input; the point of these gradients is to adjust the weights so the error shrinks
- after backpropagation, the gradients are used to update weights, weights2, biases, and biases2: each parameter moves against its gradient by learning rate × gradient, and the accumulated gradients are zeroed after every update
from sklearn import preprocessing

# Standardize the features first (the same step appears in section 3.2);
# without it, input_features would be undefined here
input_features = preprocessing.StandardScaler().fit_transform(features)

x = torch.tensor(input_features, dtype=float)
# reshape the labels to [348, 1] so they line up with the predictions
# instead of broadcasting to a [348, 348] matrix
y = torch.tensor(labels, dtype=float).reshape(-1, 1)
weights = torch.randn((14, 128), dtype=float, requires_grad=True)
biases = torch.randn(128, dtype=float, requires_grad=True)
weights2 = torch.randn((128, 1), dtype=float, requires_grad=True)
biases2 = torch.randn(1, dtype=float, requires_grad=True)
learning_rate = 0.001
losses = []
for i in range(1000):
    # forward pass: input -> hidden (ReLU) -> output
    hidden = x.mm(weights) + biases
    hidden = torch.relu(hidden)
    predictions = hidden.mm(weights2) + biases2
    loss = torch.mean((predictions - y)**2)
    losses.append(loss.data.numpy())
    if i % 100 == 0:
        print('loss:', loss)
    # backward pass, then gradient-descent updates
    loss.backward()
    weights.data.add_(-learning_rate * weights.grad.data)
    biases.data.add_(-learning_rate * biases.grad.data)
    weights2.data.add_(-learning_rate * weights2.grad.data)
    biases2.data.add_(-learning_rate * biases2.grad.data)
    # clear the accumulated gradients for the next iteration
    weights.grad.data.zero_()
    biases.grad.data.zero_()
    weights2.grad.data.zero_()
    biases2.grad.data.zero_()
Result:
2.4.2. A concise version with torch.nn (CPU training)
Initialize the network:
input_size = input_features.shape[1]
hidden_size = 128
output_size = 1
batch_size = 16
my_nn = torch.nn.Sequential(
torch.nn.Linear(input_size, hidden_size),
torch.nn.Sigmoid(),
torch.nn.Linear(hidden_size, output_size),
)
cost = torch.nn.MSELoss(reduction='mean')
optimizer = torch.optim.Adam(my_nn.parameters(), lr=0.001)
Train the network:
losses = []
for i in range(1000):
    batch_loss = []
    # iterate over the data in mini-batches of batch_size
    for start in range(0, len(input_features), batch_size):
        end = start + batch_size if start + batch_size < len(
            input_features) else len(input_features)
        xx = torch.tensor(input_features[start:end],
                          dtype=torch.float,
                          requires_grad=True)
        # reshape the labels to [batch, 1] to match the prediction shape
        yy = torch.tensor(labels[start:end],
                          dtype=torch.float).reshape(-1, 1)
        prediction = my_nn(xx)
        loss = cost(prediction, yy)
        optimizer.zero_grad()
        loss.backward(retain_graph=True)
        optimizer.step()
        batch_loss.append(loss.data.numpy())
    if i % 100 == 0:
        losses.append(np.mean(batch_loss))
        print(i, np.mean(batch_loss))
Result:
Predict on the training set:
x = torch.tensor(input_features, dtype=torch.float)
predict = my_nn(x).data.numpy()
Result:
Parameters of the training process that can be tuned include:
- the learning rate
- the activation function (ReLU in most cases)
- the loss function: the squared error used here can be swapped for other loss types
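As a sketch of the knobs listed above, the same Sequential model can be rebuilt with a different activation, loss, and learning rate. The specific choices below (ReLU, SmoothL1Loss, lr=0.01) are illustrative alternatives, not the author's settings:

```python
import torch

input_size, hidden_size, output_size = 14, 128, 1

# Same two-layer architecture, but with ReLU instead of Sigmoid
model = torch.nn.Sequential(
    torch.nn.Linear(input_size, hidden_size),
    torch.nn.ReLU(),
    torch.nn.Linear(hidden_size, output_size),
)

# Smooth L1 (Huber-like) loss as an alternative to plain MSE
cost = torch.nn.SmoothL1Loss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)  # larger learning rate

# One dummy training step to show the pieces fit together
x = torch.randn(4, input_size)
y = torch.randn(4, output_size)
loss = cost(model(x), y)
optimizer.zero_grad()
loss.backward()
optimizer.step()
print(loss.item())
```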
2.5. Visualizing results with matplotlib
Set up the data:
dates = [
str(int(year)) + '-' + str(int(month)) + '-' + str(int(day))
for year, month, day in zip(years, months, days)
]
dates = [datetime.datetime.strptime(date, '%Y-%m-%d') for date in dates]
true_data = pd.DataFrame(data={'date': dates, 'actual': labels})
months = features[:, feature_list.index('month')]
days = features[:, feature_list.index('day')]
years = features[:, feature_list.index('year')]
test_dates = [
str(int(year)) + '-' + str(int(month)) + '-' + str(int(day))
for year, month, day in zip(years, months, days)
]
test_dates = [
datetime.datetime.strptime(date, '%Y-%m-%d') for date in test_dates
]
predictions_data = pd.DataFrame(data={
'date': test_dates,
'prediction': predict.reshape(-1)
})
Plot the figure:
plt.figure(figsize=(20,5))
plt.plot(true_data['date'], true_data['actual'], 'b-', label='actual')
plt.plot(predictions_data['date'],
predictions_data['prediction'],
'ro',
label='prediction')
plt.xticks(rotation='60')
plt.legend()
plt.xlabel('Date')
plt.ylabel('Maximum Temperature (F)')
plt.title('Actual and Predicted Values')
Result:
2.6. Notes on running the code
An error was raised on the first run:
It turned out that two conflicting copies of numpy were installed. Run pip uninstall numpy twice (possibly more) to remove them all, then reinstall with:
pip3 install numpy -i http://pypi.douban.com/simple --trusted-host pypi.douban.com
After that, a warning appeared: numexpr needs to be at least version 2.7.0. Update it with:
pip install --user numexpr==2.7.0 -i http://pypi.douban.com/simple --trusted-host pypi.douban.com
3. Dataset 2 (without the friend field)
3.1. Data cleaning
import pandas as pd
df = pd.read_csv("data1.csv")
df.head(2)
df=df.drop(['friend'],axis=1)
df.to_csv("data2.csv",index=False,encoding="utf-8")
Read the data back in:
df = pd.read_csv("data2.csv")
df.head(2)
Result:
3.2. Plotting the data
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import torch
import torch.optim as optim
import warnings
warnings.filterwarnings("ignore")
%matplotlib inline
import os
os.environ["KMP_DUPLICATE_LIB_OK"]="TRUE"
features = pd.read_csv('data2.csv')
import datetime
years = features['year']
months = features['month']
days = features['day']
dates = [
str(int(year)) + '-' + str(int(month)) + '-' + str(int(day))
for year, month, day in zip(years, months, days)
]
dates = [datetime.datetime.strptime(date, '%Y-%m-%d') for date in dates]
plt.style.use('fivethirtyeight')
fig, ((ax1, ax2), (ax3, ax4)) = plt.subplots(nrows=2, ncols=2, figsize=(10, 10))
fig.autofmt_xdate(rotation=45)
ax1.plot(dates, features['actual'])
ax1.set_xlabel('')
ax1.set_ylabel('Temperature')
ax1.set_title('Max Temp')
ax2.plot(dates, features['temp_1'])
ax2.set_xlabel('')
ax2.set_ylabel('Temperature')
ax2.set_title('Previous Max Temp')
ax3.plot(dates, features['temp_2'])
ax3.set_xlabel('Date')
ax3.set_ylabel('Temperature')
ax3.set_title('Two Days Prior Max Temp')
plt.tight_layout(pad=2)
Result:
Process the data:
features = pd.get_dummies(features)
labels = np.array(features['actual'])
features = features.drop('actual', axis=1)
feature_list = list(features.columns)
features = np.array(features)
from sklearn import preprocessing
input_features = preprocessing.StandardScaler().fit_transform(features)
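The StandardScaler step above shifts every feature column to zero mean and unit standard deviation; a tiny sketch with a made-up matrix:

```python
import numpy as np
from sklearn import preprocessing

# Toy feature matrix: two columns on very different scales
features = np.array([[1.0, 10.0],
                     [2.0, 20.0],
                     [3.0, 30.0]])

# After scaling, each column has mean 0 and standard deviation 1
scaled = preprocessing.StandardScaler().fit_transform(features)
print(scaled.mean(axis=0), scaled.std(axis=0))
```

This keeps large-valued columns from dominating the randomly initialized weights during early training.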
3.3. Building the network (GPU training)
How to fix the error this code originally raised:
Corrected code:
input_size = input_features.shape[1]
hidden_size = 128
output_size = 1
batch_size = 16
my_nn = torch.nn.Sequential(
    torch.nn.Linear(input_size, hidden_size),
    torch.nn.Sigmoid(),
    torch.nn.Linear(hidden_size, output_size),
)
cost = torch.nn.MSELoss(reduction='mean')
optimizer = torch.optim.Adam(my_nn.parameters(), lr=0.001)
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print(device)
my_nn.to(device)
losses = []
for i in range(1000):
    batch_loss = []
    for start in range(0, len(input_features), batch_size):
        end = start + batch_size if start + batch_size < len(
            input_features) else len(input_features)
        # build the batch tensors and move them to the GPU; note that
        # numpy arrays have no .to() method, so the copy must happen on
        # the tensors, not on input_features/labels themselves
        xx = torch.tensor(input_features[start:end],
                          dtype=torch.float,
                          requires_grad=True).to(device)
        yy = torch.tensor(labels[start:end],
                          dtype=torch.float).reshape(-1, 1).to(device)
        prediction = my_nn(xx)
        loss = cost(prediction, yy)
        optimizer.zero_grad()
        loss.backward(retain_graph=True)
        optimizer.step()
        # copy the loss back to the CPU before converting to numpy
        batch_loss.append(loss.data.cpu().numpy())
    if i % 100 == 0:
        losses.append(np.mean(batch_loss))
        print(i, np.mean(batch_loss))
Result:
3.4. Visualizing results with matplotlib
# move the inputs to the same device as the model, then bring the
# predictions back to the CPU for plotting
x = torch.tensor(input_features, dtype=torch.float).to(device)
predict = my_nn(x).data.cpu().numpy()
dates = [
str(int(year)) + '-' + str(int(month)) + '-' + str(int(day))
for year, month, day in zip(years, months, days)
]
dates = [datetime.datetime.strptime(date, '%Y-%m-%d') for date in dates]
true_data = pd.DataFrame(data={'date': dates, 'actual': labels})
months = features[:, feature_list.index('month')]
days = features[:, feature_list.index('day')]
years = features[:, feature_list.index('year')]
test_dates = [
str(int(year)) + '-' + str(int(month)) + '-' + str(int(day))
for year, month, day in zip(years, months, days)
]
test_dates = [
datetime.datetime.strptime(date, '%Y-%m-%d') for date in test_dates
]
predictions_data = pd.DataFrame(data={
'date': test_dates,
'prediction': predict.reshape(-1)
})
plt.figure(figsize=(20,5))
plt.plot(true_data['date'], true_data['actual'], 'b-', label='actual')
plt.plot(predictions_data['date'],
predictions_data['prediction'],
'ro',
label='prediction')
plt.xticks(rotation='60')
plt.legend()
plt.xlabel('Date')
plt.ylabel('Maximum Temperature (F)')
plt.title('Actual and Predicted Values')
Training output:
0 3870.6526
100 37.636475
200 35.619022
300 35.33533
400 35.18585
500 35.07154
600 34.970573
700 34.8705
800 34.767883
900 34.66196
Visualization:
4. Conclusions
Changing the dataset had little effect on the results, so improving the network structure is the key to optimization.
When the network is small, the cost of moving data between CPU and GPU dominates, and CPU-only training is actually faster.
When the network is large, the speedup from the GPU becomes significant.
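The transfer-overhead point can be sketched with a toy timing loop; the model and batch sizes below are arbitrary, chosen only to make the per-batch host-to-device copy visible:

```python
import time
import torch

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# A deliberately tiny model: with so little compute per batch, the
# per-batch copy to the device can cost more than the matmul itself
model = torch.nn.Linear(14, 1).to(device)
x = torch.randn(1024, 14)

start = time.time()
for _ in range(100):
    out = model(x.to(device))  # copy in, then compute
elapsed = time.time() - start
print(f"100 forward passes on {device}: {elapsed:.4f}s")
```

Comparing this number between `cpu` and `cuda:0` on a given machine shows whether the copies or the compute dominate for a model this small.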
Original: https://blog.csdn.net/qq_45550375/article/details/126110032
Author: csp_
Title: DNN temperature prediction (PyTorch, comparing datasets)