Contents
- 1. Project data and source code
- 2. Dataset 1 (with the friend field)
  - 2.1. Data preprocessing
  - 2.2. Plotting the data
  - 2.3. Network structure diagram
  - 2.4. Building the network
  - 2.5. Visualizing results with matplotlib
  - 2.6. Notes on running the code
- 3. Dataset 2 (without the friend field)
  - 3.1. Data cleaning
  - 3.2. Plotting the data
  - 3.3. Building the network (GPU training)
  - 3.4. Visualizing results with matplotlib
- 4. Conclusions
1. Project data and source code
Available on GitHub:
https://github.com/chenshunpeng/Pytorch-framework-predicts-temperature
2. Dataset 1 (with the friend field)
2.1. Data preprocessing
Import the libraries:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import torch
import torch.optim as optim
import warnings
warnings.filterwarnings("ignore")
%matplotlib inline
import os
os.environ["KMP_DUPLICATE_LIB_OK"]="TRUE"
Read the data:
features = pd.read_csv('data1.csv')
features.head()
Result:
Parse the date columns:
import datetime
years = features['year']
months = features['month']
days = features['day']
dates = [
str(int(year)) + '-' + str(int(month)) + '-' + str(int(day))
for year, month, day in zip(years, months, days)
]
dates = [datetime.datetime.strptime(date, '%Y-%m-%d') for date in dates]
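As an aside, pandas can do the same year/month/day-to-datetime conversion in a single call. A standalone sketch with made-up rows standing in for the columns of data1.csv:

```python
import pandas as pd

# Hypothetical three-row frame mimicking the year/month/day columns
df = pd.DataFrame({'year': [2016, 2016, 2016],
                   'month': [1, 1, 1],
                   'day': [1, 2, 3]})

# pd.to_datetime accepts a frame with year/month/day columns directly
dates = pd.to_datetime(df[['year', 'month', 'day']])
print(dates.dt.strftime('%Y-%m-%d').tolist())
```

This avoids the manual string concatenation plus strptime round-trip.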
2.2. Plotting the data
plt.style.use('fivethirtyeight')
fig, ((ax1, ax2), (ax3, ax4)) = plt.subplots(nrows=2, ncols=2, figsize=(10, 10))
fig.autofmt_xdate(rotation=45)
ax1.plot(dates, features['actual'])
ax1.set_xlabel('')
ax1.set_ylabel('Temperature')
ax1.set_title('Max Temp')
ax2.plot(dates, features['temp_1'])
ax2.set_xlabel('')
ax2.set_ylabel('Temperature')
ax2.set_title('Previous Max Temp')
ax3.plot(dates, features['temp_2'])
ax3.set_xlabel('Date')
ax3.set_ylabel('Temperature')
ax3.set_title('Two Days Prior Max Temp')
ax4.plot(dates, features['friend'])
ax4.set_xlabel('Date')
ax4.set_ylabel('Temperature')
ax4.set_title('Friend Estimate')
plt.tight_layout(pad=2)
Result:
Convert string columns to numeric form:
features = pd.get_dummies(features)
features.head(5)
Result:
Remove the label column actual from the features, then convert the DataFrame to arrays:
labels = np.array(features['actual'])
features = features.drop('actual', axis=1)
feature_list = list(features.columns)
features = np.array(features)
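To illustrate what pd.get_dummies does above, here is a minimal sketch on a toy string column (the made-up 'week' column is only for demonstration):

```python
import pandas as pd

# A toy string column of the kind one-hot encoded above
df = pd.DataFrame({'week': ['Mon', 'Tues', 'Mon']})

# Each distinct string value becomes its own indicator column
encoded = pd.get_dummies(df)
print(list(encoded.columns))
```

Each category value becomes a separate column named `<original column>_<value>`.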
2.3. Network structure diagram
Build a two-layer network (a single hidden layer): the first layer has 128 neurons and the second has 1. The shapes can be read as:
[348, 14] -> [14, 128] -> [128, 1] (input -> hidden layer -> output)
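The shape chain above can be checked with random data standing in for the 348-sample, 14-feature matrix; a quick sketch:

```python
import torch

# Random stand-in for the [348, 14] standardized feature matrix
x = torch.randn(348, 14)

# Hidden layer: [348, 14] @ [14, 128] + [128] -> [348, 128], then ReLU
hidden = torch.relu(x @ torch.randn(14, 128) + torch.randn(128))

# Output layer: [348, 128] @ [128, 1] + [1] -> [348, 1]
out = hidden @ torch.randn(128, 1) + torch.randn(1)
print(hidden.shape, out.shape)
```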
2.4. Building the network
2.4.1. Building the network by hand (CPU training)
- weights and biases are the randomly initialized input-to-hidden weights and offsets
- since there is only one hidden layer, weights2 and biases2 are the hidden-to-output weights and offsets
- backpropagation takes the current loss and differentiates it layer by layer from the output back toward the input; the point of these gradients is to adjust the weights so the error shrinks
- after backpropagation, the gradients are used to update weights, weights2, biases, and biases2: each parameter moves against its gradient by learning rate × gradient, and the accumulated gradients are zeroed after every update
from sklearn import preprocessing

# Standardize the features first (the same step appears in section 3.2);
# without it, input_features would be undefined here
input_features = preprocessing.StandardScaler().fit_transform(features)

x = torch.tensor(input_features, dtype=float)
# reshape the labels to [348, 1] so they line up with the predictions
# instead of broadcasting to a [348, 348] matrix
y = torch.tensor(labels, dtype=float).reshape(-1, 1)
weights = torch.randn((14, 128), dtype=float, requires_grad=True)
biases = torch.randn(128, dtype=float, requires_grad=True)
weights2 = torch.randn((128, 1), dtype=float, requires_grad=True)
biases2 = torch.randn(1, dtype=float, requires_grad=True)
learning_rate = 0.001
losses = []
for i in range(1000):
    # forward pass: input -> hidden (ReLU) -> output
    hidden = x.mm(weights) + biases
    hidden = torch.relu(hidden)
    predictions = hidden.mm(weights2) + biases2
    loss = torch.mean((predictions - y)**2)
    losses.append(loss.data.numpy())
    if i % 100 == 0:
        print('loss:', loss)
    # backward pass, then gradient-descent updates
    loss.backward()
    weights.data.add_(-learning_rate * weights.grad.data)
    biases.data.add_(-learning_rate * biases.grad.data)
    weights2.data.add_(-learning_rate * weights2.grad.data)
    biases2.data.add_(-learning_rate * biases2.grad.data)
    # clear the accumulated gradients for the next iteration
    weights.grad.data.zero_()
    biases.grad.data.zero_()
    weights2.grad.data.zero_()
    biases2.grad.data.zero_()
Result:
2.4.2. A concise version with torch.nn (CPU training)
Initialize the network:
input_size = input_features.shape[1]
hidden_size = 128
output_size = 1
batch_size = 16
my_nn = torch.nn.Sequential(
torch.nn.Linear(input_size, hidden_size),
torch.nn.Sigmoid(),
torch.nn.Linear(hidden_size, output_size),
)
cost = torch.nn.MSELoss(reduction='mean')
optimizer = torch.optim.Adam(my_nn.parameters(), lr=0.001)
Train the network:
losses = []
for i in range(1000):
    batch_loss = []
    # iterate over the data in mini-batches of batch_size
    for start in range(0, len(input_features), batch_size):
        end = start + batch_size if start + batch_size < len(
            input_features) else len(input_features)
        xx = torch.tensor(input_features[start:end],
                          dtype=torch.float,
                          requires_grad=True)
        # reshape the labels to [batch, 1] to match the prediction shape
        yy = torch.tensor(labels[start:end],
                          dtype=torch.float).reshape(-1, 1)
        prediction = my_nn(xx)
        loss = cost(prediction, yy)
        optimizer.zero_grad()
        loss.backward(retain_graph=True)
        optimizer.step()
        batch_loss.append(loss.data.numpy())
    if i % 100 == 0:
        losses.append(np.mean(batch_loss))
        print(i, np.mean(batch_loss))
Result:
Predict on the training set:
x = torch.tensor(input_features, dtype=torch.float)
predict = my_nn(x).data.numpy()
Result:
Parameters of the training process that can be tuned include:
- the learning rate
- the activation function (ReLU in most cases)
- the loss function: the squared error used here can be swapped for other loss types
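As a sketch of the knobs listed above, the same Sequential model can be rebuilt with a different activation, loss, and learning rate. The specific choices below (ReLU, SmoothL1Loss, lr=0.01) are illustrative alternatives, not the author's settings:

```python
import torch

input_size, hidden_size, output_size = 14, 128, 1

# Same two-layer architecture, but with ReLU instead of Sigmoid
model = torch.nn.Sequential(
    torch.nn.Linear(input_size, hidden_size),
    torch.nn.ReLU(),
    torch.nn.Linear(hidden_size, output_size),
)

# Smooth L1 (Huber-like) loss as an alternative to plain MSE
cost = torch.nn.SmoothL1Loss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)  # larger learning rate

# One dummy training step to show the pieces fit together
x = torch.randn(4, input_size)
y = torch.randn(4, output_size)
loss = cost(model(x), y)
optimizer.zero_grad()
loss.backward()
optimizer.step()
print(loss.item())
```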
2.5. Visualizing results with matplotlib
Set up the data:
dates = [
str(int(year)) + '-' + str(int(month)) + '-' + str(int(day))
for year, month, day in zip(years, months, days)
]
dates = [datetime.datetime.strptime(date, '%Y-%m-%d') for date in dates]
true_data = pd.DataFrame(data={'date': dates, 'actual': labels})
months = features[:, feature_list.index('month')]
days = features[:, feature_list.index('day')]
years = features[:, feature_list.index('year')]
test_dates = [
str(int(year)) + '-' + str(int(month)) + '-' + str(int(day))
for year, month, day in zip(years, months, days)
]
test_dates = [
datetime.datetime.strptime(date, '%Y-%m-%d') for date in test_dates
]
predictions_data = pd.DataFrame(data={
'date': test_dates,
'prediction': predict.reshape(-1)
})
Plot the figure:
plt.figure(figsize=(20,5))
plt.plot(true_data['date'], true_data['actual'], 'b-', label='actual')
plt.plot(predictions_data['date'],
predictions_data['prediction'],
'ro',
label='prediction')
plt.xticks(rotation='60')
plt.legend()
plt.xlabel('Date')
plt.ylabel('Maximum Temperature (F)')
plt.title('Actual and Predicted Values')
Result:
2.6. Notes on running the code
An error was raised on the first run:
It turned out that two conflicting copies of numpy were installed. Run pip uninstall numpy twice (possibly more) to remove them all, then reinstall with:
pip3 install numpy -i http://pypi.douban.com/simple --trusted-host pypi.douban.com
After that, a warning appeared: numexpr needs to be at least version 2.7.0. Update it with:
pip install --user numexpr==2.7.0 -i http://pypi.douban.com/simple --trusted-host pypi.douban.com
3. Dataset 2 (without the friend field)
3.1. Data cleaning
import pandas as pd
df = pd.read_csv("data1.csv")
df.head(2)
df=df.drop(['friend'],axis=1)
df.to_csv("data2.csv",index=False,encoding="utf-8")
Read the data back in:
df = pd.read_csv("data2.csv")
df.head(2)
Result:
3.2. Plotting the data
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import torch
import torch.optim as optim
import warnings
warnings.filterwarnings("ignore")
%matplotlib inline
import os
os.environ["KMP_DUPLICATE_LIB_OK"]="TRUE"
features = pd.read_csv('data2.csv')
import datetime
years = features['year']
months = features['month']
days = features['day']
dates = [
str(int(year)) + '-' + str(int(month)) + '-' + str(int(day))
for year, month, day in zip(years, months, days)
]
dates = [datetime.datetime.strptime(date, '%Y-%m-%d') for date in dates]
plt.style.use('fivethirtyeight')
fig, ((ax1, ax2), (ax3, ax4)) = plt.subplots(nrows=2, ncols=2, figsize=(10, 10))
fig.autofmt_xdate(rotation=45)
ax1.plot(dates, features['actual'])
ax1.set_xlabel('')
ax1.set_ylabel('Temperature')
ax1.set_title('Max Temp')
ax2.plot(dates, features['temp_1'])
ax2.set_xlabel('')
ax2.set_ylabel('Temperature')
ax2.set_title('Previous Max Temp')
ax3.plot(dates, features['temp_2'])
ax3.set_xlabel('Date')
ax3.set_ylabel('Temperature')
ax3.set_title('Two Days Prior Max Temp')
plt.tight_layout(pad=2)
Result:
Process the data:
features = pd.get_dummies(features)
labels = np.array(features['actual'])
features = features.drop('actual', axis=1)
feature_list = list(features.columns)
features = np.array(features)
from sklearn import preprocessing
input_features = preprocessing.StandardScaler().fit_transform(features)
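The StandardScaler step above shifts every feature column to zero mean and unit standard deviation; a tiny sketch with a made-up matrix:

```python
import numpy as np
from sklearn import preprocessing

# Toy feature matrix: two columns on very different scales
features = np.array([[1.0, 10.0],
                     [2.0, 20.0],
                     [3.0, 30.0]])

# After scaling, each column has mean 0 and standard deviation 1
scaled = preprocessing.StandardScaler().fit_transform(features)
print(scaled.mean(axis=0), scaled.std(axis=0))
```

This keeps large-valued columns from dominating the randomly initialized weights during early training.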
3.3. Building the network (GPU training)
How to fix the error this code originally raised:
Corrected code:
input_size = input_features.shape[1]
hidden_size = 128
output_size = 1
batch_size = 16
my_nn = torch.nn.Sequential(
    torch.nn.Linear(input_size, hidden_size),
    torch.nn.Sigmoid(),
    torch.nn.Linear(hidden_size, output_size),
)
cost = torch.nn.MSELoss(reduction='mean')
optimizer = torch.optim.Adam(my_nn.parameters(), lr=0.001)
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print(device)
my_nn.to(device)
losses = []
for i in range(1000):
    batch_loss = []
    for start in range(0, len(input_features), batch_size):
        end = start + batch_size if start + batch_size < len(
            input_features) else len(input_features)
        # build the batch tensors and move them to the GPU; note that
        # numpy arrays have no .to() method, so the copy must happen on
        # the tensors, not on input_features/labels themselves
        xx = torch.tensor(input_features[start:end],
                          dtype=torch.float,
                          requires_grad=True).to(device)
        yy = torch.tensor(labels[start:end],
                          dtype=torch.float).reshape(-1, 1).to(device)
        prediction = my_nn(xx)
        loss = cost(prediction, yy)
        optimizer.zero_grad()
        loss.backward(retain_graph=True)
        optimizer.step()
        # copy the loss back to the CPU before converting to numpy
        batch_loss.append(loss.data.cpu().numpy())
    if i % 100 == 0:
        losses.append(np.mean(batch_loss))
        print(i, np.mean(batch_loss))
Result:
3.4. Visualizing results with matplotlib
# move the inputs to the same device as the model, then bring the
# predictions back to the CPU for plotting
x = torch.tensor(input_features, dtype=torch.float).to(device)
predict = my_nn(x).data.cpu().numpy()
dates = [
str(int(year)) + '-' + str(int(month)) + '-' + str(int(day))
for year, month, day in zip(years, months, days)
]
dates = [datetime.datetime.strptime(date, '%Y-%m-%d') for date in dates]
true_data = pd.DataFrame(data={'date': dates, 'actual': labels})
months = features[:, feature_list.index('month')]
days = features[:, feature_list.index('day')]
years = features[:, feature_list.index('year')]
test_dates = [
str(int(year)) + '-' + str(int(month)) + '-' + str(int(day))
for year, month, day in zip(years, months, days)
]
test_dates = [
datetime.datetime.strptime(date, '%Y-%m-%d') for date in test_dates
]
predictions_data = pd.DataFrame(data={
'date': test_dates,
'prediction': predict.reshape(-1)
})
plt.figure(figsize=(20,5))
plt.plot(true_data['date'], true_data['actual'], 'b-', label='actual')
plt.plot(predictions_data['date'],
predictions_data['prediction'],
'ro',
label='prediction')
plt.xticks(rotation='60')
plt.legend()
plt.xlabel('Date')
plt.ylabel('Maximum Temperature (F)')
plt.title('Actual and Predicted Values')
Training output:
0 3870.6526
100 37.636475
200 35.619022
300 35.33533
400 35.18585
500 35.07154
600 34.970573
700 34.8705
800 34.767883
900 34.66196
Visualization:
4. Conclusions
Changing the dataset had little effect on the results, so improving the network structure is the key to optimization.
When the network is small, the cost of moving data between CPU and GPU dominates, and CPU-only training is actually faster.
When the network is large, the speedup from the GPU becomes significant.
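The transfer-overhead point can be sketched with a toy timing loop; the model and batch sizes below are arbitrary, chosen only to make the per-batch host-to-device copy visible:

```python
import time
import torch

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# A deliberately tiny model: with so little compute per batch, the
# per-batch copy to the device can cost more than the matmul itself
model = torch.nn.Linear(14, 1).to(device)
x = torch.randn(1024, 14)

start = time.time()
for _ in range(100):
    out = model(x.to(device))  # copy in, then compute
elapsed = time.time() - start
print(f"100 forward passes on {device}: {elapsed:.4f}s")
```

Comparing this number between `cpu` and `cuda:0` on a given machine shows whether the copies or the compute dominate for a model this small.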
Original: https://blog.csdn.net/qq_45550375/article/details/126110032
Author: csp_
Title: DNN temperature prediction (PyTorch, comparing datasets)