[Hung-yi Lee] Deep Learning: Homework 1 - COVID-19 (Regression)

June 28, 2023, 10:06 PM • Artificial Intelligence

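Task description

Goal: predict the COVID-19 tested-positive rate.
Given survey statistics for several US states over three consecutive days, including the tested-positive rate for the first two days but not the third, predict the tested-positive rate on the third day.

The data covers 40 states, each encoded as a one-hot vector, plus survey indicators for each day: illness (4 features), behavior (8 features), mental health (5 features), and tested_positive (1 feature, which is also the value to predict). Each day therefore contributes 4 + 8 + 5 + 1 = 18 feature values.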

Data: Training

train.csv (2700 samples across 40 states, roughly 68 per state)

Data: Testing

test.csv (893 samples; the tested-positive rate for the last day is missing and must be predicted)

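Dataset

Analyzing the data

The training set has 2700 samples (2701 rows counting the header) and 95 columns. Each row holds a one-hot state vector and three days of features. The one-hot vector has 40 dimensions and each day has 18 feature values, so 18*3 + 40 = 94, plus one id column.
The test set has 893 samples (894 rows counting the header) and 94 columns. Each row holds a one-hot state vector and three days of features. The one-hot vector has 40 dimensions, the first two days have 18 feature values each and the third day has 17, so 18*2 + 17 + 40 = 93, plus one id column.
* Each row is one sample.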
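
The column arithmetic above can be checked with a few lines of Python. This is only a sketch: the constant names are illustrative, and it assumes tested_positive is the last of each day's 18 features, following the order in which the task description lists them.

```python
# Column layout after the id column is dropped (names are illustrative).
N_STATES = 40         # one-hot state columns occupy indices 0..39
FEATS_PER_DAY = 18    # 4 illness + 8 behavior + 5 mental health + 1 tested_positive
N_DAYS = 3

train_cols = N_STATES + FEATS_PER_DAY * N_DAYS   # includes the target column
test_cols = train_cols - 1                       # day-3 tested_positive is missing

# Index of tested_positive for each day, assuming it is the last feature of the day.
tested_positive_idx = [N_STATES + FEATS_PER_DAY * (d + 1) - 1 for d in range(N_DAYS)]

print(train_cols, test_cols, tested_positive_idx)  # 94 93 [57, 75, 93]
```

Index 93 is the last training column, which is the value to predict.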

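Feature extraction

From the raw data, drop the first row (the header) and the first column (id). That leaves 94 columns in the training set (93 features plus the target) and 93 columns in the test set. The last column of the training set is the value to predict, so it becomes the target. One tenth of the training set is held out as a validation set. The non-one-hot features (columns 40 onward) are standardized to zero mean and unit variance, while the one-hot state columns are left unchanged.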
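
The hold-out rule can be sketched on row indices alone. The helper name below is hypothetical; the sketch assumes the 2700 training samples described above and sends every tenth row to the validation set.

```python
def split_indices(n_rows, every=10):
    """Send rows whose index is a multiple of `every` to the dev set."""
    train_idx = [i for i in range(n_rows) if i % every != 0]
    dev_idx = [i for i in range(n_rows) if i % every == 0]
    return train_idx, dev_idx


train_idx, dev_idx = split_indices(2700)
print(len(train_idx), len(dev_idx))  # 2430 270
```

The split is deterministic, so the same rows land in the validation set on every run.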

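Defining a custom dataset

Read the training data from train.csv into memory
Extract the features
Split the data into training and validation sets
Normalize the features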

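import csv
import os

import numpy as np
import torch
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader


class COVID19Dataset(Dataset):
    '''Load and preprocess the downloaded COVID-19 dataset.'''
    def __init__(self, path, mode='train', target_only=False):
        self.mode = mode

        # Read the CSV, then drop the header row and the id column
        with open(path, 'r') as fp:
            data = list(csv.reader(fp))
            data = np.array(data[1:])[:, 1:].astype(float)

        if not target_only:
            # Use all 93 features: 40 state columns + 53 survey features
            feats = list(range(93))
        else:
            # One reasonable choice: the 40 state columns plus tested_positive
            # of day 1 and day 2 (column indices 57 and 75)
            feats = list(range(40)) + [57, 75]

        if mode == 'test':
            # Test data has no target column
            data = data[:, feats]
            self.data = torch.FloatTensor(data)
        else:
            # The last column of the training data is the target
            target = data[:, -1]
            data = data[:, feats]

            # Every 10th row goes to the validation (dev) set
            if mode == 'train':
                indices = [i for i in range(len(data)) if i % 10 != 0]
            elif mode == 'dev':
                indices = [i for i in range(len(data)) if i % 10 == 0]

            self.data = torch.FloatTensor(data[indices])
            self.target = torch.FloatTensor(target[indices])

        # Standardize the non-one-hot features (columns 40 onward).
        # Note: the mean and std are computed separately for each split.
        self.data[:, 40:] = \
            (self.data[:, 40:] - self.data[:, 40:].mean(dim=0, keepdim=True)) \
            / self.data[:, 40:].std(dim=0, keepdim=True)

        self.dim = self.data.shape[1]

        print('Finished reading the {} set of COVID19 Dataset ({} samples found, each dim = {})'
              .format(mode, len(self.data), self.dim))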

    def __getitem__(self, index):
        if self.mode in ['train', 'dev']:
            return self.data[index], self.target[index]
        else:
            return self.data[index]

    def __len__(self):
        return len(self.data)
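
The class above standardizes each split with its own mean and standard deviation. A common alternative, shown here only as a sketch and not as what the code above does, is to compute the statistics on the training split and reuse them for the validation and test splits. The function names and the random toy arrays are hypothetical.

```python
import numpy as np


def fit_standardizer(train_x, start_col=40):
    """Compute mean and std of the non-one-hot columns on the training split."""
    mean = train_x[:, start_col:].mean(axis=0, keepdims=True)
    # ddof=1 matches the default (unbiased) behavior of torch.std
    std = train_x[:, start_col:].std(axis=0, ddof=1, keepdims=True)
    return mean, std


def apply_standardizer(x, mean, std, start_col=40):
    """Standardize columns from start_col onward; leave the one-hot part as is."""
    out = x.copy()
    out[:, start_col:] = (out[:, start_col:] - mean) / std
    return out


rng = np.random.default_rng(0)
train_x = rng.normal(5.0, 2.0, size=(100, 43))   # toy data: 40 "state" + 3 feature columns
dev_x = rng.normal(5.0, 2.0, size=(10, 43))

mean, std = fit_standardizer(train_x)
train_n = apply_standardizer(train_x, mean, std)
dev_n = apply_standardizer(dev_x, mean, std)     # reuses the training statistics
```

With this variant the validation and test features are scaled exactly like the training features, at the cost of passing the statistics around.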

Defining the DataLoader


def prep_dataloader(path, mode, batch_size, n_jobs=0, target_only=False):
    dataset = COVID19Dataset(path, mode=mode, target_only=target_only)
    dataLoader = DataLoader(dataset, batch_size, shuffle=(mode == 'train'), drop_last=False, num_workers=n_jobs, pin_memory=True)
    return dataLoader
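
With drop_last=False the final batch may be smaller than batch_size, so the number of batches per epoch is the ceiling of the dataset size over the batch size. A small sketch with a hypothetical helper, using the split sizes from this post (2430 training, 270 validation, 893 test samples) and the batch size of 270 from the configuration:

```python
import math


def num_batches(n_samples, batch_size, drop_last=False):
    """Number of batches a loader yields per epoch."""
    if drop_last:
        return n_samples // batch_size
    return math.ceil(n_samples / batch_size)


for name, n in [('train', 2430), ('dev', 270), ('test', 893)]:
    print(name, num_batches(n, 270))
# train 9
# dev 1
# test 4
```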

Defining the model

The model here is a simple network with two fully connected layers, trained with MSELoss as the loss function.


class NeuralNet(nn.Module):
    def __init__(self, input_dim):
        super(NeuralNet, self).__init__()
        self.net = nn.Sequential(
            nn.Linear(input_dim, 64),
            nn.ReLU(),
            nn.Linear(64, 1)
        )
        self.criterion = nn.MSELoss(reduction='mean')
    def forward(self, x):
        return self.net(x).squeeze(1)

    def cal_loss(self, pred, target):
        return self.criterion(pred, target)
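
To make the shapes concrete, here is a NumPy sketch of the same forward pass and loss. It is not the PyTorch model itself: the weights are random placeholders and the function names are made up for illustration.

```python
import numpy as np


def forward(x, w1, b1, w2, b2):
    """Linear(93, 64) -> ReLU -> Linear(64, 1), then drop the last axis."""
    hidden = np.maximum(x @ w1 + b1, 0.0)   # shape (batch, 64)
    out = hidden @ w2 + b2                  # shape (batch, 1)
    return out.squeeze(1)                   # shape (batch,)


def mse(pred, target):
    """Mean squared error over all elements."""
    return float(np.mean((pred - target) ** 2))


rng = np.random.default_rng(0)
input_dim, hidden_dim, batch = 93, 64, 5
w1, b1 = rng.normal(size=(input_dim, hidden_dim)), np.zeros(hidden_dim)
w2, b2 = rng.normal(size=(hidden_dim, 1)), np.zeros(1)

x = rng.normal(size=(batch, input_dim))
pred = forward(x, w1, b1, w2, b2)
n_params = w1.size + b1.size + w2.size + b2.size
print(pred.shape, n_params)  # (5,) 6081
```

The squeeze(1) matters: without it the prediction has shape (batch, 1) while the target has shape (batch,), and the subtraction would broadcast to a (batch, batch) matrix.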

Defining the training hyperparameters


device = get_device()
os.makedirs('models', exist_ok=True)
target_only = False

config = {
    'n_epochs' : 3000,
    'batch_size' : 270,
    'optimizer' : 'SGD',
    'optim_hparas' : {
        'lr' : 0.001,
        'momentum' : 0.9
    },
    'early_stop' : 200,
    'save_path' : 'models/model.pth'
}
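
The snippet above calls get_device(), which is not defined anywhere in this post. A minimal version, assuming it is only meant to pick the GPU when one is available, could look like this:

```python
import torch


def get_device():
    """Return 'cuda' if a CUDA GPU is available, otherwise 'cpu'."""
    return 'cuda' if torch.cuda.is_available() else 'cpu'


device = get_device()
print(device)
```

It has to be defined before the line device = get_device() runs.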

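Validation

The validation set is used to evaluate the model during training, and also to stop training early once the model no longer improves.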


def dev(dv_set, model, device):
    model.eval()
    total_loss = 0
    for x, y in dv_set:
        x, y = x.to(device), y.to(device)
        with torch.no_grad():
            pred = model(x)
            mse_loss = model.cal_loss(pred, y)

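        total_loss += mse_loss.detach().cpu().item() * len(x)
    # Average over the whole validation set to get the mean squared error
    total_loss = total_loss / len(dv_set.dataset)
    return total_loss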
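
Because the last batch can be smaller than the others, a dataset-level MSE weights each batch's mean loss by its size and divides by the total number of samples. A small sketch in plain Python, with a hypothetical helper name and made-up numbers:

```python
def dataset_mse(batch_losses, batch_sizes):
    """Combine per-batch mean losses into one mean over all samples."""
    total = sum(loss * n for loss, n in zip(batch_losses, batch_sizes))
    return total / sum(batch_sizes)


# Two batches: 3 samples with mean loss 2.0, and 1 sample with loss 6.0
print(dataset_mse([2.0, 6.0], [3, 1]))  # 3.0
```

A plain average of the two batch losses would give 4.0 instead, overweighting the small batch.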

Training


def train(tr_set, dv_set, model, config, device):
    n_epochs = config['n_epochs']

    optimizer = getattr(torch.optim, config['optimizer'])(model.parameters(), **config['optim_hparas'])
    min_mse = 1000
    loss_record = {'train':[], 'dev':[]}
    early_stop_cnt = 0
    epoch = 0
    while epoch < n_epochs:
        model.train()
        for x, y in tr_set:
            optimizer.zero_grad()
            x, y = x.to(device), y.to(device)
            pred = model(x)
            mse_loss = model.cal_loss(pred, y)
            mse_loss.backward()
            optimizer.step()
            loss_record['train'].append(mse_loss.detach().cpu().item())

        dev_mse = dev(dv_set, model, device)
        if dev_mse < min_mse:
            # Save the model whenever the validation loss improves
            min_mse = dev_mse
            print('Saving model (epoch = {:4d}, loss = {:.4f})'
                  .format(epoch + 1, min_mse))
            torch.save(model.state_dict(), config['save_path'])
            early_stop_cnt = 0
        else:
            early_stop_cnt += 1

        epoch += 1
        loss_record['dev'].append(dev_mse)
        if early_stop_cnt > config['early_stop']:
            # Stop once the model has gone more than config['early_stop']
            # consecutive epochs without improving
            break

    print('Finished training after {} epochs'.format(epoch))
    return min_mse, loss_record
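
The early-stopping rule inside train can be isolated as a pure function. This sketch uses a hypothetical helper that replays a list of validation losses with the same counters as the loop above and returns how many epochs would run, together with the best loss seen.

```python
def epochs_until_stop(dev_losses, patience, init_best=1000):
    """Replay the loop's early-stopping logic over a list of validation losses."""
    best = init_best
    stale = 0          # consecutive epochs without improvement
    epochs_run = 0
    for loss in dev_losses:
        if loss < best:
            best, stale = loss, 0
        else:
            stale += 1
        epochs_run += 1
        if stale > patience:
            break
    return epochs_run, best


print(epochs_until_stop([5.0, 4.0, 4.5, 4.2, 4.1, 3.9], patience=2))  # (5, 4.0)
```

Because the check is `>` rather than `>=`, training stops after patience + 1 consecutive epochs without improvement. In the example the later, better loss of 3.9 is never reached.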

Loading the datasets and defining the model

tr_path = 'train.csv'
tt_path = 'test.csv'

tr_set = prep_dataloader(tr_path, 'train', batch_size=config['batch_size'], target_only=target_only)
dv_set = prep_dataloader(tr_path, 'dev', config['batch_size'], target_only=target_only)
tt_set = prep_dataloader(tt_path, 'test', config['batch_size'], target_only=target_only)

model = NeuralNet(tr_set.dataset.dim).to(device)

Start training


model_loss, model_loss_record = train(tr_set, dv_set, model, config, device)

plot_learning_curve(model_loss_record, title="deep model")
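
plot_learning_curve is not defined in this post. The sketch below is one possible version, assuming matplotlib is installed. Training loss is recorded once per step and validation loss once per epoch, so the validation points are placed at the last step of each epoch. The helper dev_positions is a made-up name.

```python
def dev_positions(n_train_steps, n_dev_points):
    """x positions (in training steps) of the per-epoch validation losses."""
    steps_per_epoch = n_train_steps // n_dev_points
    return [(i + 1) * steps_per_epoch - 1 for i in range(n_dev_points)]


def plot_learning_curve(loss_record, title=''):
    """Plot per-step training loss and per-epoch validation loss."""
    import matplotlib.pyplot as plt  # imported here so only plotting needs matplotlib

    train, dev = loss_record['train'], loss_record['dev']
    fig, ax = plt.subplots(figsize=(6, 4))
    ax.plot(range(len(train)), train, c='tab:red', label='train')
    ax.plot(dev_positions(len(train), len(dev)), dev, c='tab:cyan', label='dev')
    ax.set_xlabel('Training steps')
    ax.set_ylabel('MSE loss')
    ax.set_title('Learning curve of {}'.format(title))
    ax.legend()
    plt.show()


print(dev_positions(6, 3))  # [1, 3, 5]
```

The function must be defined before the call above runs.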

Testing


def test(tt_set, model, device):
    model.eval()
    preds = []
    for x in tt_set:
        x = x.to(device)
        with torch.no_grad():
            pred = model(x)
            preds.append(pred.detach().cpu())
    preds = torch.cat(preds, dim=0).numpy()
    return preds

Predicting and saving the results

def save_pred(preds, file):
    ''' Save predictions to specified file '''
    print('Saving results to {}'.format(file))
    # newline='' prevents blank rows between records on Windows
    with open(file, 'w', newline='') as fp:
        writer = csv.writer(fp)
        writer.writerow(['id', 'tested_positive'])
        for i, p in enumerate(preds):
            writer.writerow([i, p])

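# Load the best checkpoint saved during training before predicting
model.load_state_dict(torch.load(config['save_path'], map_location=device))
preds = test(tt_set, model, device)
save_pred(preds, 'pred.csv')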
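
The file format written by save_pred can be previewed without touching the disk. This sketch writes two made-up predictions to an in-memory buffer using the same header. The helper name is hypothetical.

```python
import csv
import io


def preds_to_csv_text(preds):
    """Render predictions in the id,tested_positive format used by save_pred."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(['id', 'tested_positive'])
    for i, p in enumerate(preds):
        writer.writerow([i, p])
    return buf.getvalue()


print(preds_to_csv_text([12.5, 7.25]))
```

Each row pairs the sample's position in the test set with its predicted tested-positive rate.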

References
Hung-yi Lee's course code repository (GitHub)
Homework 1: COVID-19 Cases Prediction (Regression)

Original: https://blog.csdn.net/m0_51474171/article/details/127755733
Author: 头发没了还会再长
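Title: [Hung-yi Lee] Deep Learning: Homework 1 - COVID-19 (Regression)

Original articles are protected by copyright. When reposting, please cite the source: https://www.johngo689.com/657886/

Reposted articles are protected by the original author's copyright. When reposting, please credit the original author.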
