[Hung-yi Lee] Deep Learning: Homework 1, COVID-19 (Regression)

Task description

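Goal: predict the COVID-19 tested-positive rate
Given survey data collected over three consecutive days in a number of US states, including the percentage of people who tested positive (missing for the third day), predict the tested-positive percentage for the third day.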


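The data cover 40 states, each encoded as a one-hot vector. For each day the survey also reports some basic information about the respondents: COVID-like illness (4 features), behavior (8), mental health (5), and tested_positive (1, which is also the value we want to predict), so each day contributes 18 features.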
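To make the column layout concrete, here is a small sketch that computes where each feature group of a given day sits once the id column has been dropped. It assumes the group order listed above (illness, behavior, mental health, tested_positive) and that the three days follow one another after the 40 state columns; the helper name `day_feature_slices` is hypothetical.

```python
# Hypothetical helper: column ranges of each feature group for one day,
# assuming the id column has already been dropped.
N_STATES = 40  # the one-hot state columns come first
GROUP_SIZES = [("illness", 4), ("behavior", 8), ("mental_health", 5), ("tested_positive", 1)]
FEATS_PER_DAY = sum(size for _, size in GROUP_SIZES)  # 18


def day_feature_slices(day):
    """Return {group_name: (start, stop)} for day 0, 1 or 2 (stop is exclusive)."""
    start = N_STATES + day * FEATS_PER_DAY
    slices = {}
    for name, size in GROUP_SIZES:
        slices[name] = (start, start + size)
        start += size
    return slices


for d in range(3):
    print(d, day_feature_slices(d)["tested_positive"])
# 0 (57, 58)
# 1 (75, 76)
# 2 (93, 94)
```

Under these assumptions, index 93 is the last column of the id-stripped training data, which matches the code taking `data[:, -1]` as the target.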

Data–Training

train.csv (2700 samples covering 40 states, roughly 68 per state)


Data–Testing

test.csv (893 samples; the tested_positive value for the last day is missing and is what we must predict)


Dataset

Data analysis

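  • The training set has 2700 rows and 95 columns. Each row holds a 40-dimensional one-hot state vector plus features for three days, 18 per day, so 18*3+40 = 94, plus one id column.
  • The test set has 893 rows (894 lines including the header) and 94 columns. Each row holds the 40-dimensional one-hot vector, 18 features for each of the first two days, and 17 for the third day (tested_positive is missing), so 18*2+17+40 = 93, plus one id column.
  • Each row is one sample.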
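These column counts can be checked while loading. The sketch below builds tiny synthetic CSVs in memory (the values are made up; only the shape matters) and shows that dropping the header row and the id column leaves 94 columns for training-style rows and 93 for test-style rows. `load_without_id` and `fake_csv` are hypothetical helpers; the first one mirrors the loading lines of the dataset class.

```python
import csv
import io

import numpy as np


def load_without_id(fp):
    """Read a CSV stream, drop the header row and the id column, return floats."""
    rows = list(csv.reader(fp))
    return np.array(rows[1:])[:, 1:].astype(float)


def fake_csv(n_rows, n_cols):
    """Build a synthetic CSV with a header row and an id column (made-up values)."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["id"] + [f"f{j}" for j in range(n_cols - 1)])
    for i in range(n_rows):
        writer.writerow([i] + [0.5] * (n_cols - 1))
    buf.seek(0)
    return buf


train_like = load_without_id(fake_csv(5, 95))  # 95 = id + 40 + 18 * 3
test_like = load_without_id(fake_csv(5, 94))   # 94 = id + 40 + 18 * 2 + 17
print(train_like.shape, test_like.shape)  # (5, 94) (5, 93)
```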

Feature extraction

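From the raw data we drop the first row (the header) and the first column (id), which leaves the training set with 94 columns (93 features plus the target) and the test set with 93 feature columns. The last column of the training set, the day-3 tested_positive value, is the target we predict. One tenth of the training rows (every tenth row) is held out as a validation set. The features after the 40 one-hot state columns are standardized to zero mean and unit variance; the one-hot columns themselves are left unchanged.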
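Holding out every tenth row, as the dataset class does with `i % 10`, gives an exact 9:1 split of the 2700 training rows. A quick check in plain Python:

```python
# Reproduce the index split used by COVID19Dataset: rows whose index is a
# multiple of 10 go to the validation ("dev") set, the rest to training.
n_rows = 2700
train_idx = [i for i in range(n_rows) if i % 10 != 0]
dev_idx = [i for i in range(n_rows) if i % 10 == 0]
print(len(train_idx), len(dev_idx))  # 2430 270
```

The split is deterministic, so the two index sets never overlap and the same rows are held out on every run.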

Defining a custom dataset

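  • Read the training data from train.csv into memory
  • Extract the features
  • Split the data into training and validation sets
  • Standardize the features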
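import csv

import numpy as np
import torch
from torch.utils.data import Dataset


class COVID19Dataset(Dataset):
    '''Load and preprocess the downloaded COVID-19 dataset'''
    def __init__(self, path, mode='train', target_only=False):
        self.mode = mode

        # Read the CSV, then drop the header row and the id column
        with open(path, 'r') as fp:
            data = list(csv.reader(fp))
            data = np.array(data[1:])[:, 1:].astype(float)

        if not target_only:
            feats = list(range(93))  # 40 state columns + 53 daily features (18 + 18 + 17)
        else:
            # Assumption: keep the 40 state columns plus the day-1 and day-2
            # tested_positive columns (indices 57 and 75)
            feats = list(range(40)) + [57, 75]

        if mode == 'test':
            data = data[:, feats]
            self.data = torch.FloatTensor(data)
        else:
            target = data[:, -1]  # day-3 tested_positive is the label
            data = data[:, feats]

            # Every 10th row goes to the validation (dev) set
            if mode == 'train':
                indices = [i for i in range(len(data)) if i % 10 != 0]
            elif mode == 'dev':
                indices = [i for i in range(len(data)) if i % 10 == 0]
            else:
                raise ValueError('unknown mode: {}'.format(mode))

            self.data = torch.FloatTensor(data[indices])
            self.target = torch.FloatTensor(target[indices])

        # Standardize the non-one-hot columns (index 40 onward);
        # note that each split is scaled with its own mean/std here
        self.data[:, 40:] = \
            (self.data[:, 40:] - self.data[:, 40:].mean(dim=0, keepdim=True)) \
            / self.data[:, 40:].std(dim=0, keepdim=True)

        self.dim = self.data.shape[1]

        print('Finished reading the {} set of COVID19 Dataset ({} samples found, each dim = {})'
              .format(mode, len(self.data), self.dim))

    def __getitem__(self, index):
        # train/dev return (features, label); test returns features only
        if self.mode in ['train', 'dev']:
            return self.data[index], self.target[index]
        else:
            return self.data[index]

    def __len__(self):
        return len(self.data)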
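One caveat: `COVID19Dataset` standardizes columns 40 onward with the mean and standard deviation of whichever split it is building, so train, dev, and test are each scaled with different statistics. A common alternative is to compute the statistics once on the training rows and reuse them for every split. The NumPy sketch below illustrates that idea on synthetic data; `fit_standardizer` and `apply_standardizer` are hypothetical names, and the first 40 synthetic columns merely stand in for the one-hot block.

```python
import numpy as np


def fit_standardizer(train_feats, start=40):
    """Mean/std of the non-one-hot columns (index >= start), training rows only."""
    mean = train_feats[:, start:].mean(axis=0, keepdims=True)
    std = train_feats[:, start:].std(axis=0, keepdims=True)
    std[std == 0] = 1.0  # guard against constant columns
    return mean, std


def apply_standardizer(feats, mean, std, start=40):
    """Return a copy with columns >= start scaled by the training statistics."""
    out = feats.astype(float).copy()
    out[:, start:] = (out[:, start:] - mean) / std
    return out


rng = np.random.default_rng(0)
train = rng.normal(5.0, 2.0, size=(100, 45))  # synthetic: 40 stand-in columns + 5 features
dev = rng.normal(5.0, 2.0, size=(10, 45))
mean, std = fit_standardizer(train)
train_n = apply_standardizer(train, mean, std)
dev_n = apply_standardizer(dev, mean, std)  # dev reuses the training statistics
print(np.allclose(train_n[:, 40:].mean(axis=0), 0.0))  # True
```

One small difference: `torch.std` uses the unbiased estimator by default, while NumPy's `std` defaults to `ddof=0`; at these sample sizes the gap is negligible.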

Defining the DataLoader


from torch.utils.data import DataLoader

def prep_dataloader(path, mode, batch_size, n_jobs=0, target_only=False):
    dataset = COVID19Dataset(path, mode=mode, target_only=target_only)
    # Shuffle only the training set; drop_last=False keeps the final, smaller batch
    dataLoader = DataLoader(dataset, batch_size, shuffle=(mode == 'train'), drop_last=False, num_workers=n_jobs, pin_memory=True)
    return dataLoader
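To see what `shuffle=(mode == 'train')` and `drop_last=False` mean for the batch counts, here is a plain-Python sketch of the batching behavior. It is an illustration, not the actual `DataLoader` implementation, and `batch_indices` is a hypothetical name. With 2430 training rows and `batch_size=270` there are exactly 9 batches; a dataset whose size is not a multiple of the batch size keeps a smaller final batch.

```python
import random


def batch_indices(n, batch_size, shuffle, drop_last=False, seed=None):
    """Yield lists of sample indices, roughly what a DataLoader iterates over."""
    order = list(range(n))
    if shuffle:
        random.Random(seed).shuffle(order)
    for start in range(0, n, batch_size):
        batch = order[start:start + batch_size]
        if drop_last and len(batch) < batch_size:
            break
        yield batch


train_batches = list(batch_indices(2430, 270, shuffle=True, seed=0))
print(len(train_batches))  # 9
test_batches = list(batch_indices(893, 270, shuffle=False))
print([len(b) for b in test_batches])  # [270, 270, 270, 83]
```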

Defining the model

Here the model is a simple network with two fully connected layers (input → 64 → 1, with a ReLU in between), trained with MSELoss.


import torch.nn as nn


class NeuralNet(nn.Module):
    '''Two fully connected layers: input_dim -> 64 -> 1'''
    def __init__(self, input_dim):
        super(NeuralNet, self).__init__()
        self.net = nn.Sequential(
            nn.Linear(input_dim, 64),
            nn.ReLU(),
            nn.Linear(64, 1)
        )
        self.criterion = nn.MSELoss(reduction='mean')

    def forward(self, x):
        # (batch, input_dim) -> (batch,)
        return self.net(x).squeeze(1)

    def cal_loss(self, pred, target):
        return self.criterion(pred, target)
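For reference, the forward pass of this two-layer network and its mean-squared-error loss can be written out in NumPy. The weights below are random placeholders (PyTorch initializes `nn.Linear` differently), and `input_dim=93` assumes the default `target_only=False` feature set.

```python
import numpy as np

rng = np.random.default_rng(0)
input_dim, hidden = 93, 64

# Placeholder parameters with the same shapes as nn.Linear(93, 64) and nn.Linear(64, 1)
W1, b1 = rng.normal(0, 0.1, (hidden, input_dim)), np.zeros(hidden)
W2, b2 = rng.normal(0, 0.1, (1, hidden)), np.zeros(1)


def forward(x):
    """x: (batch, input_dim) -> predictions of shape (batch,), like .squeeze(1)."""
    h = np.maximum(0.0, x @ W1.T + b1)  # Linear + ReLU
    return (h @ W2.T + b2)[:, 0]        # Linear, then drop the size-1 axis


def mse(pred, target):
    """Mean squared error, as computed by nn.MSELoss(reduction='mean')."""
    return np.mean((pred - target) ** 2)


x = rng.normal(size=(270, input_dim))
y = rng.normal(size=270)
pred = forward(x)
n_params = W1.size + b1.size + W2.size + b2.size
print(pred.shape, n_params)  # (270,) 6081
```

The parameter count, 93·64 + 64 + 64 + 1 = 6081, shows how small this model is relative to the 2430 training samples.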

Defining the training parameters


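import os

import torch


# get_device is not defined in the original post; a simple version:
def get_device():
    '''Use the GPU if one is available, otherwise the CPU'''
    return 'cuda' if torch.cuda.is_available() else 'cpu'


device = get_device()
os.makedirs('models', exist_ok=True)  # directory for the saved checkpoint
target_only = False  # True selects the reduced feature set

config = {
    'n_epochs' : 3000,       # maximum number of epochs
    'batch_size' : 270,
    'optimizer' : 'SGD',     # name of an optimizer class in torch.optim
    'optim_hparas' : {
        'lr' : 0.001,
        'momentum' : 0.9
    },
    'early_stop' : 200,      # stop after more than this many epochs without improvement
    'save_path' : 'models/model.pth'
}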
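The `'SGD'` entry is looked up with `getattr(torch.optim, ...)` and called with `lr=0.001` and `momentum=0.9`. With dampening, weight decay, and Nesterov left at their defaults (off), PyTorch's SGD keeps a velocity buffer `v = momentum * v + g` and then steps `p = p - lr * v`. The scalar sketch below applies that rule to f(x) = x² purely as an illustration; `sgd_momentum` is a hypothetical name, not a PyTorch function.

```python
def sgd_momentum(grad_fn, x0, lr=0.001, momentum=0.9, steps=1000):
    """Plain SGD with momentum (no dampening, no weight decay, no Nesterov)."""
    x, v = x0, 0.0
    for _ in range(steps):
        g = grad_fn(x)
        v = momentum * v + g  # velocity buffer; first step gives v = g
        x = x - lr * v        # parameter step
    return x


# Minimize f(x) = x**2, whose gradient is 2x, starting from x = 1.0
x_final = sgd_momentum(lambda x: 2 * x, 1.0)
print(abs(x_final) < 1e-6)  # True
```

Starting the buffer at zero reproduces PyTorch's behavior of initializing it to the first gradient, since `0.9 * 0 + g = g`.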

Validation

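The validation set serves two purposes: it measures how well the model generalizes during training, and it lets us stop training early once the validation loss stops improving.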


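def dev(dv_set, model, device):
    '''Return the average MSE over the whole validation set'''
    model.eval()
    total_loss = 0
    for x, y in dv_set:
        x, y = x.to(device), y.to(device)
        with torch.no_grad():
            pred = model(x)
            mse_loss = model.cal_loss(pred, y)
        total_loss += mse_loss.detach().cpu().item() * len(x)  # undo the per-batch mean
    total_loss = total_loss / len(dv_set.dataset)  # average over all samples
    return total_loss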
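Since `nn.MSELoss(reduction='mean')` averages within each batch, the per-sample MSE over a whole dataset is obtained by weighting each batch loss by its batch size, summing, and dividing by the total number of samples; this stays correct even when the last batch is smaller. A NumPy check of that identity on synthetic values (1000 made-up samples in batches of 270):

```python
import numpy as np

rng = np.random.default_rng(0)
pred = rng.normal(size=1000)
target = rng.normal(size=1000)

total, n = 0.0, 0
for start in range(0, len(pred), 270):  # batches of 270, 270, 270, 190
    p, t = pred[start:start + 270], target[start:start + 270]
    batch_mse = np.mean((p - t) ** 2)   # what the criterion returns per batch
    total += batch_mse * len(p)         # undo the per-batch averaging
    n += len(p)

dataset_mse = total / n
print(np.isclose(dataset_mse, np.mean((pred - target) ** 2)))  # True
```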

Training


def train(tr_set, dv_set, model, config, device):
    n_epochs = config['n_epochs']

    # Build the optimizer named in config, e.g. torch.optim.SGD(params, lr=..., momentum=...)
    optimizer = getattr(torch.optim, config['optimizer'])(model.parameters(), **config['optim_hparas'])
    min_mse = 1000
    loss_record = {'train': [], 'dev': []}
    early_stop_cnt = 0
    epoch = 0
    while epoch < n_epochs:
        model.train()
        for x, y in tr_set:
            optimizer.zero_grad()
            x, y = x.to(device), y.to(device)
            pred = model(x)
            mse_loss = model.cal_loss(pred, y)
            mse_loss.backward()
            optimizer.step()
            loss_record['train'].append(mse_loss.detach().cpu().item())

        # After each epoch, evaluate on the dev set and keep the best checkpoint
        dev_mse = dev(dv_set, model, device)
        if dev_mse < min_mse:
            min_mse = dev_mse
            print('Saving model (epoch = {:4d}, loss = {:.4f})'
                  .format(epoch + 1, min_mse))
            torch.save(model.state_dict(), config['save_path'])
            early_stop_cnt = 0
        else:
            early_stop_cnt += 1

        epoch += 1
        loss_record['dev'].append(dev_mse)
        if early_stop_cnt > config['early_stop']:
            # No improvement for more than config['early_stop'] epochs
            break

    print('Finished training after {} epochs'.format(epoch))
    return min_mse, loss_record
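The early-stopping rule in `train` counts consecutive epochs without a new best validation MSE and stops once that count exceeds `config['early_stop']`. Pulled out into a small hypothetical helper class (not part of the original code), the same logic can be tested in isolation:

```python
class EarlyStopper:
    """Track the best validation loss; stop after more than `patience` misses in a row."""

    def __init__(self, patience, best=1000.0):
        self.patience = patience
        self.best = best  # same initial value as min_mse in train()
        self.bad_epochs = 0

    def step(self, dev_loss):
        """Return (improved, should_stop) for one epoch's validation loss."""
        if dev_loss < self.best:
            self.best = dev_loss
            self.bad_epochs = 0
            return True, False
        self.bad_epochs += 1
        return False, self.bad_epochs > self.patience


stopper = EarlyStopper(patience=2)
history = [1.0, 0.8, 0.9, 0.85, 0.95, 0.7]
for epoch, loss in enumerate(history, start=1):
    improved, stop = stopper.step(loss)
    if stop:
        print("stopped after epoch", epoch)  # stopped after epoch 5
        break
```

With `patience=2`, three consecutive non-improving epochs (0.9, 0.85, 0.95) are needed to stop, so the later 0.7 is never seen; the same "greater than" comparison means `early_stop=200` tolerates 200 bad epochs and stops on the 201st.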

Loading the data and building the model

tr_path = 'train.csv'
tt_path = 'test.csv'

# Pass target_only by keyword: the fourth positional parameter of prep_dataloader is n_jobs
tr_set = prep_dataloader(tr_path, 'train', config['batch_size'], target_only=target_only)
dv_set = prep_dataloader(tr_path, 'dev', config['batch_size'], target_only=target_only)
tt_set = prep_dataloader(tt_path, 'test', config['batch_size'], target_only=target_only)

model = NeuralNet(tr_set.dataset.dim).to(device)

Start training


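model_loss, model_loss_record = train(tr_set, dv_set, model, config, device)

# plot_learning_curve is a plotting helper from the course sample code; it is not defined in this post
plot_learning_curve(model_loss_record, title="deep model")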
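If you need a stand-in for `plot_learning_curve`, here is a minimal matplotlib sketch. It assumes, as in `train`, that `loss_record['train']` holds one value per mini-batch and `loss_record['dev']` one value per epoch, and it stretches the dev points evenly along the training-step axis. This is not the course's own implementation; define it before the call.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; remove this line in a notebook
import matplotlib.pyplot as plt
import numpy as np


def plot_learning_curve(loss_record, title=""):
    """Plot per-step training MSE and per-epoch dev MSE on one x-axis (a stand-in)."""
    train_loss = loss_record["train"]
    dev_loss = loss_record["dev"]
    x_train = np.arange(len(train_loss))
    # Spread the per-epoch dev points evenly over the training steps
    x_dev = np.linspace(0, len(train_loss) - 1, num=len(dev_loss))
    fig, ax = plt.subplots(figsize=(6, 4))
    ax.plot(x_train, train_loss, c="tab:red", label="train")
    ax.plot(x_dev, dev_loss, c="tab:cyan", label="dev")
    ax.set_xlabel("Training steps")
    ax.set_ylabel("MSE loss")
    ax.set_title(f"Learning curve of {title}")
    ax.legend()
    return fig, ax
```

With the Agg backend nothing is displayed; save the figure with `fig.savefig("curve.png")` instead.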


Testing


def test(tt_set, model, device):
    model.eval()
    preds = []
    for x in tt_set:
        x = x.to(device)
        with torch.no_grad():
            pred = model(x)
            preds.append(pred.detach().cpu())
    preds = torch.cat(preds, dim=0).numpy()
    return preds

Predicting and saving the results

def save_pred(preds, file):
    ''' Save predictions to specified file '''
    print('Saving results to {}'.format(file))
    with open(file, 'w', newline='') as fp:  # newline='' avoids blank lines from csv on Windows
        writer = csv.writer(fp)
        writer.writerow(['id', 'tested_positive'])
        for i, p in enumerate(preds):
            writer.writerow([i, p])

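# Reload the best checkpoint saved during training before predicting
model.load_state_dict(torch.load(config['save_path'], map_location=device))
preds = test(tt_set, model, device)
save_pred(preds, 'pred.csv')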

References
Prof. Hung-yi Lee's sample code (GitHub)
Homework 1: COVID-19 Cases Prediction (Regression)

Original: https://blog.csdn.net/m0_51474171/article/details/127755733
Author: 头发没了还会再长
Title: [Hung-yi Lee] Deep Learning: Homework 1, COVID-19 (Regression)
