Reproducing U-Net in PyTorch

May 26, 2023, 2:49 PM • Artificial Intelligence • 51 reads

pytorch-unet

Source: https://github.com/milesial/Pytorch-UNet

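A couple of days ago I worked on image segmentation and tried U-Net, which I had hardly used before. I reproduced the 2018 PyTorch version of U-Net and am recording what I learned here. (Update: a year later, I am back to fill in and polish these notes.)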

if __name__ == '__main__':
    args = get_args()

    logging.basicConfig(level=logging.INFO, format='%(levelname)s: %(message)s')
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    logging.info(f'Using device {device}')

    # Change here to adapt to your data
    # n_channels=3 for RGB images
    # n_classes is the number of probabilities you want to get per pixel
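    # For RGB images use n_channels=3; for medical images (mostly grayscale) use n_channels=1.
    # n_classes is the number of classes to segment plus 1 (the background),
    # e.g. with two foreground classes, n_classes = 3.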
    net = UNet(n_channels=3, n_classes=args.classes, bilinear=args.bilinear)

    logging.info(f'Network:\n'
                 f'\t{net.n_channels} input channels\n'
                 f'\t{net.n_classes} output channels (classes)\n'
                 f'\t{"Bilinear" if net.bilinear else "Transposed conv"} upscaling')

    if args.load:
        net.load_state_dict(torch.load(args.load, map_location=device))
        logging.info(f'Model loaded from {args.load}')

    net.to(device=device)
    try:
        train_net(net=net,
                  epochs=args.epochs,
                  batch_size=args.batch_size,
                  learning_rate=args.lr,
                  device=device,
                  img_scale=args.scale,
                  val_percent=args.val / 100,
                  amp=args.amp)
    except KeyboardInterrupt:
        torch.save(net.state_dict(), 'INTERRUPTED.pth')
        logging.info('Saved interrupt')
        raise

It calls a helper function, get_args():

def get_args():
    parser = argparse.ArgumentParser(description='Train the UNet on images and target masks')
    parser.add_argument('--epochs', '-e', metavar='E', type=int, default=5, help='Number of epochs')
    parser.add_argument('--batch-size', '-b', dest='batch_size', metavar='B', type=int, default=1, help='Batch size')
    parser.add_argument('--learning-rate', '-l', metavar='LR', type=float, default=1e-5,
                        help='Learning rate', dest='lr')
    parser.add_argument('--load', '-f', type=str, default=False, help='Load model from a .pth file')
    parser.add_argument('--scale', '-s', type=float, default=0.5, help='Downscaling factor of the images')
    parser.add_argument('--validation', '-v', dest='val', type=float, default=10.0,
                        help='Percent of the data that is used as validation (0-100)')
    parser.add_argument('--amp', action='store_true', default=False, help='Use mixed precision')
    parser.add_argument('--bilinear', action='store_true', default=False, help='Use bilinear upsampling')
    parser.add_argument('--classes', '-c', type=int, default=2, help='Number of classes')

    return parser.parse_args()

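Using argparse takes three main steps:

1. Create an ArgumentParser() object, which holds all the information needed to parse the command line into Python data types.

2. Call add_argument() to register the arguments, mainly the batch size, learning rate, epochs and the like, so they can be changed directly from the command line.

3. Call parse_args() to do the actual parsing; the get_args() wrapper returns the parsed result.

In short: create the parser, add the command-line arguments, parse the arguments.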
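
The three steps can be sketched in isolation as follows. This is a minimal illustration rather than the repository's code: the function name build_parser and the explicit argument list handed to parse_args are assumptions, chosen so the example runs without a real command line.

```python
import argparse


def build_parser():
    # Step 1: create the parser
    parser = argparse.ArgumentParser(description='Toy training options')
    # Step 2: register the arguments that should be adjustable from the command line
    parser.add_argument('--epochs', '-e', type=int, default=5, help='Number of epochs')
    parser.add_argument('--batch-size', '-b', dest='batch_size', type=int, default=1, help='Batch size')
    parser.add_argument('--learning-rate', '-l', dest='lr', type=float, default=1e-5, help='Learning rate')
    parser.add_argument('--amp', action='store_true', help='Use mixed precision')
    return parser


# Step 3: parse. Passing a list stands in for typing "-e 10 --batch-size 4" in a shell.
args = build_parser().parse_args(['-e', '10', '--batch-size', '4'])
print(args.epochs, args.batch_size, args.lr, args.amp)
```

Arguments that are not passed keep their defaults, so here lr stays at 1e-5 and amp stays False.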

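The main block really does three things: 1. it uses argparse to set the number of epochs, the batch size, the learning rate and so on; 2. it sets the input and output dimensions of the network (n_channels and n_classes); 3. it decides whether to run on the CPU or a GPU (the code shown picks CUDA when available and does not set up multiple GPUs).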

Create the dataset (this step mainly handles data loading and augmentation)

    # 1. Create dataset
    try:
        dataset = CarvanaDataset(dir_img, dir_mask, img_scale)
    except (AssertionError, RuntimeError):
        dataset = BasicDataset(dir_img, dir_mask, img_scale)

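Define the path to the images and the path to the masks. Data augmentation can be done here. The example below shows augmentation for a training set, typically built with transforms.Compose: random rotation, random horizontal flip, conversion to a tensor, and normalization. To add your own augmentation, define a class of your own and put it into the Compose pipeline.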

transform_train = transforms.Compose([
    transforms.RandomRotation(degrees=8),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize(mean, std), ])

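Split the dataset. random_split partitions it randomly by a given ratio; the code here splits only into a training set and a validation set, and a test set can be split off the same way.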

    # 2. Split into train / validation partitions
    n_val = int(len(dataset) * val_percent)
    n_train = len(dataset) - n_val
    train_set, val_set = random_split(dataset, [n_train, n_val], generator=torch.Generator().manual_seed(0))
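
If a held-out test set is also wanted, the same call accepts three lengths. The following is a sketch on a toy dataset; the name test_percent and the 80/10/10 ratio are assumptions for illustration, not values from the repository.

```python
import torch
from torch.utils.data import TensorDataset, random_split

dataset = TensorDataset(torch.arange(100))  # toy dataset with 100 samples

val_percent, test_percent = 0.1, 0.1        # assumed ratios
n_val = int(len(dataset) * val_percent)
n_test = int(len(dataset) * test_percent)
n_train = len(dataset) - n_val - n_test     # the remainder goes to training

# A seeded generator makes the split reproducible across runs.
generator = torch.Generator().manual_seed(0)
train_set, val_set, test_set = random_split(
    dataset, [n_train, n_val, n_test], generator=generator)

print(len(train_set), len(val_set), len(test_set))
```

Each returned Subset keeps its own list of indices, so the three partitions do not overlap.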

Create the data loaders

    # 3. Create data loaders
    loader_args = dict(batch_size=batch_size, num_workers=4, pin_memory=True)
    train_loader = DataLoader(train_set, shuffle=True, **loader_args)
    val_loader = DataLoader(val_set, shuffle=False, drop_last=True, **loader_args)

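First, a quick introduction to DataLoader. It is one of PyTorch's main interfaces for reading data, defined in dataloader.py. Its purpose is to wrap a custom Dataset into tensors of one batch each, according to the batch size, whether to shuffle, and so on, for use in training.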

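For example, train_loader is a DataLoader instance built on my own train_set; it splits the data into batches of batch_size tensors for training. shuffle controls whether the data is reshuffled at every epoch. num_workers sets how many subprocesses handle data loading; 0 means all data is loaded in the main process. In my experience, when you hit OS-level errors or errors about insufficient memory, setting num_workers to 0 or 4 usually fixes it. On a GPU machine 8 or 16 is usually about right. This setting affects training speed: if num_workers is too small, GPU utilization can be very low and training is slow. (I often use num_workers = 0 locally, then forget to change it after moving to the server, and the memory blows up right away.)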
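
The batching behavior can be checked on a toy dataset. This sketch uses zero tensors as stand-ins for images and masks and keeps num_workers=0 so that it runs anywhere; the sizes are arbitrary assumptions.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

images = torch.zeros(10, 3, 8, 8)                # 10 fake RGB images
masks = torch.zeros(10, 8, 8, dtype=torch.long)  # 10 fake masks
toy_set = TensorDataset(images, masks)

# 10 samples with batch_size=4 give batches of 4, 4 and 2.
loader = DataLoader(toy_set, batch_size=4, shuffle=True, num_workers=0)
# drop_last=True discards the incomplete final batch.
loader_drop = DataLoader(toy_set, batch_size=4, shuffle=False, drop_last=True, num_workers=0)

batch_shapes = [tuple(img.shape) for img, _ in loader]
print(batch_shapes)
print(len(loader), len(loader_drop))
```

shuffle changes which samples land in which batch, not the batch sizes, so the shapes are the same on every run.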

Create the optimizer, define the learning rate schedule, and define the loss function

    # 4. Set up the optimizer, the loss, the learning rate scheduler and the loss scaling for AMP
    optimizer = optim.RMSprop(net.parameters(), lr=learning_rate, weight_decay=1e-8, momentum=0.9)
    scheduler = optim.lr_scheduler.ReduceLROnPlateau(optimizer, 'max', patience=2)  # goal: maximize Dice score
    grad_scaler = torch.cuda.amp.GradScaler(enabled=amp)
    criterion = nn.CrossEntropyLoss()
    global_step = 0

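The optimizer holds the current parameter state and updates the parameters from the computed gradients. It is given the network's parameters along with settings such as weight decay and the learning rate. Optimization methods fall into two broad families: SGD and its improvements (such as adding momentum), and per-parameter adaptive learning rate methods, including AdaGrad, RMSProp and Adam. Choosing one is much like choosing which algorithm performs the gradient update in classical machine learning.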

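In my experience, the optimizer is usually SGD or Adam. The learning rate schedule is usually lr_ = base_lr * (1.0 - iter_num / max_iterations) ** 0.9, or halving every so many epochs. The initial learning rate is at most 0.1, usually 0.01 or 0.001, and the batch size is 2, 4, 8 or 16.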
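
The polynomial schedule quoted above can be tried out in plain Python. The helper name poly_lr and the values of base_lr and max_iterations are assumptions for illustration.

```python
def poly_lr(base_lr, iter_num, max_iterations, power=0.9):
    """Polynomial decay: starts at base_lr and reaches 0 at max_iterations."""
    return base_lr * (1.0 - iter_num / max_iterations) ** power


base_lr = 0.01
max_iterations = 1000
schedule = [poly_lr(base_lr, it, max_iterations) for it in (0, 250, 500, 750, 1000)]
for it, lr in zip((0, 250, 500, 750, 1000), schedule):
    print(it, lr)
```

In a training loop the returned value would be written into each of the optimizer's param_groups before the step; that wiring is left out here.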

Begin training

    # 5. Begin training
    for epoch in range(1, epochs+1):
        net.train()
        epoch_loss = 0
        with tqdm(total=n_train, desc=f'Epoch {epoch}/{epochs}', unit='img') as pbar:
            for batch in train_loader:
                images = batch['image']
                true_masks = batch['mask']

                assert images.shape[1] == net.n_channels, \
                    f'Network has been defined with {net.n_channels} input channels, ' \
                    f'but loaded images have {images.shape[1]} channels. Please check that ' \
                    'the images are loaded correctly.'

                images = images.to(device=device, dtype=torch.float32)
                true_masks = true_masks.to(device=device, dtype=torch.long)

                with torch.cuda.amp.autocast(enabled=amp):
                    masks_pred = net(images)
                    loss = criterion(masks_pred, true_masks) \
                           + dice_loss(F.softmax(masks_pred, dim=1).float(),
                                       F.one_hot(true_masks, net.n_classes).permute(0, 3, 1, 2).float(),
                                       multiclass=True)

                optimizer.zero_grad(set_to_none=True)
                grad_scaler.scale(loss).backward()
                grad_scaler.step(optimizer)
                grad_scaler.update()

                pbar.update(images.shape[0])
                global_step += 1
                epoch_loss += loss.item()
                experiment.log({
                    'train loss': loss.item(),
                    'step': global_step,
                    'epoch': epoch
                })
                pbar.set_postfix(**{'loss (batch)': loss.item()})

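In each epoch, train_loader yields the batches and the network trains on them one by one. The loss is computed between masks_pred, the network's segmentation output, and the true_masks passed in. Backpropagation computes the gradients and the optimizer updates the parameters, so the loss keeps decreasing and eventually levels off.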

The main choice here is the loss, which is usually cross-entropy and Dice loss.
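
For intuition, the Dice coefficient measures the overlap between two masks as 2|A∩B| / (|A| + |B|), and the loss is 1 minus that value. The functions below are a simplified sketch for a single binary mask, not the repository's dice_loss; the names dice_coefficient and simple_dice_loss and the smoothing term eps are assumptions.

```python
import torch


def dice_coefficient(pred, target, eps=1e-6):
    # pred and target: float tensors of the same shape with values in [0, 1]
    intersection = (pred * target).sum()
    total = pred.sum() + target.sum()
    # eps avoids division by zero when both masks are empty
    return (2 * intersection + eps) / (total + eps)


def simple_dice_loss(pred, target):
    return 1 - dice_coefficient(pred, target)


mask_a = torch.tensor([[1., 1.], [0., 0.]])
mask_b = torch.tensor([[0., 0.], [1., 1.]])

loss_same = simple_dice_loss(mask_a, mask_a).item()      # perfect overlap
loss_disjoint = simple_dice_loss(mask_a, mask_b).item()  # no overlap
print(loss_same, loss_disjoint)
```

Identical masks give a loss close to 0 and disjoint masks give a loss close to 1, which is why Dice is often added to cross-entropy when the foreground is small.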

Compute the validation score

        val_score = evaluate(net, val_loader, device)
        scheduler.step(val_score)

        logging.info('Validation Dice score: {}'.format(val_score))

The trained network predicts on the validation images, and the predictions are compared with the validation true_masks to obtain the score.

Save the weights

        if save_checkpoint:
            Path(dir_checkpoint).mkdir(parents=True, exist_ok=True)
            # epoch already starts at 1 (range(1, epochs+1)), so use it directly in the file name
            torch.save(net.state_dict(), str(dir_checkpoint / 'checkpoint_epoch{}.pth'.format(epoch)))
            logging.info(f'Checkpoint {epoch} saved!')

Here the parameters from every epoch can be saved to a .pth file, ready to be loaded directly at prediction time.
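
Loading such a file back for prediction could look like the sketch below. A tiny Conv2d stands in for the trained UNet and a temporary directory stands in for the checkpoint folder; both are assumptions to keep the example self-contained.

```python
import os
import tempfile

import torch
import torch.nn as nn

model = nn.Conv2d(3, 2, kernel_size=1)  # tiny stand-in for the trained network

with tempfile.TemporaryDirectory() as tmp_dir:
    ckpt_path = os.path.join(tmp_dir, 'checkpoint_epoch1.pth')
    torch.save(model.state_dict(), ckpt_path)

    # The model must be rebuilt with the same architecture before loading the weights.
    restored = nn.Conv2d(3, 2, kernel_size=1)
    restored.load_state_dict(torch.load(ckpt_path, map_location='cpu'))

restored.eval()  # switch to inference mode before predicting
with torch.no_grad():
    x = torch.rand(1, 3, 4, 4)
    same = torch.equal(model(x), restored(x))
print(same)
```

Only the state_dict is stored, so the class definition of the network has to be available wherever the checkpoint is loaded.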

Next, the structure of the net used in training:

class UNet(nn.Module):
    def __init__(self, n_channels, n_classes, bilinear=True):
        super(UNet, self).__init__()
        self.n_channels = n_channels
        self.n_classes = n_classes
        self.bilinear = bilinear

        self.inc = DoubleConv(n_channels, 64)
        self.down1 = Down(64, 128)
        self.down2 = Down(128, 256)
        self.down3 = Down(256, 512)
        factor = 2 if bilinear else 1
        self.down4 = Down(512, 1024 // factor)
        self.up1 = Up(1024, 512 // factor, bilinear)
        self.up2 = Up(512, 256 // factor, bilinear)
        self.up3 = Up(256, 128 // factor, bilinear)
        self.up4 = Up(128, 64, bilinear)
        self.outc = OutConv(64, n_classes)

    def forward(self, x):
        x1 = self.inc(x)
        x2 = self.down1(x1)
        x3 = self.down2(x2)
        x4 = self.down3(x3)
        x5 = self.down4(x4)
        x = self.up1(x5, x4)
        x = self.up2(x, x3)
        x = self.up3(x, x2)
        x = self.up4(x, x1)
        logits = self.outc(x)
        return logits

The structure is fairly simple. The encoder stage has the following modules:

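The first module is DoubleConv, two rounds of (Conv2d, BatchNorm, ReLU). It is used at the very start to turn the 3-channel image into a 64-channel feature map. In the original U-Net design, every convolution shrinks the image by 2 pixels, but in this code, given

out_size = (in_size - K + 2P) / S + 1

the kernel size K is deliberately set to 3, the padding P to 1 and the stride S to 1, so the image size does not change during the convolution.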
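
Plugging numbers into the formula makes the difference concrete. The helper name conv_out_size and the input side of 572 pixels are assumptions for illustration.

```python
def conv_out_size(in_size, kernel_size, padding, stride):
    """Output side length of a convolution: (in_size - K + 2P) / S + 1."""
    return (in_size - kernel_size + 2 * padding) // stride + 1


unpadded = conv_out_size(572, kernel_size=3, padding=0, stride=1)  # 3x3 conv without padding
padded = conv_out_size(572, kernel_size=3, padding=1, stride=1)    # 3x3 conv with padding 1
print(unpadded, padded)
```

Without padding the side shrinks from 572 to 570; with padding 1 it stays at 572, which is the setting this code uses.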

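The second module is Down (MaxPool2d, then DoubleConv). Each one halves the image size and, after pooling, applies the convolutions to increase the number of channels.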
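
The two encoder modules are not listed in this post. The sketch below is written from the description above (two Conv2d, BatchNorm, ReLU rounds; max pooling followed by the double conv), so treat it as an assumption about their shape rather than the repository's exact code.

```python
import torch
import torch.nn as nn


class DoubleConv(nn.Module):
    """(Conv2d -> BatchNorm2d -> ReLU) twice; kernel 3 with padding 1 keeps H and W."""

    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.double_conv = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.double_conv(x)


class Down(nn.Module):
    """MaxPool2d halves H and W, then DoubleConv changes the channel count."""

    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.maxpool_conv = nn.Sequential(
            nn.MaxPool2d(2),
            DoubleConv(in_channels, out_channels),
        )

    def forward(self, x):
        return self.maxpool_conv(x)


x = torch.rand(2, 3, 64, 64)  # batch of 2 fake RGB images
x1 = DoubleConv(3, 64)(x)     # channels 3 -> 64, size unchanged
x2 = Down(64, 128)(x1)        # channels 64 -> 128, size halved
print(x1.shape, x2.shape)
```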

The decoder stage:

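The Up module (upsample + conv): upsampling increases the image size, the upsampled map is concatenated with the encoder feature map from the same level, and the conv then reduces the number of channels.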
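
The channel arithmetic of one decoder step can be traced with plain tensors. This sketch assumes bilinear mode, where x5 and x4 both carry 512 channels, and uses a single Conv2d as a stand-in for the double conv; the spatial sizes are arbitrary.

```python
import torch
import torch.nn as nn

x5 = torch.rand(1, 512, 4, 4)  # bottleneck output (1024 // 2 channels in bilinear mode)
x4 = torch.rand(1, 512, 8, 8)  # encoder feature map from the level above

upsample = nn.Upsample(scale_factor=2, mode='bilinear', align_corners=True)
x5_up = upsample(x5)                               # spatial size 4 -> 8, channels unchanged

merged = torch.cat([x4, x5_up], dim=1)             # skip connection: 512 + 512 = 1024 channels
reduce = nn.Conv2d(1024, 256, kernel_size=3, padding=1)  # stand-in for the conv in Up(1024, 512 // 2)
out = reduce(merged)

print(x5_up.shape, merged.shape, out.shape)
```

The concatenation is why up1 is declared with 1024 input channels even though each of its two inputs has only 512.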

Finally, the outc module: it uses as many convolution kernels as the number of channels you want in the output.

debug:

AssertionError: Either no mask or multiple masks found for the ID 0008052191_9: []

Fix: the img file path was found, but the mask file path was not. If your images and masks have exactly the same file names, set mask_suffix to an empty string in data_loading.
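
The empty list in the message is what a file lookup returns when the suffix does not match. The sketch below assumes the dataset finds masks by globbing for image ID + mask_suffix; find_masks is a hypothetical helper and '_mask' is an assumed suffix value.

```python
import tempfile
from pathlib import Path


def find_masks(mask_dir, image_id, mask_suffix):
    # Look for any file named <image_id><mask_suffix>.<extension>
    return list(Path(mask_dir).glob(image_id + mask_suffix + '.*'))


with tempfile.TemporaryDirectory() as tmp_dir:
    # The mask has exactly the same name as the image, with no extra suffix.
    (Path(tmp_dir) / '0008052191_9.png').touch()

    with_suffix = find_masks(tmp_dir, '0008052191_9', '_mask')
    without_suffix = find_masks(tmp_dir, '0008052191_9', '')
    found_names = [p.name for p in without_suffix]

print(with_suffix, found_names)
```

With the suffix the lookup returns an empty list, which triggers the assertion; with an empty suffix it finds exactly one file.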

RuntimeError: CUDA error: CUBLAS_STATUS_NOT_INITIALIZED when calling cublasCreate(handle)

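Fix: this is a class label out-of-range problem. The final mask must contain only the labels 0 and 1. The masks in the project's dataset are 0/1, but both my images and my masks are 0-255 with a bit depth of 24. The original project divides only the img by 255, so in the code it is enough to change is_mask to False.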
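
Another way to see the problem is to map the mask values to class indices explicitly. This is an alternative sketch, not the change described above; it assumes the mask has already been reduced to a single channel, and the threshold of 127 is an arbitrary choice.

```python
import torch
import torch.nn as nn

n_classes = 2
raw_mask = torch.tensor([[[0, 255], [255, 0]]])  # shape (N, H, W), values as stored in an 8-bit image
logits = torch.zeros(1, n_classes, 2, 2)         # fake network output, shape (N, C, H, W)

# CrossEntropyLoss needs class indices in [0, n_classes - 1]; 255 would be out of range.
labels = (raw_mask > 127).long()                 # 0 = background, 1 = foreground
max_label = labels.max().item()

loss = nn.CrossEntropyLoss()(logits, labels)
print(labels.tolist(), max_label, loss.item())
```

Checking that the largest label is below n_classes on the CPU gives a readable error; on the GPU the same mistake tends to surface as an opaque CUDA failure.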

RuntimeError: 1only batches of spatial targets supported (3D tensors) but got targets of size: : [1, 1, 256, 256]

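Fix: this error is fairly self-explanatory. Print the shapes of the prediction and the ground-truth target when computing the loss, and reshape the target (for example, true_mask.reshape(1, 256, 256)). The same problem shows up during validation, and the same fix solves it.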
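
A variant of the same fix that does not hard-code the batch size or image size is to drop the channel dimension with squeeze(1). The tensors below are zero-filled stand-ins with the shapes from the error message.

```python
import torch
import torch.nn as nn

logits = torch.zeros(1, 2, 256, 256)                        # (N, C, H, W) network output
true_masks = torch.zeros(1, 1, 256, 256, dtype=torch.long)  # (N, 1, H, W) causes the error

target = true_masks.squeeze(1)                              # (N, H, W), as CrossEntropyLoss expects
loss = nn.CrossEntropyLoss()(logits, target)
print(target.shape, loss.item())
```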

wandb.errors.CommError: check_hostname requires server_hostname

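Fix: I do not fully understand this one. Roughly, I had VPN/proxy software running, which probably caused a problem on the wandb side. Turning it off fixed it.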

BrokenPipeError: [Errno 32] Broken pipe

Fix: this also seems to be reported as an OS-level or insufficient-memory kind of problem. Setting num_workers to 0 fixed it.

Original: https://blog.csdn.net/kim_jisoo123/article/details/121650180
Author: kim_jisoo123
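Title: Pytorch Unet 复现 (Reproducing U-Net in PyTorch)

Original articles are protected by copyright. When reposting, please credit the source: https://www.johngo689.com/520062/

Reposted articles are protected by the original author's copyright. When reposting, please credit the original author.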
