[PyTorch Series] A Detailed Guide to torch.nn.DataParallel

July 22, 2023, 6:12 PM • Artificial Intelligence

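On a multi-GPU server, when a training job runs for many iterations or epochs, we usually wrap the model in nn.DataParallel so that several GPUs share the work and training finishes faster. The following lines are typically added to the code: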

model = model.cuda()
device_ids = [0, 1]     # the two GPUs with ids 0 and 1
model = torch.nn.DataParallel(model, device_ids=device_ids)

Or, more compactly:

device_ids = [0, 1]
model = torch.nn.DataParallel(model, device_ids=device_ids).cuda()

Class definition:

torch.nn.DataParallel(module, device_ids=None, output_device=None, dim=0)

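Parameters:
module: the model you defined;
device_ids: the devices to train on;
output_device: the device on which the output is placed.
The last parameter, output_device, is usually omitted, in which case it defaults to device_ids[0], i.e. the first card. Because the outputs of all replicas are gathered there, the first card tends to use more memory than the others.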
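The default described above can be checked with a minimal sketch. The toy model, the batch shape, and the CPU fallback branch are assumptions added for illustration only; they are not part of the original article.

```python
import torch
import torch.nn as nn

# Hypothetical toy model, used only for illustration.
model = nn.Linear(10, 2)

if torch.cuda.device_count() >= 2:
    device_ids = [0, 1]
    # Parameters must live on device_ids[0] before the first forward pass.
    model = nn.DataParallel(model.cuda(device_ids[0]), device_ids=device_ids)
    x = torch.randn(8, 10).cuda(device_ids[0])
else:
    # Fallback so the sketch still runs on a machine without two GPUs.
    device_ids = []
    x = torch.randn(8, 10)

out = model(x)
# With output_device omitted, the gathered result sits on device_ids[0].
print(out.shape, out.device)
```

On a machine with two GPUs the printed device should be the first card, which matches the extra memory use noted above.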

Reading the torch.nn.DataParallel source code:

class DataParallel(Module):

    def __init__(self, module, device_ids=None, output_device=None, dim=0):
        super(DataParallel, self).__init__()

        device_type = _get_available_device_type()
        if device_type is None:
            self.module = module
            self.device_ids = []
            return

        if device_ids is None:
            device_ids = _get_all_device_indices()

        if output_device is None:
            output_device = device_ids[0]

        self.dim = dim
        self.module = module
        self.device_ids = list(map(lambda x: _get_device_index(x, True), device_ids))
        self.output_device = _get_device_index(output_device, True)
        self.src_device_obj = torch.device(device_type, self.device_ids[0])

        _check_balance(self.device_ids)

        if len(self.device_ids) == 1:
            self.module.to(self.src_device_obj)

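    def forward(self, *inputs, **kwargs):
        # No usable device was found in __init__: run the wrapped module as-is.
        if not self.device_ids:
            return self.module(*inputs, **kwargs)

        # Every parameter and buffer must already live on device_ids[0].
        for t in chain(self.module.parameters(), self.module.buffers()):
            if t.device != self.src_device_obj:
                raise RuntimeError("module must have its parameters and buffers "
                                   "on device {} (device_ids[0]) but found one of "
                                   "them on device: {}".format(self.src_device_obj, t.device))

        # 1. scatter: split inputs and kwargs along self.dim, one chunk per device.
        inputs, kwargs = self.scatter(inputs, kwargs, self.device_ids)
        if len(self.device_ids) == 1:
            return self.module(*inputs[0], **kwargs[0])
        # 2. replicate: copy the module to each device that received a chunk.
        replicas = self.replicate(self.module, self.device_ids[:len(inputs)])
        # 3. parallel_apply: run every replica on its own chunk.
        outputs = self.parallel_apply(replicas, inputs, kwargs)
        # 4. gather: collect the outputs on output_device along self.dim.
        return self.gather(outputs, self.output_device)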

    def replicate(self, module, device_ids):
        return replicate(module, device_ids, not torch.is_grad_enabled())

    def scatter(self, inputs, kwargs, device_ids):
        return scatter_kwargs(inputs, kwargs, device_ids, dim=self.dim)

    def parallel_apply(self, replicas, inputs, kwargs):
        return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])

    def gather(self, outputs, output_device):
        return gather(outputs, output_device, dim=self.dim)
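A note on `self.device_ids[:len(inputs)]` in forward: the batch may be split into fewer chunks than there are devices, so only the devices that actually received a chunk get a replica. The pure-Python sketch below illustrates this, assuming scatter divides the batch the same way torch.chunk does (equal chunks of size ceil(batch / devices), with a smaller last chunk). The helper name `chunk_sizes` is hypothetical and does not exist in PyTorch.

```python
import math


def chunk_sizes(batch_size, num_devices):
    """Sizes of the pieces when a batch is split the way torch.chunk splits it.

    Hypothetical helper for illustration; it is not a PyTorch function.
    """
    if batch_size <= 0 or num_devices <= 0:
        raise ValueError("batch_size and num_devices must be positive")
    per_chunk = math.ceil(batch_size / num_devices)
    sizes = []
    remaining = batch_size
    while remaining > 0:
        take = min(per_chunk, remaining)
        sizes.append(take)
        remaining -= take
    return sizes


for batch_size, num_devices in [(8, 2), (5, 4), (3, 4)]:
    sizes = chunk_sizes(batch_size, num_devices)
    print(batch_size, num_devices, sizes, "devices used:", len(sizes))
```

Under this assumption, a batch of 5 on 4 devices yields only 3 chunks, so one device stays idle for that step. Choosing a batch size that is a multiple of the number of GPUs avoids this.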

def data_parallel(module, inputs, device_ids=None, output_device=None, dim=0, module_kwargs=None):
    r"""Evaluates module(input) in parallel across the GPUs given in device_ids.

    This is the functional version of the DataParallel module.

    Args:
        module (Module): the module to evaluate in parallel
        inputs (Tensor): inputs to the module
        device_ids (list of int or torch.device): GPU ids on which to replicate module
        output_device (list of int or torch.device): GPU location of the output. Use -1 to indicate the CPU.
            (default: device_ids[0])
    Returns:
        a Tensor containing the result of module(input) located on
        output_device
    """
    if not isinstance(inputs, tuple):
        inputs = (inputs,)

    device_type = _get_available_device_type()

    if device_ids is None:
        device_ids = _get_all_device_indices()

    if output_device is None:
        output_device = device_ids[0]

    device_ids = list(map(lambda x: _get_device_index(x, True), device_ids))
    output_device = _get_device_index(output_device, True)
    src_device_obj = torch.device(device_type, device_ids[0])

    for t in chain(module.parameters(), module.buffers()):
        if t.device != src_device_obj:
            raise RuntimeError("module must have its parameters and buffers "
                               "on device {} (device_ids[0]) but found one of "
                               "them on device: {}".format(src_device_obj, t.device))

    inputs, module_kwargs = scatter_kwargs(inputs, module_kwargs, device_ids, dim)
    if len(device_ids) == 1:
        return module(*inputs[0], **module_kwargs[0])
    used_device_ids = device_ids[:len(inputs)]
    replicas = replicate(module, used_device_ids)
    outputs = parallel_apply(replicas, inputs, module_kwargs, used_device_ids)
    return gather(outputs, output_device, dim)
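The functional form listed above can be called directly, without wrapping the model in a DataParallel object. The sketch below is a minimal usage example; the toy module, the input shape, and the single-device fallback are assumptions made so that it also runs on a machine with fewer than two GPUs.

```python
import torch
import torch.nn as nn
from torch.nn.parallel import data_parallel

# Hypothetical toy module and input, used only for illustration.
module = nn.Linear(10, 2)
inputs = torch.randn(8, 10)

if torch.cuda.device_count() >= 2:
    # Parameters and buffers must be on device_ids[0] before the call.
    module = module.cuda(0)
    outputs = data_parallel(module, inputs.cuda(0), device_ids=[0, 1])
else:
    # Fallback: with fewer than two GPUs, just call the module directly.
    outputs = module(inputs)

print(outputs.shape, outputs.device)
```

Because data_parallel replicates the module on every call, it suits one-off evaluation; for a training loop the DataParallel wrapper shown at the start of the article is the more common choice.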

Original: https://blog.csdn.net/sazass/article/details/116615028
Author: 大黑山修道
Title: [PyTorch Series] A Detailed Guide to torch.nn.DataParallel

