Notes on errors when replacing the YOLOv5 backbone with a ShuffleBlock network for transfer training, and how I fixed them

Preface: I have been learning YOLOv5 recently, and this post records some of the errors I ran into.

Sizes of tensors must match except in dimension 1. Expected size 16 but got size 8 for tensor number 1 in the list.

The full error output:

Traceback (most recent call last):
  File "I:\EnvVariable\ML\Anaconda3\envs\pytorch\lib\site-packages\IPython\core\interactiveshell.py", line 3343, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "", line 1, in <module>
    runfile('I:/GraduationProject/yolov5-5.0-sniperitf798/train.py', wdir='I:/GraduationProject/yolov5-5.0-sniperitf798')
  File "I:\DevSoftware\Python\JetBrains\PyCharm 2020.2.3\plugins\python\helpers\pydev\_pydev_bundle\pydev_umd.py", line 198, in runfile
    pydev_imports.execfile(filename, global_vars, local_vars)
  File "I:\DevSoftware\Python\JetBrains\PyCharm 2020.2.3\plugins\python\helpers\pydev\_pydev_imps\_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "I:/GraduationProject/yolov5-5.0-sniperitf798/train.py", line 543, in <module>
    train(hyp, opt, device, tb_writer)
  File "I:/GraduationProject/yolov5-5.0-sniperitf798/train.py", line 88, in train
    model = Model(opt.cfg or ckpt['model'].yaml, ch=3, nc=nc, anchors=hyp.get('anchors')).to(device)
  File "I:\GraduationProject\yolov5-5.0-sniperitf798\models\yolo.py", line 93, in __init__
    m.stride = torch.tensor([s / x.shape[-2] for x in self.forward(torch.zeros(1, ch, s, s))])
  File "I:\GraduationProject\yolov5-5.0-sniperitf798\models\yolo.py", line 123, in forward
    return self.forward_once(x, profile)
  File "I:\GraduationProject\yolov5-5.0-sniperitf798\models\yolo.py", line 139, in forward_once
    x = m(x)
  File "I:\EnvVariable\ML\Anaconda3\envs\pytorch\lib\site-packages\torch\nn\modules\module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "I:\GraduationProject\yolov5-5.0-sniperitf798\models\common.py", line 210, in forward
    return torch.cat(x, self.d)
RuntimeError: Sizes of tensors must match except in dimension 1. Expected size 16 but got size 8 for tensor number 1 in the list.

My first guess was that something was wrong with the network structure.
This is the configuration that produced the error:

backbone:

  [[ -1, 1, conv_bn_relu_maxpool, [ 32 ] ],  # 0
   [ -1, 1, Shuffle_Block, [ 128, 2 ] ],     # 1
   [ -1, 3, Shuffle_Block, [ 128, 1 ] ],     # 2
   [ -1, 1, Shuffle_Block, [ 256, 2 ] ],     # 3
   [ -1, 7, Shuffle_Block, [ 256, 1 ] ],     # 4
   [ -1, 1, Shuffle_Block, [ 512, 2 ] ],     # 5
   [ -1, 3, Shuffle_Block, [ 512, 1 ] ],     # 6
   [ -1, 1, Shuffle_Block, [ 1024, 2 ] ],    # 7

  ]

head:
  [[-1, 1, Conv, [512, 1, 1]],                  # 8
   [-1, 1, nn.Upsample, [None, 2, 'nearest']],  # 9
   [[-1, 6], 1, Concat, [1]],                   # 10
   [-1, 3, C3, [512, False]],                   # 11

   [-1, 1, Conv, [256, 1, 1]],                  # 12
   [-1, 1, nn.Upsample, [None, 2, 'nearest']],  # 13
   [[-1, 4], 1, Concat, [1]],                   # 14
   [-1, 3, C3, [256, False]],                   # 15

   [-1, 1, Conv, [256, 3, 2]],                  # 16
   [[-1, 14], 1, Concat, [1]],                  # 17: index 14 is copied from stock yolov5s
   [-1, 3, C3, [512, False]],                   # 18

   [-1, 1, Conv, [512, 3, 2]],                  # 19
   [[-1, 10], 1, Concat, [1]],                  # 20: index 10 is copied from stock yolov5s
   [-1, 3, C3, [1024, False]],                  # 21

   [[15, 18, 21], 1, Detect, [nc, anchors]],    # 22
  ]

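Solution: append Conv, SPP and C3 layers to the end of the backbone. These are the same three layers that end the stock yolov5s backbone.

Why this helps: the head's Concat sources (6, 4, 14 and 10) follow the stock yolov5s numbering, where the backbone has 10 layers (indices 0-9). With the 8-layer ShuffleBlock backbone above, sources 14 and 10 point at the wrong layers. For example, layer 17 concatenates layer 16 with layer 14. Layer 16 is a stride-2 Conv two layers after 14, so the two feature maps differ in size by a factor of 2. Appending three layers restores the 10-layer numbering, and the indices line up again. Keeping the 8-layer backbone and changing those two sources to 12 and 8 should also work. Either way, check the Detect line too: [15, 18, 21] matches the 8-layer numbering, whereas stock yolov5s with its 10-layer backbone uses [17, 20, 23].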


backbone:

  [[ -1, 1, conv_bn_relu_maxpool, [ 32 ] ],
   [ -1, 1, Shuffle_Block, [ 128, 2 ] ],
   [ -1, 3, Shuffle_Block, [ 128, 1 ] ],
   [ -1, 1, Shuffle_Block, [ 256, 2 ] ],
   [ -1, 7, Shuffle_Block, [ 256, 1 ] ],
   [ -1, 1, Shuffle_Block, [ 512, 2 ] ],
   [ -1, 3, Shuffle_Block, [ 512, 1 ] ],
   [ -1, 1, Conv, [ 1024, 3, 2 ] ],
   [ -1, 1, SPP, [ 1024, [ 5, 9, 13 ] ] ],
   [ -1, 3, C3, [ 1024, False ] ],
  ]
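To see the index problem without building the model, here is a small sketch that walks a YOLOv5-style layer list, tracks each layer's output stride, and flags any Concat whose inputs disagree. The helper is my own, not part of yolov5. It assumes conv_bn_relu_maxpool downsamples by 4 and Shuffle_Block by its second argument. The real strides depend on how those custom modules are implemented in common.py, which this post does not show.

```python
# Hypothetical helper (not part of yolov5): trace each layer's output stride
# in a YOLOv5-style layer list and report Concat layers whose inputs disagree.
# Assumed stride rules:
#   conv_bn_relu_maxpool        -> x4 (stride-2 conv, then stride-2 max-pool)
#   Shuffle_Block [c, s]        -> xs
#   Conv [c, k, s]              -> xs (s defaults to 1)
#   nn.Upsample [None, f, mode] -> /f
#   anything else (C3, SPP, Concat) keeps the stride of its first input


def out_stride(in_stride, module, args):
    """Output stride of one layer, given the stride of its (first) input."""
    if module == "conv_bn_relu_maxpool":
        return in_stride * 4
    if module == "Shuffle_Block":
        return in_stride * args[1]
    if module == "Conv":
        return in_stride * (args[2] if len(args) > 2 else 1)
    if module == "nn.Upsample":
        return in_stride // args[1]
    return in_stride


def check_concats(layers):
    """Return one message per Concat layer whose inputs have different strides."""
    strides, problems = [], []
    for i, (frm, _n, module, args) in enumerate(layers):
        sources = frm if isinstance(frm, list) else [frm]
        # -1 means "the previous layer"; the network input has stride 1
        in_strides = [(strides[i - 1] if i else 1) if j == -1 else strides[j]
                      for j in sources]
        if module == "Concat" and len(set(in_strides)) > 1:
            problems.append(f"layer {i} Concat {sources}: input strides {in_strides}")
        strides.append(out_stride(in_strides[0], module, args))
    return problems


BACKBONE_8 = [  # the first (failing) backbone
    [-1, 1, "conv_bn_relu_maxpool", [32]],
    [-1, 1, "Shuffle_Block", [128, 2]],
    [-1, 3, "Shuffle_Block", [128, 1]],
    [-1, 1, "Shuffle_Block", [256, 2]],
    [-1, 7, "Shuffle_Block", [256, 1]],
    [-1, 1, "Shuffle_Block", [512, 2]],
    [-1, 3, "Shuffle_Block", [512, 1]],
    [-1, 1, "Shuffle_Block", [1024, 2]],
]
BACKBONE_10 = BACKBONE_8[:7] + [  # the fixed backbone
    [-1, 1, "Conv", [1024, 3, 2]],
    [-1, 1, "SPP", [1024, [5, 9, 13]]],
    [-1, 3, "C3", [1024, False]],
]
HEAD = [  # the head from this post, without the final Detect layer
    [-1, 1, "Conv", [512, 1, 1]],
    [-1, 1, "nn.Upsample", [None, 2, "nearest"]],
    [[-1, 6], 1, "Concat", [1]],
    [-1, 3, "C3", [512, False]],
    [-1, 1, "Conv", [256, 1, 1]],
    [-1, 1, "nn.Upsample", [None, 2, "nearest"]],
    [[-1, 4], 1, "Concat", [1]],
    [-1, 3, "C3", [256, False]],
    [-1, 1, "Conv", [256, 3, 2]],
    [[-1, 14], 1, "Concat", [1]],
    [-1, 3, "C3", [512, False]],
    [-1, 1, "Conv", [512, 3, 2]],
    [[-1, 10], 1, "Concat", [1]],
    [-1, 3, "C3", [1024, False]],
]

for name, backbone in (("8-layer", BACKBONE_8), ("10-layer", BACKBONE_10)):
    print(name, check_concats(backbone + HEAD) or "OK")
```

Under these assumptions it reports two mismatches for the 8-layer backbone (layers 17 and 20) and OK for the 10-layer one. It reasons only about strides written in the config, so it cannot catch channel mismatches or bugs inside a custom module.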

The model now builds, but as soon as training starts, a new error appears:

train: Scanning 'VOC\labels\train' images and labels... 16551 found, 0 missing, 0 empty, 0 corrupted: 100%|██████████| 16551/16551 [00:11<00:00, 1491.29it/s]
train: New cache created: VOC\labels\train.cache
Traceback (most recent call last):
  File "", line 1, in <module>
  File "I:\EnvVariable\ML\Anaconda3\envs\pytorch\lib\multiprocessing\spawn.py", line 105, in spawn_main
    exitcode = _main(fd)
  File "I:\EnvVariable\ML\Anaconda3\envs\pytorch\lib\multiprocessing\spawn.py", line 114, in _main
    prepare(preparation_data)
  File "I:\EnvVariable\ML\Anaconda3\envs\pytorch\lib\multiprocessing\spawn.py", line 225, in prepare
    _fixup_main_from_path(data['init_main_from_path'])
  File "I:\EnvVariable\ML\Anaconda3\envs\pytorch\lib\multiprocessing\spawn.py", line 277, in _fixup_main_from_path
    run_name="__mp_main__")
  File "I:\EnvVariable\ML\Anaconda3\envs\pytorch\lib\runpy.py", line 263, in run_path
    pkg_name=pkg_name, script_name=fname)
  File "I:\EnvVariable\ML\Anaconda3\envs\pytorch\lib\runpy.py", line 96, in _run_module_code
    mod_name, mod_spec, pkg_name, script_name)
  File "I:\EnvVariable\ML\Anaconda3\envs\pytorch\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "I:\GraduationProject\yolov5-5.0-sniperitf798\train.py", line 12, in <module>
    import torch.distributed as dist
  File "I:\EnvVariable\ML\Anaconda3\envs\pytorch\lib\site-packages\torch\__init__.py", line 124, in <module>
    raise err
OSError: [WinError 1455] 页面文件太小,无法完成操作。 Error loading "I:\EnvVariable\ML\Anaconda3\envs\pytorch\lib\site-packages\torch\lib\caffe2_detectron_ops_gpu.dll" or one of its dependencies.

Traceback (most recent call last):
  File "I:\EnvVariable\ML\Anaconda3\envs\pytorch\lib\site-packages\IPython\core\interactiveshell.py", line 3343, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "", line 1, in <module>
    runfile('I:/GraduationProject/yolov5-5.0-sniperitf798/train.py', wdir='I:/GraduationProject/yolov5-5.0-sniperitf798')
  File "I:\DevSoftware\Python\JetBrains\PyCharm 2020.2.3\plugins\python\helpers\pydev\_pydev_bundle\pydev_umd.py", line 198, in runfile
    pydev_imports.execfile(filename, global_vars, local_vars)
  File "I:\DevSoftware\Python\JetBrains\PyCharm 2020.2.3\plugins\python\helpers\pydev\_pydev_imps\_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "I:/GraduationProject/yolov5-5.0-sniperitf798/train.py", line 545, in <module>
    train(hyp, opt, device, tb_writer)
  File "I:/GraduationProject/yolov5-5.0-sniperitf798/train.py", line 194, in train
    image_weights=opt.image_weights, quad=opt.quad, prefix=colorstr('train: '))
  File "I:\GraduationProject\yolov5-5.0-sniperitf798\utils\datasets.py", line 84, in create_dataloader
    collate_fn=LoadImagesAndLabels.collate_fn4 if quad else LoadImagesAndLabels.collate_fn)
  File "I:\GraduationProject\yolov5-5.0-sniperitf798\utils\datasets.py", line 97, in __init__
    self.iterator = super().__iter__()
  File "I:\EnvVariable\ML\Anaconda3\envs\pytorch\lib\site-packages\torch\utils\data\dataloader.py", line 359, in __iter__
    return self._get_iterator()
  File "I:\EnvVariable\ML\Anaconda3\envs\pytorch\lib\site-packages\torch\utils\data\dataloader.py", line 305, in _get_iterator
    return _MultiProcessingDataLoaderIter(self)
  File "I:\EnvVariable\ML\Anaconda3\envs\pytorch\lib\site-packages\torch\utils\data\dataloader.py", line 918, in __init__
    w.start()
  File "I:\EnvVariable\ML\Anaconda3\envs\pytorch\lib\multiprocessing\process.py", line 105, in start
    self._popen = self._Popen(self)
  File "I:\EnvVariable\ML\Anaconda3\envs\pytorch\lib\multiprocessing\context.py", line 223, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "I:\EnvVariable\ML\Anaconda3\envs\pytorch\lib\multiprocessing\context.py", line 322, in _Popen
    return Popen(process_obj)
  File "I:\EnvVariable\ML\Anaconda3\envs\pytorch\lib\multiprocessing\popen_spawn_win32.py", line 65, in __init__
    reduction.dump(process_obj, to_child)
  File "I:\EnvVariable\ML\Anaconda3\envs\pytorch\lib\multiprocessing\reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
BrokenPipeError: [Errno 32] Broken pipe

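Cause

By default, my machine did not assign any virtual memory (page file) to drive I, where Python is installed, so running the program hit the errors above. WinError 1455 ("The paging file is too small for this operation to complete") means the system commit limit (physical RAM plus page file) ran out. On Windows, every DataLoader worker is a newly spawned process that re-imports train.py, and with it torch and its large CUDA DLLs. The first traceback fails at exactly that import. The BrokenPipeError in the main process is most likely just a consequence of that worker dying.

Solution

Give drive I a page file (System Properties > Advanced > Performance Settings > Advanced > Virtual memory > Change). If Python is installed on drive C, increase the page file size on drive C instead.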
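To check whether the commit limit is the bottleneck before starting a run, here is a minimal Windows-only sketch that calls the Win32 GlobalMemoryStatusEx function through ctypes. The helper is my own, not part of yolov5 or PyTorch. It reports the current commit limit and how much of it is still available. On other platforms it returns None.

```python
import ctypes
import sys


class MEMORYSTATUSEX(ctypes.Structure):
    # Layout of the Win32 MEMORYSTATUSEX structure (c_ulong is a 32-bit DWORD on Windows)
    _fields_ = [
        ("dwLength", ctypes.c_ulong),
        ("dwMemoryLoad", ctypes.c_ulong),
        ("ullTotalPhys", ctypes.c_ulonglong),
        ("ullAvailPhys", ctypes.c_ulonglong),
        ("ullTotalPageFile", ctypes.c_ulonglong),   # current commit limit (RAM + page file)
        ("ullAvailPageFile", ctypes.c_ulonglong),   # commit still available
        ("ullTotalVirtual", ctypes.c_ulonglong),
        ("ullAvailVirtual", ctypes.c_ulonglong),
        ("ullAvailExtendedVirtual", ctypes.c_ulonglong),
    ]


def commit_limit_gib():
    """Return (commit limit, available commit) in GiB on Windows, else None."""
    if sys.platform != "win32":
        return None
    stat = MEMORYSTATUSEX()
    stat.dwLength = ctypes.sizeof(MEMORYSTATUSEX)  # required before the call
    if not ctypes.windll.kernel32.GlobalMemoryStatusEx(ctypes.byref(stat)):
        raise ctypes.WinError()
    gib = 1024 ** 3
    return stat.ullTotalPageFile / gib, stat.ullAvailPageFile / gib


if __name__ == "__main__":
    info = commit_limit_gib()
    if info is None:
        print("Not on Windows; WinError 1455 does not apply here.")
    else:
        total, avail = info
        print(f"commit limit: {total:.1f} GiB, available: {avail:.1f} GiB")
```

If the available figure is small, enlarge the page file as described above. Lowering yolov5's `--workers` option (the maximum number of DataLoader workers) should also help, because fewer spawned processes means fewer copies of torch loaded at once.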

After fixing the two problems above, a new one appeared. The error output:

train: Scanning 'VOC\labels\train.cache' images and labels... 16551 found, 0 missing, 0 empty, 0 corrupted: 100%|██████████| 16551/16551 [00:00<?, ?it/s]
OMP: Error
OMP: Hint This means that multiple copies of the OpenMP runtime have been linked into the program. That is dangerous, since it can degrade performance or cause incorrect results. The best thing to do is to ensure that only a single OpenMP runtime is linked into the process, e.g. by avoiding static linking of the OpenMP runtime in any library. As an unsafe, unsupported, undocumented workaround you can set the environment variable KMP_DUPLICATE_LIB_OK=TRUE to allow the program to continue to execute, but that may cause crashes or silently produce incorrect results. For more information, please see http://www.intel.com/software/products/support/.

Scanning images:   0%|          | 0/4952 [00:00<?, ?it/s]OMP: Error
OMP: Hint This means that multiple copies of the OpenMP runtime have been linked into the program. That is dangerous, since it can degrade performance or cause incorrect results. The best thing to do is to ensure that only a single OpenMP runtime is linked into the process, e.g. by avoiding static linking of the OpenMP runtime in any library. As an unsafe, unsupported, undocumented workaround you can set the environment variable KMP_DUPLICATE_LIB_OK=TRUE to allow the program to continue to execute, but that may cause crashes or silently produce incorrect results. For more information, please see http://www.intel.com/software/products/support/.

val: Scanning 'VOC\labels\val' images and labels... 4952 found, 0 missing, 0 empty, 0 corrupted: 100%|██████████| 4952/4952 [00:05<00:00, 846.03it/s]
val: New cache created: VOC\labels\val.cache
Plotting labels...

OMP: Error
OMP: Hint This means that multiple copies of the OpenMP runtime have been linked into the program. That is dangerous, since it can degrade performance or cause incorrect results. The best thing to do is to ensure that only a single OpenMP runtime is linked into the process, e.g. by avoiding static linking of the OpenMP runtime in any library. As an unsafe, unsupported, undocumented workaround you can set the environment variable KMP_DUPLICATE_LIB_OK=TRUE to allow the program to continue to execute, but that may cause crashes or silently produce incorrect results. For more information, please see http://www.intel.com/software/products/support/.

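Solution:
Add the following at the very top of train.py, before torch is imported. As the hint itself warns, this is an unsafe, unsupported workaround that may cause crashes or silently produce incorrect results. The cleaner fix is to make sure only one OpenMP runtime is loaded into the process.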


import os
# Set this before importing torch/numpy so the OpenMP runtime sees it when it loads
os.environ['KMP_DUPLICATE_LIB_OK'] = 'TRUE'
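As a first step toward that cleaner fix, here is a minimal sketch that lists every copy of the OpenMP runtime under a directory, such as the active environment's `sys.prefix`. The helper is my own, not part of any library. The set of file names is an assumption covering Intel's libiomp5 on Windows, Linux and macOS.

```python
import os
from pathlib import Path

# Assumed file names of Intel's OpenMP runtime on Windows / Linux / macOS
OPENMP_NAMES = {"libiomp5md.dll", "libiomp5.so", "libiomp5.dylib"}


def find_openmp_runtimes(root):
    """Walk `root` and return sorted paths of all files named in OPENMP_NAMES."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # Compare case-insensitively, since Windows file names are not case-sensitive
            if name.lower() in OPENMP_NAMES:
                hits.append(Path(dirpath) / name)
    return sorted(hits)
```

For example, run `for p in find_openmp_runtimes(sys.prefix): print(p)` after `import sys`. More than one copy (say, one shipped inside torch and one from another conda package) is consistent with the error. Which copy to remove or which package to reinstall depends on the environment, so treat the output as a diagnostic rather than a fix.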

Original: https://blog.csdn.net/qq_43676817/article/details/124641058
Author: SniperitfCoder
Title: Notes on errors when replacing the YOLOv5 backbone with a ShuffleBlock network for transfer training, and how I fixed them

