Jupter torch.multiprocessing.spawn()报错:Exception: process 0 terminated with exit code 1

详细报错:

~/.local/lib/python3.6/site-packages/torch/multiprocessing/spawn.py in spawn(fn, args, nprocs, join, daemon, start_method)
    198                ' torch.multiprocessing.start_process(...)' % start_method)
    199         warnings.warn(msg)
--> 200     return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')

~/.local/lib/python3.6/site-packages/torch/multiprocessing/spawn.py in start_processes(fn, args, nprocs, join, daemon, start_method)
    156
    157     # Loop on join until it returns True or raises an exception.

--> 158     while not context.join():
    159         pass
    160

~/.local/lib/python3.6/site-packages/torch/multiprocessing/spawn.py in join(self, timeout)
    111                 raise Exception(
    112                     "process %d terminated with exit code %d" %
--> 113                     (error_index, exitcode)
    114                 )
    115

Exception: process 0 terminated with exit code 1

一、Jupyter平台的原因

看到其他回答,回答是可能是jupyter平台的原因。确实如此,用如下代码在terminal下放在py文件中可以运行,但是在jupyter下无法运行,报错信息如下:


import numpy as np
import torch
from torch.multiprocessing import Pool, set_start_method, spawn

X = np.array([[1, 3, 2, 3], [2, 3, 5, 6], [1, 2, 3, 4]])
X = torch.DoubleTensor(X)

def X_power_func(j):

    X_power = X**j

    return X_power

if __name__ == '__main__':
    spawn(X_power_func, nprocs=3)
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/opt/conda/lib/python3.7/multiprocessing/spawn.py", line 105, in spawn_main
    exitcode = _main(fd)
  File "/opt/conda/lib/python3.7/multiprocessing/spawn.py", line 115, in _main
    self = reduction.pickle.load(from_parent)
AttributeError: Can't get attribute 'X_power_func' on <module '__main__' (built-in)>
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/opt/conda/lib/python3.7/multiprocessing/spawn.py", line 105, in spawn_main
    exitcode = _main(fd)
  File "/opt/conda/lib/python3.7/multiprocessing/spawn.py", line 115, in _main
    self = reduction.pickle.load(from_parent)
AttributeError: Can't get attribute 'X_power_func' on <module '__main__' (built-in)>
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/opt/conda/lib/python3.7/multiprocessing/spawn.py", line 105, in spawn_main
    exitcode = _main(fd)
  File "/opt/conda/lib/python3.7/multiprocessing/spawn.py", line 115, in _main
    self = reduction.pickle.load(from_parent)
AttributeError: Can't get attribute 'X_power_func' on <module '__main__' (built-in)>
     18
     19

~/.local/lib/python3.7/site-packages/torch/multiprocessing/spawn.py in spawn(fn, args, nprocs, join, daemon, start_method)
    198                ' torch.multiprocessing.start_process(...)' % start_method)
    199         warnings.warn(msg)
--> 200     return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')

~/.local/lib/python3.7/site-packages/torch/multiprocessing/spawn.py in start_processes(fn, args, nprocs, join, daemon, start_method)
    156
    157     # Loop on join until it returns True or raises an exception.

--> 158     while not context.join():
    159         pass
    160

~/.local/lib/python3.7/site-packages/torch/multiprocessing/spawn.py in join(self, timeout)
    111                 raise Exception(
    112                     "process %d terminated with exit code %d" %
--> 113                     (error_index, exitcode)
    114                 )
    115

Exception: process 1 terminated with exit code 1

解决方法: create a separate file for func
参考链接:

二、仍然可能出现的错误

但是我决定直接在terminal下运行,不在jupyter下运行,避开jupyter没有create a separate file for func带来的报错,得到如下结果

代码上仍然出现的错误:

RuntimeError:
        An attempt has been made to start a new process before the
        current process has finished its bootstrapping phase.

        This probably means that you are not using fork to start your
        child processes and you have forgotten to use the proper idiom
        in the main module:

            if __name__ == '__main__':
                freeze_support()
                ...

        The "freeze_support()" line can be omitted if the program
        is not going to be frozen to produce an executable.RuntimeError:
        An attempt has been made to start a new process before the
        current process has finished its bootstrapping phase.

        This probably means that you are not using fork to start your
        child processes and you have forgotten to use the proper idiom
        in the main module:

            if __name__ == '__main__':
                freeze_support()
                ...

        The "freeze_support()" line can be omitted if the program
        is not going to be frozen to produce an executable.

Traceback (most recent call last):
  File "test_main_ddp.py", line 543, in <module>
    mp.spawn(main_worker, nprocs=args.ngpus_per_node, args=(args.ngpus_per_node, args))
  File "/root/.local/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 200, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "/root/.local/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 158, in start_processes
    while not context.join():
  File "/root/.local/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 113, in join
    (error_index, exitcode)
Exception: process 0 terminated with exit code 1
mp.spawn(main_worker, nprocs=args.ngpus_per_node, args=(args.ngpus_per_node, args))
    if __name__=="__main__":
        mp.spawn(main_worker, nprocs=args.ngpus_per_node, args=(args.ngpus_per_node, args))

出错原因:因为多线程程序要放在主函数中训练。这样就不报 Exception: process 0 terminated with exit code 1的错了。

Original: https://blog.csdn.net/goodgoodstudy___/article/details/123389191
Author: study_&
Title: Jupter torch.multiprocessing.spawn()报错:Exception: process 0 terminated with exit code 1

原创文章受到原创版权保护。转载请注明出处:https://www.johngo689.com/628388/

转载文章受原作者版权保护。转载请注明原作者出处!

(0)

大家都在看

亲爱的 Coder【最近整理,可免费获取】👉 最新必读书单  | 👏 面试题下载  | 🌎 免费的AI知识星球