详细报错:
~/.local/lib/python3.6/site-packages/torch/multiprocessing/spawn.py in spawn(fn, args, nprocs, join, daemon, start_method)
198 ' torch.multiprocessing.start_process(...)' % start_method)
199 warnings.warn(msg)
--> 200 return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
~/.local/lib/python3.6/site-packages/torch/multiprocessing/spawn.py in start_processes(fn, args, nprocs, join, daemon, start_method)
156
157 # Loop on join until it returns True or raises an exception.
--> 158 while not context.join():
159 pass
160
~/.local/lib/python3.6/site-packages/torch/multiprocessing/spawn.py in join(self, timeout)
111 raise Exception(
112 "process %d terminated with exit code %d" %
--> 113 (error_index, exitcode)
114 )
115
Exception: process 0 terminated with exit code 1
一、Jupyter平台的原因
看到其他回答,回答是可能是jupyter平台的原因。确实如此,用如下代码在terminal下放在py文件中可以运行,但是在jupyter下无法运行,报错信息如下:
import numpy as np
import torch
from torch.multiprocessing import Pool, set_start_method, spawn
X = np.array([[1, 3, 2, 3], [2, 3, 5, 6], [1, 2, 3, 4]])
X = torch.DoubleTensor(X)
def X_power_func(j):
X_power = X**j
return X_power
if __name__ == '__main__':
spawn(X_power_func, nprocs=3)
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/opt/conda/lib/python3.7/multiprocessing/spawn.py", line 105, in spawn_main
exitcode = _main(fd)
File "/opt/conda/lib/python3.7/multiprocessing/spawn.py", line 115, in _main
self = reduction.pickle.load(from_parent)
AttributeError: Can't get attribute 'X_power_func' on <module '__main__' (built-in)>
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/opt/conda/lib/python3.7/multiprocessing/spawn.py", line 105, in spawn_main
exitcode = _main(fd)
File "/opt/conda/lib/python3.7/multiprocessing/spawn.py", line 115, in _main
self = reduction.pickle.load(from_parent)
AttributeError: Can't get attribute 'X_power_func' on <module '__main__' (built-in)>
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/opt/conda/lib/python3.7/multiprocessing/spawn.py", line 105, in spawn_main
exitcode = _main(fd)
File "/opt/conda/lib/python3.7/multiprocessing/spawn.py", line 115, in _main
self = reduction.pickle.load(from_parent)
AttributeError: Can't get attribute 'X_power_func' on <module '__main__' (built-in)>
18
19
~/.local/lib/python3.7/site-packages/torch/multiprocessing/spawn.py in spawn(fn, args, nprocs, join, daemon, start_method)
198 ' torch.multiprocessing.start_process(...)' % start_method)
199 warnings.warn(msg)
--> 200 return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
~/.local/lib/python3.7/site-packages/torch/multiprocessing/spawn.py in start_processes(fn, args, nprocs, join, daemon, start_method)
156
157 # Loop on join until it returns True or raises an exception.
--> 158 while not context.join():
159 pass
160
~/.local/lib/python3.7/site-packages/torch/multiprocessing/spawn.py in join(self, timeout)
111 raise Exception(
112 "process %d terminated with exit code %d" %
--> 113 (error_index, exitcode)
114 )
115
Exception: process 1 terminated with exit code 1
解决方法: create a separate file for func
参考链接:
二、仍然可能出现的错误
但是我决定直接在terminal下运行,不在jupyter下运行,避开jupyter没有create a separate file for func带来的报错,得到如下结果
代码上仍然出现的错误:
RuntimeError:
An attempt has been made to start a new process before the
current process has finished its bootstrapping phase.
This probably means that you are not using fork to start your
child processes and you have forgotten to use the proper idiom
in the main module:
if __name__ == '__main__':
freeze_support()
...
The "freeze_support()" line can be omitted if the program
is not going to be frozen to produce an executable.RuntimeError:
An attempt has been made to start a new process before the
current process has finished its bootstrapping phase.
This probably means that you are not using fork to start your
child processes and you have forgotten to use the proper idiom
in the main module:
if __name__ == '__main__':
freeze_support()
...
The "freeze_support()" line can be omitted if the program
is not going to be frozen to produce an executable.
Traceback (most recent call last):
File "test_main_ddp.py", line 543, in <module>
mp.spawn(main_worker, nprocs=args.ngpus_per_node, args=(args.ngpus_per_node, args))
File "/root/.local/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 200, in spawn
return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
File "/root/.local/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 158, in start_processes
while not context.join():
File "/root/.local/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 113, in join
(error_index, exitcode)
Exception: process 0 terminated with exit code 1
mp.spawn(main_worker, nprocs=args.ngpus_per_node, args=(args.ngpus_per_node, args))
if __name__=="__main__":
mp.spawn(main_worker, nprocs=args.ngpus_per_node, args=(args.ngpus_per_node, args))
出错原因:因为多线程程序要放在主函数中训练。这样就不报 Exception: process 0 terminated with exit code 1
的错了。
Original: https://blog.csdn.net/goodgoodstudy___/article/details/123389191
Author: study_&
Title: Jupter torch.multiprocessing.spawn()报错:Exception: process 0 terminated with exit code 1
原创文章受到原创版权保护。转载请注明出处:https://www.johngo689.com/628388/
转载文章受原作者版权保护。转载请注明原作者出处!