Project context:
Reproducing Improved Diffusion on Windows.
Problem description:
RuntimeError: Distributed package doesn't have NCCL built in
  File "D:\APP\Anaconda3\envs\diffusion\lib\site-packages\torch\distributed\distributed_c10d.py", line 602, in init_process_group
    default_pg = _new_process_group_helper(
  File "D:\APP\Anaconda3\envs\diffusion\lib\site-packages\torch\distributed\distributed_c10d.py", line 727, in _new_process_group_helper
    raise RuntimeError("Distributed package doesn't have NCCL " "built in")
RuntimeError: Distributed package doesn't have NCCL built in
Cause analysis:
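Windows does not support the NCCL backend. PyTorch builds for Windows ship without NCCL, so requesting backend="nccl" in init_process_group raises this error; gloo is the backend to use there.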
Solution:
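Locate the code that selects the backend. The traceback ends in
D:\APP\Anaconda3\envs\diffusion\lib\site-packages\torch\distributed\distributed_c10d.py
but that file only raises the error. The line to change is the init_process_group call in the project's own code (in Improved Diffusion this appears to be improved_diffusion/dist_util.py):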
dist.init_process_group(backend=backend, init_method="env://")
Replace it with
dist.init_process_group(backend="gloo", init_method="env://")
and the error no longer occurs.
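Hard-coding "gloo" also switches off NCCL on machines that do support it. As a minimal sketch of a more portable variant, the backend can be chosen at runtime. The helper names pick_backend and init_distributed are hypothetical and not part of Improved Diffusion or PyTorch; the sketch assumes the environment variables required by init_method="env://" (MASTER_ADDR, MASTER_PORT, RANK, WORLD_SIZE) are already set by the caller.

```python
import sys


def pick_backend(platform: str, cuda_available: bool, nccl_available: bool) -> str:
    """Return "nccl" only when it can actually be used, otherwise "gloo".

    Hypothetical helper for illustration; not part of any library.
    """
    # Windows builds of PyTorch do not include NCCL.
    if platform.startswith("win"):
        return "gloo"
    # NCCL needs both a CUDA device and a PyTorch build compiled with NCCL.
    if cuda_available and nccl_available:
        return "nccl"
    return "gloo"


def init_distributed() -> str:
    """Initialize the default process group with a backend this machine supports."""
    # Imported lazily so that pick_backend can be used without PyTorch installed.
    import torch
    import torch.distributed as dist

    backend = pick_backend(
        sys.platform,
        torch.cuda.is_available(),
        dist.is_nccl_available(),
    )
    # Assumes MASTER_ADDR, MASTER_PORT, RANK and WORLD_SIZE are already set.
    dist.init_process_group(backend=backend, init_method="env://")
    return backend
```

With this in place, the call site would use init_distributed() instead of a fixed backend string, so the same script would run on Windows with gloo and on a Linux GPU machine with NCCL.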
Source: https://blog.csdn.net/weixin_43940981/article/details/127423861