ERROR:torch.distributed.elastic.agent.server.local_elastic_agent:[default] Worker group failed
ERROR:torch.distributed.elastic.multiprocessing.api:failed
ERROR:torch.distributed.elastic.agent.server.local_elastic_agent:[default] Worker group failed
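This error is reported when a training program written for multiple GPUs is run with only a single GPU visible. The fix is to make both GPUs visible by setting os.environ['CUDA_VISIBLE_DEVICES'] = '0,1'. Note that the value must be a comma-separated string: assigning the list [0,1] raises a TypeError, because os.environ only accepts strings. Set it before CUDA is initialized, otherwise it has no effect.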
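A minimal sketch of the fix described above, assuming the machine really has GPUs at indices 0 and 1. The helper name set_visible_gpus is hypothetical (it is not part of PyTorch); it only converts a list of indices into the string form that CUDA_VISIBLE_DEVICES expects.

```python
import os


def set_visible_gpus(gpu_ids):
    """Expose the given GPU indices to CUDA (hypothetical helper, not a PyTorch API)."""
    # os.environ only accepts strings, so join the indices with commas.
    value = ",".join(str(i) for i in gpu_ids)
    os.environ["CUDA_VISIBLE_DEVICES"] = value
    return value


# Call this at the very top of the training script,
# before importing torch or otherwise initializing CUDA.
set_visible_gpus([0, 1])
print(os.environ["CUDA_VISIBLE_DEVICES"])  # prints 0,1
```

If the machine has only one physical GPU, listing two indices will not add a second device. In that case it is probably simpler to launch a single worker process instead, for example by passing --nproc_per_node=1 to torchrun.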