Problem using multiple GPUs in Python

Has anyone trained a network on multiple GPUs? I'm using the torch.distributed.run command, like this:

python -m torch.distributed.run --standalone --nnodes=1 --nproc_per_node 8 xxx.py

I'm using 8 GPUs here, but every time I launch it this way I get the warning below, and then the program hangs and never actually starts running:

WARNING:__main__:*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. 
*****************************************

I don't know what the problem is.

DDP is a huge pitfall, and there's basically no fix.
Maybe try launch instead?
python -m torch.distributed.launch
Or take a look at this:

distributed training not starting · Issue #65121 · pytorch/pytorch

https://github.com/pytorch/pytorch/issues/65121
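
One thing to watch if you switch launchers as suggested above: by default torch.distributed.launch passes the local rank to your script as a command-line argument (--local_rank, or --local-rank in newer PyTorch releases), while torch.distributed.run sets the LOCAL_RANK environment variable instead. A script that only handles one of the two can fail or misbehave under the other launcher. Below is a minimal sketch of a script-side helper that accepts either form. get_local_rank is a made-up name for illustration, not part of PyTorch, and falling back to rank 0 when nothing is set is my own assumption.

```python
import argparse
import os


def get_local_rank(argv=None) -> int:
    """Return the local rank from the CLI argument or the LOCAL_RANK env var.

    Hypothetical helper, not a PyTorch API. Falls back to 0 (an assumption)
    when neither the argument nor the environment variable is present.
    """
    parser = argparse.ArgumentParser(add_help=False, allow_abbrev=False)
    parser.add_argument(
        "--local_rank", "--local-rank",  # both spellings map to args.local_rank
        type=int,
        default=int(os.environ.get("LOCAL_RANK", 0)),
    )
    # parse_known_args ignores the script's other arguments instead of failing.
    args, _unknown = parser.parse_known_args(argv)
    return args.local_rank


if __name__ == "__main__":
    print("local rank:", get_local_rank())
```

If each worker prints its local rank at startup, it is also easy to see whether all 8 processes came up or whether some of them never started.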

?
Here's the general search link; have a look through it yourself.
https://github.com/pytorch/pytorch/issues?q=Setting+OMP_NUM_THREADS+environment+variable+for+each+process+to+be+1+in+default%2C+

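The problem above shows up when running the code because of how PyTorch distributed training is launched. The original code was set up to run on several GPUs in parallel, but you are running it on only one or two, so this parameter has to be set explicitly.
python -m torch.distributed.launch --nproc_per_node=1 --master_port 12345 train.py
--nproc_per_node=1: the 1 is the number of GPUs you are actually using.
--master_port 12345: this is the port. You usually don't need to set it; if you do, pick any unused port between 1024 and 65535. If you get
RuntimeError: Address already in use
then pass --master_port with a different free port.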
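
Rather than guessing a port by hand, you can ask the operating system for a free one. The sketch below does that and assembles the launch command. find_free_port and build_launch_command are hypothetical helper names, not PyTorch functions, and train.py and num_gpus=1 are placeholders. There is a small race: another process could grab the port between the check and the actual launch.

```python
import socket
import sys


def find_free_port() -> int:
    """Ask the OS for a currently unused TCP port on localhost."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("127.0.0.1", 0))  # port 0 lets the OS choose a free port
        return s.getsockname()[1]


def build_launch_command(script: str, num_gpus: int, port: int) -> list:
    """Build the launcher argv (hypothetical helper, not a PyTorch API)."""
    if not 1 <= port <= 65535:
        raise ValueError(f"invalid TCP port: {port}")
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    return [
        sys.executable, "-m", "torch.distributed.launch",
        f"--nproc_per_node={num_gpus}",
        f"--master_port={port}",
        script,
    ]


if __name__ == "__main__":
    # num_gpus=1 is a placeholder; torch.cuda.device_count() reports the real count.
    cmd = build_launch_command("train.py", num_gpus=1, port=find_free_port())
    print(" ".join(cmd))
```

The printed command can be pasted into a shell, or the list can be handed to subprocess.run directly.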
