I'm trying to finetune with DeepSpeed on a single machine with multiple GPUs and get the following error:
Traceback (most recent call last):
File "main.py", line 440, in <module>
main()
File "main.py", line 397, in main
perplexity = evaluation(model, eval_dataloader)
File "main.py", line 323, in evaluation
outputs = model(**batch)
File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/opt/conda/lib/python3.8/site-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
ret_val = func(*args, **kwargs)
File "/opt/conda/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 1695, in forward
loss = self.module(*inputs, **kwargs)
File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/opt/conda/lib/python3.8/site-packages/transformers/models/bloom/modeling_bloom.py", line 913, in forward
transformer_outputs = self.transformer(
File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/opt/conda/lib/python3.8/site-packages/transformers/models/bloom/modeling_bloom.py", line 730, in forward
inputs_embeds = self.word_embeddings(input_ids)
File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/sparse.py", line 162, in forward
return F.embedding(
File "/opt/conda/lib/python3.8/site-packages/torch/nn/functional.py", line 2213, in embedding
return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument index in method wrapper_CUDA__index_select)
The other GPU reports the same error, except it says
cuda:1 and cpu!
I've spent two days on this. I know it happens because part of the data is on the CPU and part is on the GPU, but which part is on the CPU, and how do I move it over?
I went down to torch/nn/functional.py, where the error is finally raised, and tried to print whether input and weight are on the CPU or the GPU:
print(input.untyped_storage())
print(weight.untyped_storage())
For input it prints [torch.storage.UntypedStorage(device=cpu) of size 4096]
For weight it raises CUDA error: an illegal memory access was encountered
So it looks like input is on the CPU,
but when I try input.cuda() to move it to the GPU, I again get CUDA error: an illegal memory access was encountered.
So is it on the CPU or the GPU?
Could anyone tell me how to track down which step is doing this?
This error means some of your tensors are not on the same device: specifically, your input data (input_ids and attention_mask) is on the CPU while the model parameters are on the GPU. This usually happens because the data is not moved to the model's device after it is loaded.
To fix it, move the inputs to the right device before the forward pass:
input_ids = input_ids.to(device)
attention_mask = attention_mask.to(device)
Here device is the device you want to move the data to, which you can specify with torch.device().
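For example, here is a minimal sketch of an evaluation loop that moves the whole batch onto the GPU before calling the model; the extra device argument and the loss/perplexity bookkeeping are assumptions for illustration, not taken from your main.py:

import torch

def evaluation(model, eval_dataloader, device):
    # device should be the GPU this process owns,
    # e.g. torch.device("cuda", args.local_rank) under the deepspeed launcher.
    model.eval()
    total_loss = 0.0
    for step, batch in enumerate(eval_dataloader):
        # Move every tensor in the batch (input_ids, attention_mask, labels, ...)
        # onto the same device as the model parameters.
        batch = {k: v.to(device) for k, v in batch.items()}
        with torch.no_grad():
            outputs = model(**batch)
        total_loss += outputs.loss.float()
    mean_loss = total_loss / (step + 1)
    return torch.exp(mean_loss)  # perplexity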
Also check that your data loading and distributed setup are correct. If your code uses distributed training, you need to pass the right distributed and device arguments when initializing the trainer (e.g. DeepSpeed or PyTorch Lightning), as sketched below.
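A hedged sketch of a typical single-node multi-GPU DeepSpeed setup under the deepspeed launcher; the function name, argument names, and ds_config here are assumptions, not taken from your main.py:

import argparse
import torch
import deepspeed

def setup_deepspeed(model, ds_config):
    # The deepspeed launcher passes --local_rank to every process it spawns.
    parser = argparse.ArgumentParser()
    parser.add_argument("--local_rank", type=int, default=-1)
    args = parser.parse_args()

    # Bind this process to its own GPU before creating any CUDA tensors.
    torch.cuda.set_device(args.local_rank)
    device = torch.device("cuda", args.local_rank)

    deepspeed.init_distributed()
    model_engine, optimizer, _, _ = deepspeed.initialize(
        args=args,
        model=model,
        model_parameters=model.parameters(),
        config=ds_config,  # DeepSpeed config dict or path to a json file
    )
    return model_engine, optimizer, device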
When a CUDA error occurs, you can add debug statements to pin down where it happens: print intermediate results before and after the failing call and find the first place the error shows up. In your case, try printing the device of every input right before calling the model:
for key in batch:
    print(f"{key} device: {batch[key].device}")
outputs = model(**batch)
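One note on the "illegal memory access" you see when printing weight or calling input.cuda(): CUDA kernels run asynchronously, so once one kernel fails, the error is reported by whatever CUDA call happens next, not where it actually occurred, and later calls on that context keep failing too. Forcing synchronous launches usually gives a stack trace that points at the real failure site. One way to enable that from Python (an illustrative snippet, set it before any CUDA work):

import os

# Must be set before the CUDA context is created, i.e. before the first
# CUDA call (in most scripts, simply before importing torch). Alternatively,
# export CUDA_LAUNCH_BLOCKING=1 in the shell that runs the deepspeed launcher.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

import torch  # imported after setting the env var on purpose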
Hope these suggestions help you solve the problem. (AI-generated answer)