RuntimeError: cublas runtime error : the GPU program failed to execute at /opt/conda/conda-bld/pytorch_1565272271120/work/aten/src/THC/THCBlas.cu:331
OS: Linux
PyTorch: 1.2.0
CUDAToolkit: 10.0
allennlp: 0.9.0
NVIDIA-SMI 515.65.01 Driver Version: 515.65.01 CUDA Version: 11.7
GPU RTX3090
2022-12-21 18:11:05,577 - INFO - allennlp.training.trainer - Training
0%| | 0/16148 [00:00
Traceback (most recent call last):
File "/data/yutian/anaconda3/envs/py37_2/bin/allennlp", line 8, in <module>
sys.exit(run())
File "/data/yutian/anaconda3/envs/py37_2/lib/python3.7/site-packages/allennlp/run.py", line 18, in run
main(prog="allennlp")
File "/data/yutian/anaconda3/envs/py37_2/lib/python3.7/site-packages/allennlp/commands/__init__.py", line 102, in main
args.func(args)
File "/data/yutian/anaconda3/envs/py37_2/lib/python3.7/site-packages/allennlp/commands/train.py", line 124, in train_model_from_args
args.cache_prefix)
File "/data/yutian/anaconda3/envs/py37_2/lib/python3.7/site-packages/allennlp/commands/train.py", line 168, in train_model_from_file
cache_directory, cache_prefix)
File "/data/yutian/anaconda3/envs/py37_2/lib/python3.7/site-packages/allennlp/commands/train.py", line 252, in train_model
metrics = trainer.train()
File "/data/yutian/anaconda3/envs/py37_2/lib/python3.7/site-packages/allennlp/training/trainer.py", line 478, in train
train_metrics = self._train_epoch(epoch)
File "/data/yutian/anaconda3/envs/py37_2/lib/python3.7/site-packages/allennlp/training/trainer.py", line 320, in _train_epoch
loss = self.batch_loss(batch_group, for_training=True)
File "/data/yutian/anaconda3/envs/py37_2/lib/python3.7/site-packages/allennlp/training/trainer.py", line 261, in batch_loss
output_dict = self.model(**batch)
File "/data/yutian/anaconda3/envs/py37_2/lib/python3.7/site-packages/torch/nn/modules/module.py", line 547, in __call__
result = self.forward(*input, **kwargs)
File "./model.py", line 187, in forward
joint_embedding = self.word_embedder(joint_tokens)
File "/data/yutian/anaconda3/envs/py37_2/lib/python3.7/site-packages/torch/nn/modules/module.py", line 547, in __call__
result = self.forward(*input, **kwargs)
File "/data/yutian/anaconda3/envs/py37_2/lib/python3.7/site-packages/allennlp/modules/text_field_embedders/basic_text_field_embedder.py", line 118, in forward
token_vectors = embedder(*tensors, **forward_params_values)
File "/data/yutian/anaconda3/envs/py37_2/lib/python3.7/site-packages/torch/nn/modules/module.py", line 547, in __call__
result = self.forward(*input, **kwargs)
File "/data/yutian/anaconda3/envs/py37_2/lib/python3.7/site-packages/allennlp/modules/token_embedders/bert_token_embedder.py", line 175, in forward
attention_mask=util.combine_initial_dims(input_mask))
File "/data/yutian/anaconda3/envs/py37_2/lib/python3.7/site-packages/torch/nn/modules/module.py", line 547, in __call__
result = self.forward(*input, **kwargs)
File "/data/yutian/anaconda3/envs/py37_2/lib/python3.7/site-packages/pytorch_pretrained_bert/modeling.py", line 733, in forward
output_all_encoded_layers=output_all_encoded_layers)
File "/data/yutian/anaconda3/envs/py37_2/lib/python3.7/site-packages/torch/nn/modules/module.py", line 547, in __call__
result = self.forward(*input, **kwargs)
File "/data/yutian/anaconda3/envs/py37_2/lib/python3.7/site-packages/pytorch_pretrained_bert/modeling.py", line 406, in forward
hidden_states = layer_module(hidden_states, attention_mask)
File "/data/yutian/anaconda3/envs/py37_2/lib/python3.7/site-packages/torch/nn/modules/module.py", line 547, in __call__
result = self.forward(*input, **kwargs)
File "/data/yutian/anaconda3/envs/py37_2/lib/python3.7/site-packages/pytorch_pretrained_bert/modeling.py", line 391, in forward
attention_output = self.attention(hidden_states, attention_mask)
File "/data/yutian/anaconda3/envs/py37_2/lib/python3.7/site-packages/torch/nn/modules/module.py", line 547, in __call__
result = self.forward(*input, **kwargs)
File "/data/yutian/anaconda3/envs/py37_2/lib/python3.7/site-packages/pytorch_pretrained_bert/modeling.py", line 349, in forward
self_output = self.self(input_tensor, attention_mask)
File "/data/yutian/anaconda3/envs/py37_2/lib/python3.7/site-packages/torch/nn/modules/module.py", line 547, in __call__
result = self.forward(*input, **kwargs)
File "/data/yutian/anaconda3/envs/py37_2/lib/python3.7/site-packages/pytorch_pretrained_bert/modeling.py", line 309, in forward
attention_scores = torch.matmul(query_layer, key_layer.transpose(-1, -2))
RuntimeError: cublas runtime error : the GPU program failed to execute at /opt/conda/conda-bld/pytorch_1565272271120/work/aten/src/THC/THCBlas.cu:331
0%| | 0/16148 [00:12
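Did nobody upthread read the environment details?
Your problem is simple: 30-series GPUs do not support CUDA versions earlier than 11.0, but your CUDA toolkit is 10.0, so this is what you get. Your driver already supports CUDA 11.7, so leave it alone. You need to reinstall CUDA (the version is determined by the PyTorch release you want to install) and cuDNN, then install the matching pytorch, torchvision and torchaudio.
If your old PyTorch version cannot work with CUDA 11.0 or later, you can either build it from source yourself (lots of pitfalls, not easy, needs research) or upgrade torch to a release that ships with CUDA 11.0 or later. I recommend upgrading torch: compatibility between PyTorch versions is good, and you usually do not have to change any source code.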
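To see the mismatch described above on your own machine, a small check like the following may help. It is only a sketch: the helper names `parse_version` and `cuda_build_supports_gpu` are made up for illustration, and the single threshold (compute capability 8.x needs a CUDA 11.0+ build) is just the rule of thumb from this answer, not a complete compatibility table.

```python
def parse_version(text):
    """Turn a version string such as '10.0' or '11.7' into a tuple like (10, 0)."""
    parts = text.split(".")
    major = int(parts[0])
    minor = int(parts[1]) if len(parts) > 1 else 0
    return (major, minor)


def cuda_build_supports_gpu(cuda_version, capability):
    """Hypothetical helper: apply the rule of thumb from the answer above.

    Assumption: GPUs with compute capability 8.x (e.g. the RTX 30 series)
    need a PyTorch build compiled against CUDA 11.0 or later; older GPUs
    are not restricted by this simplified check.
    """
    major, _minor = capability
    required = (11, 0) if major >= 8 else (0, 0)
    return parse_version(cuda_version) >= required


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        torch = None

    if torch is None:
        print("torch is not installed in this environment")
    elif torch.version.cuda is None or not torch.cuda.is_available():
        print("this torch build cannot see a CUDA device")
    else:
        capability = torch.cuda.get_device_capability(0)
        print("GPU:", torch.cuda.get_device_name(0), "capability:", capability)
        print("torch built with CUDA:", torch.version.cuda)
        print("looks compatible:", cuda_build_supports_gpu(torch.version.cuda, capability))
```

With the environment from the question (a CUDA 10.0 build of PyTorch on an RTX 3090, which reports capability 8.6), the check would come out as not compatible.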
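1. Check your code and make sure it contains no bugs.
2. Try reinstalling CUDA and PyTorch, making sure their versions are compatible.
3. Try updating your GPU driver.
4. Try an older version of PyTorch or CUDA.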
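One way to tell a bug in the model code (step 1) apart from an environment problem (steps 2 to 4) is to run the failing operation on its own, outside the model. The sketch below repeats the `torch.matmul(query_layer, key_layer.transpose(-1, -2))` call from the traceback on small random tensors. The function name `cublas_smoke_test` and the tensor shapes are made up for illustration.

```python
def cublas_smoke_test():
    """Run one small batched matmul on the GPU and report what happened."""
    try:
        import torch
    except ImportError:
        return "torch not installed"

    if not torch.cuda.is_available():
        return "no CUDA device visible"

    try:
        # Shapes are arbitrary: (batch, heads, sequence length, head size).
        query = torch.randn(2, 12, 8, 64, device="cuda")
        key = torch.randn(2, 12, 8, 64, device="cuda")
        scores = torch.matmul(query, key.transpose(-1, -2))
        # CUDA calls are asynchronous, so wait for the kernel to finish
        # before deciding that it succeeded.
        torch.cuda.synchronize()
    except RuntimeError as err:
        return "matmul failed: " + str(err)

    return "matmul ok, result shape " + str(tuple(scores.shape))


if __name__ == "__main__":
    print(cublas_smoke_test())
```

If this tiny script already fails with the same cublas error, the model code is probably not the culprit and the installation is the place to look.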
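ChatGPT's attempt at an answer, for reference only:
This error usually shows up when using CUDA and means that a program failed while executing on the GPU.
Several things can cause it: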
PyTorch "RuntimeError: cublas runtime error :cu:259" and how to fix it
This may be worth a look:
https://blog.csdn.net/weixin_41781121/article/details/109030372
My first guess is a version mismatch between PyTorch and CUDA.
Fix:
First try installing with pip:
pip install torch==1.11.0+cu113 torchvision==0.12.0+cu113 torchaudio==0.11.0 --extra-index-url https://download.pytorch.org/whl/cu113
You can also download the wheels one by one:
https://download.pytorch.org/whl/cu113/torchaudio-0.12.1%2Bcu113-cp38-cp38-linux_x86_64.whl
https://download.pytorch.org/whl/cu113/torch-1.12.1%2Bcu113-cp38-cp38-linux_x86_64.whl
The remaining one can be found here:
https://download.pytorch.org/whl/cu113/
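Pick the one that matches your CUDA version.
If it still fails after that, try downloading the .whl files directly from the official site and reinstalling PyTorch.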
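When downloading wheels by hand it is easy to grab a file built for a different Python or CUDA version. Note that the example links above are `cp38` builds, while the traceback in the question comes from a Python 3.7 environment. The sketch below pulls the tags out of a wheel URL so they can be compared before installing. The helper names `parse_wheel_url` and `matches_interpreter` are made up for illustration, and the parser assumes the common five-part wheel filename without the optional build tag.

```python
import sys
from urllib.parse import unquote


def parse_wheel_url(url):
    """Split a wheel URL into its name, version and compatibility tags."""
    # The index encodes '+' as %2B, so decode the filename first.
    filename = unquote(url.rsplit("/", 1)[-1])
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel file: " + filename)
    # Assumption: name-version-python-abi-platform, no optional build tag.
    name, version, python_tag, abi_tag, platform_tag = filename[:-4].split("-")
    release, _, local = version.partition("+")
    return {
        "name": name,
        "release": release,        # e.g. '1.12.1'
        "cuda_tag": local,         # e.g. 'cu113', empty if absent
        "python_tag": python_tag,  # e.g. 'cp38'
        "abi_tag": abi_tag,
        "platform_tag": platform_tag,
    }


def matches_interpreter(info, version_info=sys.version_info):
    """Return True if the wheel's CPython tag matches the given Python version."""
    expected = "cp{}{}".format(version_info[0], version_info[1])
    return info["python_tag"] == expected


if __name__ == "__main__":
    url = "https://download.pytorch.org/whl/cu113/torch-1.12.1%2Bcu113-cp38-cp38-linux_x86_64.whl"
    info = parse_wheel_url(url)
    print(info)
    print("matches this interpreter:", matches_interpreter(info))
```

Passing `(3, 7)` as `version_info` shows that the `cp38` wheel linked above would not fit the Python 3.7 environment from the question.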
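The CUDA toolkit and the GPU do not match.
After switching to the GPU build of PyTorch 1.5.0 with CUDA 10.2, the error changed to a cuDNN error. The cause was that cuDNN was not installed. cuDNN could not be installed on the shared server, so the only option was to disable it with the following code: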
import torch
torch.backends.cudnn.enabled = False
That solved it.