2022-12-06 19:39:04.180755: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcuda.so.1
2022-12-06 19:39:04.204065: E tensorflow/stream_executor/cuda/cuda_driver.cc:328] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
2022-12-06 19:39:04.204130: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:169] retrieving CUDA diagnostic information for host: zml-Lenovo-Legion-R7000P2021H
2022-12-06 19:39:04.204141: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:176] hostname: zml-Lenovo-Legion-R7000P2021H
2022-12-06 19:39:04.204261: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:200] libcuda reported version is: 525.60.11
2022-12-06 19:39:04.204289: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:204] kernel reported version is: 525.60.11
2022-12-06 19:39:04.204298: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:310] kernel version seems to match DSO: 525.60.11
2022-12-06 19:39:04.204614: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-12-06 19:39:04.231014: I tensorflow/core/platform/profile_utils/cpu_utils.cc:112] CPU Frequency: 3193875000 Hz
2022-12-06 19:39:04.232747: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x20a90c0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2022-12-06 19:39:04.232785: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
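If tf.test.is_gpu_available() returns True, tensorflow-gpu can use the GPU normally. If the GPU still cannot be used during actual model training, the possible causes are:
The error you are seeing is an initialization failure (the cuInit call failed). First, rule out running out of GPU memory: reduce the batch size or the input size and see whether it runs. Second, check whether part of the code was never moved to the GPU: look at the model, the inputs, and so on, and confirm they are placed on the GPU (for example with .cuda(), which is the PyTorch call for this).
Third, it could be a compatibility problem. Which CUDA version do you have?
Finally, it could be a permissions problem. Try running with sudo to get administrator privileges and see whether that helps.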
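The checks above can be partly automated. The following is a minimal sketch, not a definitive diagnostic: the function name `describe_gpu_setup` and the report keys are made up for illustration, and it assumes a TensorFlow 2.x install where `tf.config.list_physical_devices` and `tf.test.is_built_with_cuda` are available. It also reads the `CUDA_VISIBLE_DEVICES` environment variable, because a value that hides every GPU is one common reason for a "no CUDA-capable device is detected" message; whether that applies here is an assumption.

```python
import os


def describe_gpu_setup():
    """Collect a few facts that help narrow down why the GPU is not used.

    The function name and the report keys are hypothetical, chosen for
    this example only.
    """
    report = {
        # None means the variable is unset; "" or "-1" hides all GPUs.
        "CUDA_VISIBLE_DEVICES": os.environ.get("CUDA_VISIBLE_DEVICES"),
        "tensorflow_installed": False,
        "built_with_cuda": None,
        "visible_gpus": [],
    }
    try:
        import tensorflow as tf
    except ImportError:
        # Without TensorFlow only the environment variable can be reported.
        return report

    report["tensorflow_installed"] = True
    # False here means a CPU-only build, which can never see a GPU.
    report["built_with_cuda"] = tf.test.is_built_with_cuda()
    # An empty list means TensorFlow could not find a usable GPU.
    report["visible_gpus"] = [
        device.name for device in tf.config.list_physical_devices("GPU")
    ]
    return report


if __name__ == "__main__":
    for key, value in describe_gpu_setup().items():
        print(f"{key}: {value}")
```

If `visible_gpus` comes back empty while `built_with_cuda` is True, the problem is more likely in the driver, the CUDA version, or the environment than in the batch size or the model code.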