Could you share the command? I have run into the same problem and have not been able to solve it.
Warning 3. Unable to initialize backend 'tpu_driver': NOT_FOUND: Unable to find driver in registry given worker:
UserWarning: Flag --use_gpu_relax has a non-None default value; therefore, mark_flag_as_required will pass even if flag is not specified in the command line!
  warnings.warn(
I0311 11:35:09.811530 140075759966016 templates.py:857] Using precomputed obsolete pdbs /share/pub/zhaohq/project/pumch/alphafold/version1/source/databases//pdb_mmcif/obsolete.dat.
I0311 11:35:10.234684 140075759966016 xla_bridge.py:247] Unable to initialize backend 'tpu_driver': NOT_FOUND: Unable to find driver in registry given worker:
2022-03-11 11:35:11.258894: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:271] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
I0311 11:35:11.259597 140075759966016 xla_bridge.py:247] Unable to initialize backend 'gpu': FAILED_PRECONDITION: No visible GPU devices.
I0311 11:35:11.260360 140075759966016 xla_bridge.py:247] Unable to initialize backend 'tpu': INVALID_ARGUMENT: TpuPlatform is not available.
W0311 11:35:11.260569 140075759966016 xla_bridge.py:252] No GPU/TPU found, falling back to CPU. (Set TF_CPP_MIN_LOG_LEVEL=0 and rerun for more info.)
I0311 11:35:15.515949 140075759966016 run_alphafold.py:397] Have 5 models: ['model_1', 'model_2', 'model_3', 'model_4', 'model_5']
I0311 11:35:15.516198 140075759966016 run_alphafold.py:414] Using random seed 1114105159714920706 for the data pipeline
I0311 11:35:15.516445 140075759966016 run_alphafold.py:163] Predicting query
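This warning only says that no GPU was found, yet I really was running on the server's GPU node, so I still had a lot of questions about why it appeared. I swallowed my pride and posted another question to DeepMind, but the authors never replied. The program does run correctly on the CPU, but it is far too slow. I always use a GPU when one is available, and I never run single-threaded when I can run multi-threaded. Still, once it was running I stopped worrying about it. After all, when I had tried installing with Docker earlier, I hit all sorts of errors I could not understand. I now realize that an error is not the scary part; an error you cannot read is.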
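For anyone hitting the same "No visible GPU devices" line, here is a minimal sketch (not something from the original run) of how one might check what the job can actually see. The helper names `cuda_visibility` and `report` are made up for illustration, and it assumes `jax` is importable in the AlphaFold environment and that `nvidia-smi` may or may not be on the PATH. It cannot say why a scheduler hides the GPU; it only shows whether the process sees one.

```python
import os
import shutil
import subprocess


def cuda_visibility(env):
    """Classify the CUDA_VISIBLE_DEVICES setting in an environment mapping."""
    value = env.get("CUDA_VISIBLE_DEVICES")
    if value is None:
        return "unset"    # nothing is restricting which GPUs CUDA may use
    if value.strip() == "":
        return "hidden"   # an empty value hides every GPU from CUDA
    return "visible: " + value


def report():
    """Print what this process can see; call it from inside the submitted job."""
    print("CUDA_VISIBLE_DEVICES ->", cuda_visibility(os.environ))

    # nvidia-smi -L lists the GPUs the driver exposes on this node.
    if shutil.which("nvidia-smi"):
        result = subprocess.run(
            ["nvidia-smi", "-L"], capture_output=True, text=True
        )
        print(result.stdout.strip() or result.stderr.strip())
    else:
        print("nvidia-smi not found on PATH")

    # JAX is the library that logged "falling back to CPU" above.
    try:
        import jax
    except ImportError:
        print("jax is not importable in this environment")
        return
    print("JAX backend:", jax.default_backend())
    print("JAX devices:", jax.devices())
```

Calling `report()` at the start of the job, in the same environment that runs `run_alphafold.py`, should show whether the GPU is missing at the driver level, hidden by the environment, or simply not picked up by JAX.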
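As luck would have it, I was working from my dorm yesterday. After dinner I wanted to slip out for a walk with my roommate, but my task had not been submitted to a compute node, so the moment I left and lost the network connection, everything I was running would have been lost. I had written a job submission script, but it would not submit on the GPU node. I asked the senior labmate who manages the server, and he told me (and everyone else in the group chat) that from now on we do not need such complicated scripts to submit tasks. The timing was perfect: with the command he gave, it seems the GPU really was used, because when I looked at yesterday's error log today, the earlier warning was gone! I have no idea why, but one problem solved is one problem solved!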