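I installed the latest Docker Desktop and nvidia-docker2 on Ubuntu 20.04, and I want to build a Docker image that can run GUI applications. The image previously worked without problems on WSL2, so I rebuilt it on Ubuntu without changing anything. The Dockerfile and docker-compose.yml are as follows: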
FROM osrf/ros:melodic-desktop-full
SHELL ["/bin/bash", "-c"]
# Minimal setup
RUN echo "source /opt/ros/melodic/setup.bash" >> ~/.bashrc
RUN source ~/.bashrc
# Extra pkg installation after this!
services:
  melodic:
    build: .
    image: melodic
    command: roslaunch gazebo_ros empty_world.launch &&
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
    environment:
      - DISPLAY=${DISPLAY}
      - NVIDIA_DRIVER_CAPABILITIES=all
      - NVIDIA_VISIBLE_DEVICES=all
      - QT_X11_NO_MITSHM=1
    volumes:
      - /tmp/.X11-unix:/tmp/.X11-unix
      - ${PWD}/.Xauthority:/root/.Xauthority:rw
    network_mode: "host"
However, when I run the following command:
docker compose up
I get this error:
Error response from daemon: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy'
nvidia-container-cli: initialization error: load library failed: libnvidia-ml.so.1: cannot open shared object file: no such file or directory: unknown
My guess was that the runtime had not been set up correctly, so I ran the following command:
docker run --rm -ti --runtime=nvidia -e NVIDIA_VISIBLE_DEVICES=all -e NVIDIA_DRIVER_CAPABILITIES=all ubuntu:22.04 nvidia-smi
I get this error:
Error response from daemon: Unknown runtime specified nvidia
I then added the runtime following the official instructions:
sudo dockerd --add-runtime=nvidia=/usr/bin/nvidia-container-runtime
But this was the output:
INFO[2022-07-11T10:30:18.583217896-05:00] Starting up
INFO[2022-07-11T10:30:18.584301607-05:00] detected 127.0.0.53 nameserver, assuming systemd-resolved, so using resolv.conf: /run/systemd/resolve/resolv.conf
INFO[2022-07-11T10:30:18.585538257-05:00] parsed scheme: "unix" module=grpc
INFO[2022-07-11T10:30:18.585571148-05:00] scheme "unix" not registered, fallback to default scheme module=grpc
INFO[2022-07-11T10:30:18.585618515-05:00] ccResolverWrapper: sending update to cc: {[{unix:///run/containerd/containerd.sock 0 }] } module=grpc
INFO[2022-07-11T10:30:18.585635177-05:00] ClientConn switching balancer to "pick_first" module=grpc
INFO[2022-07-11T10:30:18.586921837-05:00] parsed scheme: "unix" module=grpc
INFO[2022-07-11T10:30:18.586945960-05:00] scheme "unix" not registered, fallback to default scheme module=grpc
INFO[2022-07-11T10:30:18.586973034-05:00] ccResolverWrapper: sending update to cc: {[{unix:///run/containerd/containerd.sock 0 }] } module=grpc
INFO[2022-07-11T10:30:18.586984481-05:00] ClientConn switching balancer to "pick_first" module=grpc
INFO[2022-07-11T10:30:18.595208030-05:00] [graphdriver] using prior storage driver: overlay2
failed to start daemon: error while opening volume store metadata database: timeout
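Now I suspect the cause may be that my machine has two NVIDIA RTX GPUs.
P.S. The nvidia-smi command works on the host.
In the end I still want to use this Docker image on this machine, but the runtime problem has held me up for a long time, so any help would be appreciated. Thanks in advance!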
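If you installed nvidia-docker2, you should not register the runtime again. According to the official documentation at https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/user-guide.html#adding-the-nvidia-runtime , nvidia-docker2 already registers it, so it is best not to add it manually.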
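To see whether the runtime really is registered, one option is to look at the Docker Engine configuration file. The sketch below is only an illustration: the helper name `has_nvidia_runtime` is hypothetical, and the default path `/etc/docker/daemon.json` is an assumption about where your daemon reads its configuration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds if the given daemon.json contains an
# "nvidia" key, which is how a registered runtime usually appears.
# The default path is an assumption; pass another path as the first argument.
has_nvidia_runtime() {
    local config="${1:-/etc/docker/daemon.json}"
    # The file must be readable and contain a key of the form "nvidia": ...
    [ -r "$config" ] && grep -q '"nvidia"[[:space:]]*:' "$config"
}

if has_nvidia_runtime; then
    echo "nvidia runtime entry found in daemon.json"
else
    echo "no nvidia runtime entry found in daemon.json"
fi
```

This only inspects the file on disk. To see what the daemon your CLI is talking to has actually loaded, `docker info` prints a `Runtimes:` line that should include `nvidia` when the registration took effect.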
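Are the user permissions set correctly when you run docker compose? Try the command below first and see what happens, to check whether the runtime is the problem: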
sudo docker run --rm --gpus all nvidia/cuda:11.0.3-base-ubuntu20.04 nvidia-smi
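If that command fails with the same `libnvidia-ml.so.1` message, it may help to confirm that the library is visible in the host's dynamic linker cache. This is a minimal sketch, not a definitive diagnosis; the helper name `lists_nvml` is hypothetical, and the check assumes `ldconfig` is available on your PATH.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads `ldconfig -p` style output on stdin and
# succeeds if libnvidia-ml.so.1 appears in it.
lists_nvml() {
    grep -q 'libnvidia-ml\.so\.1'
}

# Query the host linker cache, if ldconfig can be found.
if command -v ldconfig >/dev/null 2>&1 && ldconfig -p | lists_nvml; then
    echo "libnvidia-ml.so.1 is in the host linker cache"
else
    echo "libnvidia-ml.so.1 was not found in the host linker cache"
fi
```

Since nvidia-smi works on the host, the library is probably there. If so, and the error persists, it suggests the daemon that starts the container is not seeing the host's driver libraries, which is worth checking next.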