Hello!
My goal is to host a version of Stable Diffusion on Compute Engine that I can use as a microservice for building apps. Since I am a hobbyist, my plan is to power the instance on only while I am actually using it, to save a bit of money. I created this write-up, which describes my steps.
The first time I spin up the instance, everything works as expected: I opt into installing the GPU driver, follow the steps I outlined in the post, and can make requests to the Compute Engine instance.
However, when I restart the instance after shutting it down, the Docker image is unable to run because it doesn't have a driver for the GPU. I looked at Google's instructions for installing GPU drivers, and sure enough, `nvidia-smi` isn't installed:
bryantwolf@stable-diffusion-api:~$ sudo nvidia-smi
sudo: nvidia-smi: command not found
So I follow those instructions and end up with:
bryantwolf@stable-diffusion-api:~$ sudo nvidia-smi
Tue Oct 11 17:59:54 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 495.46       Driver Version: 495.46       CUDA Version: 11.5     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   56C    P0    29W /  70W |      0MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
But when I try to run a Docker image that requires a GPU, I end up with this error message:
Running 'script/download-weights <my huggingface api key>' in Docker with the current directory mounted as a volume...
docker: Error response from daemon: could not select device driver "" with capabilities: [[gpu]].
ⅹ Docker is missing required device driver
Is there something about the restart process I don't understand that might help me mitigate this problem?
Hello, bawolf,
After checking several cases, I found some that may be useful for solving the driver issue. The problem seems to be with the driver installation, and these commands have been reported to help.
This is the correct way to install the NVIDIA driver on a GCP instance:
cd /
sudo apt purge 'nvidia-*'
Reboot.
cd /
sudo wget https://developer.download.nvidia.com/compute/cuda/11.2.2/local_installers/cuda_11.2.2_460.32.03_linux.run
sudo sh cuda_11.2.2_460.32.03_linux.run
When the installer shows its options in the terminal, choose the ones that match your setup.
Reboot.
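After the final reboot, it may help to confirm which piece is actually missing. The Docker error in the question (could not select device driver "" with capabilities: [[gpu]]) usually points at Docker's GPU support, which is typically provided by the NVIDIA Container Toolkit, rather than at the host driver itself. The sketch below is only a rough check, not an official diagnostic. It assumes the toolkit is what your Docker setup relies on and that its nvidia-container-cli binary would be on PATH; the function names has_cmd and check_gpu_stack are made up for illustration.

```shell
#!/bin/sh
# Rough check (assumption-based sketch): is the host driver present, and is
# the NVIDIA Container Toolkit CLI present? Neither check touches the GPU.

# has_cmd NAME: succeed if NAME resolves to a command on PATH.
has_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# check_gpu_stack: print one status line per component.
check_gpu_stack() {
    if has_cmd nvidia-smi; then
        echo "driver: nvidia-smi found"
    else
        echo "driver: nvidia-smi missing (driver not installed or not on PATH)"
    fi

    # Assumption: the toolkit ships nvidia-container-cli on this distribution.
    if has_cmd nvidia-container-cli; then
        echo "toolkit: nvidia-container-cli found"
    else
        echo "toolkit: nvidia-container-cli missing (Docker may not be able to use the GPU)"
    fi
}

check_gpu_stack
```

If the driver line reports "found" but the toolkit line reports "missing", that would be consistent with nvidia-smi working on the host while Docker still refuses GPU containers. If both are found, running nvidia-smi inside a container started with --gpus all is the end-to-end test.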
Another way of installing the drivers:
More information about these commands can be found in this Stack Overflow case.
I found these other cases with similar problems that can be useful: