Hello!
My goal is to host a version of Stable Diffusion on Compute Engine that I can use as a microservice for building apps. Since I am a hobbyist, my plan is to power the instance up only while I am actually using it and shut it down otherwise, to save a bit of money. I created this write-up, which describes my steps.
The first time I spin up the instance, everything works as expected: I opt into installing the GPU driver when prompted, follow the steps I outlined in the post, and can make requests to the Compute Engine instance.
However, when I restart the instance after shutting it down, the Docker image is unable to run because it doesn't have a driver for the GPU. I looked at the instructions from Google for installing GPU drivers and, sure enough, `nvidia-smi` isn't installed:
bryantwolf@stable-diffusion-api:~$ sudo nvidia-smi
sudo: nvidia-smi: command not found
So I follow those instructions and end up with:
bryantwolf@stable-diffusion-api:~$ sudo nvidia-smi
Tue Oct 11 17:59:54 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 495.46       Driver Version: 495.46       CUDA Version: 11.5     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   56C    P0    29W /  70W |      0MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
But when I try to run a Docker image that requires the GPU, I end up with this error message:
Running 'script/download-weights <my huggingface api key>' in Docker with the current directory mounted as a volume...
docker: Error response from daemon: could not select device driver "" with capabilities: [[gpu]].
ⅹ Docker is missing required device driver
Is there something about the restart process I don't understand that might help me mitigate this problem?
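To make the state after each restart easier to compare, here is a minimal sketch of a check script that reports the two things that failed above separately: the host driver (`nvidia-smi`) and Docker's ability to pass the GPU to containers. It rests on an assumption I have not confirmed on this VM: that the `could not select device driver` error means Docker cannot find NVIDIA's container tooling, which I am detecting by looking for the `nvidia-container-cli` command. The helper names `have_cmd`, `report`, and `check_gpu_stack` are made up for this example.

```shell
#!/bin/sh
# Post-restart GPU check: a sketch, not a definitive diagnosis.

# have_cmd NAME: succeeds if NAME can be found on PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# report LABEL NAME: print one status line saying whether NAME is present.
report() {
    if have_cmd "$2"; then
        echo "$1: $2 found"
    else
        echo "$1: $2 missing"
    fi
}

# check_gpu_stack: report each layer that has to be present for a
# GPU container to start.
check_gpu_stack() {
    # Host driver: this was missing after the restart.
    report "host driver" nvidia-smi
    # Docker itself.
    report "container engine" docker
    # ASSUMPTION: the [[gpu]] error is tied to NVIDIA's container tooling
    # being absent; nvidia-container-cli is used here as its marker.
    report "gpu container tooling" nvidia-container-cli
}

check_gpu_stack
```

Running this right after the first boot (when everything works) and again after a restart should show which layer disappears, rather than only seeing the final Docker error.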