
Compute instance GPU uninstalled on restart

bawolf
New Member

Hello!

My goal is to host a version of Stable Diffusion on Compute Engine that I can use as a microservice for building apps. Since I am a hobbyist, my plan was to power the instance up only while I am actually using it, to save a bit of money. I created this write-up, which describes my steps.

The first time I spin up the instance, everything works as expected. I have to opt into installing the GPU, then I follow the steps I outlined in the post, and I can make requests to the compute instance.

However, when I restart the instance after shutting it down, the Docker image is unable to run because it doesn't have a driver for the GPU. I tried following the instructions from Google for installing GPU drivers, and sure enough, `nvidia-smi` isn't installed:

bryantwolf@stable-diffusion-api:~$ sudo nvidia-smi
sudo: nvidia-smi: command not found

So I followed those instructions and ended up with:

bryantwolf@stable-diffusion-api:~$ sudo nvidia-smi
Tue Oct 11 17:59:54 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 495.46       Driver Version: 495.46       CUDA Version: 11.5     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   56C    P0    29W /  70W |      0MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

But when I try to run a Docker image that requires a GPU, I end up with this error message:

Running 'script/download-weights <my huggingface api key>' in Docker with the current directory mounted as a volume...
docker: Error response from daemon: could not select device driver "" with capabilities: [[gpu]].
ⅹ Docker is missing required device driver

Is there something about the restart process I don't understand that might help me mitigate this problem?
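
In case it helps narrow things down, here is a rough sketch of a check I could run after each restart to tell the two failures apart. It rests on an assumption I have not confirmed: that the Docker error comes from the NVIDIA Container Toolkit binaries being missing, separately from the driver that provides `nvidia-smi`. The helper names `find_tools` and `diagnose` are made up for this example, and the list of binaries is my guess at what matters on this VM.

```python
import shutil

# Assumption: nvidia-smi ships with the GPU driver, while the other two
# binaries ship with the NVIDIA Container Toolkit that Docker relies on
# for GPU containers. Adjust the list if your setup differs.
TOOLS = ("nvidia-smi", "nvidia-container-cli", "nvidia-container-runtime-hook")


def find_tools(names=TOOLS):
    """Map each binary name to its location on PATH, or None if absent."""
    return {name: shutil.which(name) for name in names}


def diagnose(found):
    """Turn a lookup result into a one-line hint (hypothetical helper)."""
    if not found.get("nvidia-smi"):
        return "driver missing: nvidia-smi is not on PATH"
    missing = [name for name in TOOLS[1:] if not found.get(name)]
    if missing:
        return "container toolkit missing: " + ", ".join(missing)
    return "driver and container toolkit binaries found"


if __name__ == "__main__":
    # Run this on the VM right after a restart, before starting the container.
    print(diagnose(find_tools()))
```

If this reports that the driver is present but the toolkit binaries are not, that would match what I see above: `nvidia-smi` works after the reinstall, yet Docker still cannot select a GPU device driver.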
