
Unable to install drivers on Compute Engine VM Instance

I spun up a new Compute Engine VM Instance (n1-standard-4 machine with 2 T4 GPUs) and tried to install the NVIDIA drivers to use the GPUs on the new instance, following the instructions here: https://cloud.google.com/compute/docs/gpus/install-drivers-gpu

Configuration: Linux Debian x86_64 version 12
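For context, the instance was created with a gcloud command along these lines (a rough sketch; the instance name, zone, and boot disk size below are placeholders rather than my exact values):

gcloud compute instances create gpu-test-vm \
    --zone=us-central1-a \
    --machine-type=n1-standard-4 \
    --accelerator=type=nvidia-tesla-t4,count=2 \
    --maintenance-policy=TERMINATE \
    --image-family=debian-12 \
    --image-project=debian-cloud \
    --boot-disk-size=100GB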

When I tried running the utility script provided by GCP (install_gpu_driver.py), I received the following error:

ERROR: modpost: GPL-incompatible module nvidia.ko uses GPL-only symbol '__rcu_read_lock' 
ERROR: modpost: GPL-incompatible module nvidia.ko uses GPL-only symbol '__rcu_read_unlock'

I received the same error when installing the NVIDIA drivers via the CUDA Toolkit:

wget https://developer.download.nvidia.com/compute/cuda/12.3.2/local_installers/cuda_12.3.2_545.23.08_lin...
sudo sh cuda_12.3.2_545.23.08_linux.run

From https://forums.developer.nvidia.com/t/linux-6-7-3-545-29-06-550-40-07-error-modpost-gpl-incompatible... it seems there's an issue with the latest NVIDIA drivers on this kernel version that will be patched in a future release. Until then, how can I get NVIDIA drivers installed on a VM Instance to unblock me from training with GPUs on Google Cloud Platform? Can I configure a new VM Instance on an older kernel version?


Hi @mtam2013,

Welcome to the Google Cloud Community!

If I understand your question correctly, the same document that you linked covers this: you can use Deep Learning VM images, which come with NVIDIA drivers pre-installed [1]:

Alternatively, you can skip this setup by creating VMs with Deep Learning VM images. Deep Learning VM images have NVIDIA drivers pre-installed, and also include other machine learning applications such as TensorFlow and PyTorch.
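For example, a Deep Learning VM with T4 GPUs can be created with a command along these lines (a sketch; the image family shown is only an example, so please check the Deep Learning VM images documentation for the currently supported families):

gcloud compute instances create dlvm-gpu-test \
    --zone=us-central1-a \
    --machine-type=n1-standard-4 \
    --accelerator=type=nvidia-tesla-t4,count=2 \
    --maintenance-policy=TERMINATE \
    --image-family=common-cu121 \
    --image-project=deeplearning-platform-release \
    --boot-disk-size=200GB \
    --metadata="install-nvidia-driver=True"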

You can view these documents for more information:

I hope this helps. Thank you.

[1]. https://cloud.google.com/compute/docs/gpus/install-drivers-gpu

You could downgrade the kernel and then set the VM to boot into the older kernel version. I got this working with the steps below:

 

# Check the current kernel version. If it is the 6.1.0-18 kernel shown below,
# install the older 6.1.0-17 image and matching headers.
uname -r
# 6.1.0-18-cloud-amd64

# List the kernel images and headers available in the Debian repos
apt-cache search linux | grep -P "^linux-image[^ ]*[0-9]\-amd64 "
apt-cache search linux | grep -P "^linux-image[^ ]*[0-9]\-cloud-amd64 "
sudo apt install linux-image-6.1.0-17-cloud-amd64
apt-cache search linux | grep -P "^linux-headers[^ ]*[0-9]\-cloud-amd64 "
sudo apt install linux-headers-6.1.0-17-cloud-amd64

# Set the GRUB loader to boot the older kernel on the next boot only.
# '1>2' is a menu path under "Advanced options"; adjust it to match the
# position of the 6.1.0-17 entry in your GRUB menu.
sudo grub-reboot '1>2'
sudo reboot

# After the reboot, confirm the older kernel is running and install the driver
uname -r
sudo python3 install_gpu_driver.py

# To boot the older kernel as the default every time, set GRUB_DEFAULT="1>2"
# in /etc/default/grub and regenerate the GRUB configuration
sudo vim /etc/default/grub
sudo update-grub

# Verify the driver is loaded
nvidia-smi

I had the same error. You can change the driver version per the compatibility guidelines in https://cloud.google.com/compute/docs/gpus/install-drivers-gpu, looking up the available driver versions via gsutil ls gs://nvidia-drivers-us-public/tesla/.
Once you've found the driver you want to switch to, edit the driver version file /opt/deeplearning/driver-version.sh (you may need sudo: sudo vi /opt/deeplearning/driver-version.sh) and rerun the installer with sudo /opt/deeplearning/install-driver.sh. 535.183.01 worked for me.
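Putting that together, the sequence looks roughly like this (a sketch of the steps above; substitute whichever driver version matches the compatibility table for your setup):

# List the driver versions available in the public bucket
gsutil ls gs://nvidia-drivers-us-public/tesla/

# Edit the pinned driver version used by the Deep Learning image installer
sudo vi /opt/deeplearning/driver-version.sh

# Rerun the installer and verify the driver loads
sudo /opt/deeplearning/install-driver.sh
nvidia-smi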

Thanks, 535.183.01 worked for me as well on an N1 machine with 4x P100 GPUs.