GPU driver version error on a Google Cloud virtual machine

I got the error “Failed to initialize NVML: Driver/library version mismatch” on a cloud virtual machine, and I don't know why. The system was working normally, then it suddenly crashed and reported this error. I'm very confused about the cause. Can someone with experience in this matter please help me? Thanks.

Accepted solution:

Good day and welcome to the community!

The error you encountered, “Failed to initialize NVML: Driver/library version mismatch”, means that the NVML user-space library and the NVIDIA kernel module currently loaded are different versions. This usually happens when more than one version of the NVIDIA driver is installed on your system at the same time, or when the driver has been updated without restarting the machine afterward.
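Because the error means the running kernel module and the installed driver files are out of sync, one rough way to confirm it is to compare the version of the NVIDIA kernel module loaded right now with the version of the module installed on disk (the one a reboot would load). This is a proxy, not an exact NVML check: after a driver package upgrade the on-disk module and the NVML library are normally replaced together, so a loaded/installed mismatch usually points to this problem. The sketch assumes a Linux VM where the loaded driver exposes /sys/module/nvidia/version and where modinfo can find a module named nvidia; check_driver_versions is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: compare the NVIDIA kernel module loaded right now with the module
# installed on disk (the one a reboot would load).
# Assumes /sys/module/nvidia/version exists while the module is loaded and
# that modinfo can locate a module named "nvidia".

# Hypothetical helper: print "match", "mismatch", or "unknown" (if either
# version string is empty).
check_driver_versions() {
  local loaded="$1" installed="$2"
  if [ -z "$loaded" ] || [ -z "$installed" ]; then
    echo "unknown"
  elif [ "$loaded" = "$installed" ]; then
    echo "match"
  else
    echo "mismatch"
  fi
}

# Version of the module currently running in the kernel (empty if not loaded).
loaded_version=$(cat /sys/module/nvidia/version 2>/dev/null || true)
# Version of the module file on disk (empty if modinfo cannot find it).
installed_version=$(modinfo -F version nvidia 2>/dev/null || true)

echo "loaded:    ${loaded_version:-<not loaded>}"
echo "installed: ${installed_version:-<not found>}"
echo "status:    $(check_driver_versions "$loaded_version" "$installed_version")"
```

A "mismatch" status suggests that a restart (step 4 below) is needed so the newer module is loaded.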

Here are some troubleshooting steps that may resolve this issue:

  1. Check the NVIDIA driver version: Run the command nvidia-smi. If it runs successfully, it displays the NVIDIA driver version along with other information about the GPU. If it fails with the same NVML error, the driver and library versions are out of sync.
  2. Reinstall the NVIDIA driver: If there is a mismatch or conflict between NVIDIA driver versions, you may need to reinstall the driver. First remove the current driver packages (the pattern is quoted so the shell does not expand it):

         sudo apt-get remove --purge 'nvidia-*'

  3. Install the NVIDIA driver: Next, install the required driver version. For example, to install version 460 (replace "460" with the NVIDIA driver version that is compatible with your system and CUDA version):

         sudo apt-get install nvidia-driver-460

  4. Restart your system: After reinstalling the driver, restart the machine. NVML reads the GPU status through the kernel driver, and the error appears when the NVML library on disk has been updated while the kernel module loaded in memory is still the old version; a reboot loads the module that matches the new library.
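Steps 2 to 4 above can be sketched as one script. This is a minimal sketch assuming an Ubuntu/Debian VM that uses apt; DRIVER_VERSION and DRY_RUN are hypothetical variable names, 460 is only the example version from step 3, and by default the script only prints the commands instead of running them. The apt-get update call is an addition not in the steps above, included so the package index is current before installing.

```shell
#!/usr/bin/env bash
# Sketch of steps 2-4 for an Ubuntu/Debian VM that uses apt.
# DRIVER_VERSION defaults to 460 only because that is the example in step 3;
# replace it with the driver version that fits your GPU and CUDA version.
# DRY_RUN=1 (the default) only prints the commands; set DRY_RUN=0 to run them.

DRIVER_VERSION="${DRIVER_VERSION:-460}"
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

reinstall_nvidia_driver() {
  local version="$1"
  # Step 2: purge the installed NVIDIA packages. The pattern is quoted so the
  # shell passes it to apt-get instead of expanding it against local files.
  run sudo apt-get remove --purge -y 'nvidia-*' || return
  # Refresh the package index before installing.
  run sudo apt-get update || return
  # Step 3: install the chosen driver version.
  run sudo apt-get install -y "nvidia-driver-${version}" || return
  # Step 4: reboot so the new kernel module is loaded and matches the
  # newly installed NVML library.
  run sudo reboot
}

reinstall_nvidia_driver "$DRIVER_VERSION"
```

Running it once as-is shows the exact commands; rerun with DRY_RUN=0 only after checking that the driver version is the one you want.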

Please note that the NVIDIA driver must also be compatible with the CUDA version you are using: each CUDA release requires a minimum driver version. If you install a driver that is too old for the installed CUDA version, it may lead to a related error, for example “CUDA driver version is insufficient for CUDA runtime version”.
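To check this compatibility, one option is to read the installed driver version with nvidia-smi, read the CUDA toolkit release with nvcc, and compare the driver against the minimum listed in NVIDIA's CUDA release notes (see [2]). This is a sketch under assumptions: nvcc is on PATH, GNU-style sort -V is available, and MIN_DRIVER is a hypothetical placeholder you fill in yourself, because the script does not know the real minimum versions. version_ge and cuda_release_from_nvcc are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Sketch: compare the installed NVIDIA driver with the minimum driver version
# your CUDA toolkit needs. MIN_DRIVER is a placeholder to fill in from
# NVIDIA's CUDA release notes; nothing here knows the real minimums.

# Succeed if version $1 >= version $2 (relies on `sort -V` version ordering).
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Read `nvcc --version` output on stdin and print the "X.Y" release number,
# e.g. "11.2" from "Cuda compilation tools, release 11.2, V11.2.152".
cuda_release_from_nvcc() {
  sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p'
}

MIN_DRIVER="${MIN_DRIVER:-}"
driver=""
# nvidia-smi fails (non-zero exit) while the version mismatch is present.
if smi_out=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader 2>/dev/null); then
  driver=$(printf '%s\n' "$smi_out" | head -n 1)
fi
cuda=$(nvcc --version 2>/dev/null | cuda_release_from_nvcc || true)

echo "driver:       ${driver:-<unavailable>}"
echo "CUDA toolkit: ${cuda:-<nvcc not found>}"
if [ -n "$driver" ] && [ -n "$MIN_DRIVER" ]; then
  if version_ge "$driver" "$MIN_DRIVER"; then
    echo "driver $driver meets the minimum $MIN_DRIVER"
  else
    echo "driver $driver is older than the required $MIN_DRIVER"
  fi
fi
```

For example, MIN_DRIVER=<version from the release notes> ./check.sh prints both versions and the comparison result.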

Here are helpful links for your case:

[1] https://help.ubuntu.com/community/BinaryDriverHowto/Nvidia

[2] https://docs.nvidia.com/cuda/index.html

My service was working normally and then suddenly hit this error. Is there a specific cause for it? Is there some automatic update mechanism?
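One way to investigate this, assuming the VM runs Ubuntu or Debian with apt (Ubuntu images may have the unattended-upgrades service installing package updates automatically), is to search apt's history log for NVIDIA packages that were upgraded while the machine kept running. The sketch below only reads the current /var/log/apt/history.log (rotated .gz logs are not searched), and nvidia_upgrades_in_history is a hypothetical helper name. /etc/apt/apt.conf.d/20auto-upgrades is the file Ubuntu typically uses to switch periodic unattended upgrades on.

```shell
#!/usr/bin/env bash
# Sketch: look for NVIDIA package upgrades recorded by apt (including upgrades
# run automatically by unattended-upgrades) on an Ubuntu/Debian VM.
# Assumes the default log location /var/log/apt/history.log.

# Hypothetical helper: print the Start-Date, Commandline (if any) and Upgrade
# lines of every apt transaction whose Upgrade line mentions "nvidia".
nvidia_upgrades_in_history() {
  local log="$1"
  # Nothing to report if the log does not exist or is unreadable.
  [ -r "$log" ] || return 0
  awk '
    /^Start-Date:/  { date = $0; cmd = "" }
    /^Commandline:/ { cmd = $0 }
    /^Upgrade:/ && /nvidia/ { print date; if (cmd != "") print cmd; print }
  ' "$log"
}

nvidia_upgrades_in_history /var/log/apt/history.log

# Show whether periodic unattended upgrades are switched on (Ubuntu's usual file).
if [ -r /etc/apt/apt.conf.d/20auto-upgrades ]; then
  cat /etc/apt/apt.conf.d/20auto-upgrades
fi
```

If an NVIDIA upgrade appears around the time the error started and the VM was not rebooted afterward, that would fit the "driver updated without restarting" cause described in the accepted solution.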