
Frequent NVIDIA drivers reinstall needed on boot

Hi. I have a couple of Compute Engine instances that make use of a GPU and were provisioned with the Deep Learning ML images. They work fine most of the time.

But sometimes, after a restart, the NVIDIA drivers won't load, and I have to reinstall them manually following the usual instructions. Reinstalling them always fixes the problem. The challenge is that I cannot automate starting and stopping the machine, because every boot is a lottery: I never know when it's going to fail.

  • Did any of you experience a similar issue?
  • Is there a way to make this problem go away?

Thanks.


Hi @ManuelMeterian,

Welcome to Google Cloud Community!

Here are some guides for troubleshooting NVIDIA driver issues like yours:

You may also check this document for best practices in building reliable systems on Compute Engine. It offers general tips and explains features that can help reduce downtime and handle unexpected VM failures.
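Because reinstalling the drivers always fixes the problem, one way to make automated start-stop tolerant of it is a check-and-repair step that runs on every boot, for example from a Compute Engine startup script. The sketch below assumes that a working driver means `nvidia-smi` is on the PATH and exits with status 0. It deliberately does not hard-code the reinstall command; you pass in whatever "usual instructions" command you already use. The names `gpu_driver_ok`, `ensure_driver`, and `main` are hypothetical, for illustration only.

```python
import shutil
import subprocess
import sys


def gpu_driver_ok(timeout: float = 30.0) -> bool:
    """Assumption: the driver is healthy if `nvidia-smi` exists and exits with 0."""
    if shutil.which("nvidia-smi") is None:
        return False
    try:
        result = subprocess.run(["nvidia-smi"], capture_output=True, timeout=timeout)
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


def ensure_driver(reinstall_cmd, check=gpu_driver_ok, run=subprocess.run) -> str:
    """Check the driver; if broken, run the reinstall command once and re-check.

    Returns "ok" (nothing to do), "reinstalled" (repair worked) or "failed".
    `check` and `run` are injectable so the logic can be tested without a GPU.
    """
    if check():
        return "ok"
    try:
        # reinstall_cmd is whatever command you normally use; not prescribed here.
        run(reinstall_cmd, check=True)
    except (OSError, subprocess.CalledProcessError):
        return "failed"
    return "reinstalled" if check() else "failed"


def main(argv) -> int:
    """Entry point for a boot script; argv is the reinstall command and its arguments."""
    if not argv:
        print("usage: ensure_gpu_driver.py <reinstall command...>", file=sys.stderr)
        return 2
    status = ensure_driver(argv)
    print(f"GPU driver status: {status}")
    # A non-zero exit lets the calling script decide to alert or retry.
    return 0 if status != "failed" else 1
```

A boot script could call `sys.exit(main(sys.argv[1:]))` and only let your workload start once the status is not `"failed"`. This works around the symptom; it does not explain why the driver sometimes fails to load.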

I hope the above information is helpful.