Hi. I have a couple of Compute Engine instances that make use of a GPU and were provisioned with the Deep Learning ML images. They work fine most of the time.
But sometimes, after a restart, the NVIDIA drivers won't load, and I have to reinstall them manually following the usual instructions. Reinstalling always fixes the problem. The challenge is that I cannot automate the start-stop cycle of the machines, because every boot is a lottery: I never know when it is going to fail.
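To make the goal concrete, here is a rough sketch of the kind of boot-time check I would like to end up with. It assumes that `nvidia-smi` exits with a non-zero status when the driver is not loaded. The reinstall path is a hypothetical placeholder for whatever manual steps are normally run, and the function names are mine, not part of any Google or NVIDIA tooling.

```python
import subprocess


def command_ok(cmd):
    """Return True if cmd runs and exits with status 0."""
    try:
        result = subprocess.run(
            cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
    except OSError:
        # Covers a missing binary (FileNotFoundError) and permission errors.
        return False
    return result.returncode == 0


def ensure_driver(check_cmd, reinstall_cmd):
    """Check the driver and reinstall once if the check fails.

    Returns "ok", "reinstalled", or "failed".
    """
    if command_ok(check_cmd):
        return "ok"
    if not command_ok(reinstall_cmd):
        return "failed"
    # Verify that the reinstall actually brought the driver back.
    return "reinstalled" if command_ok(check_cmd) else "failed"


def main():
    # Assumption: nvidia-smi fails when the driver is not loaded.
    check = ["nvidia-smi"]
    # Hypothetical placeholder: substitute the real reinstall steps here.
    reinstall = ["/path/to/reinstall-driver.sh"]
    status = ensure_driver(check, reinstall)
    print("driver status:", status)
    return 0 if status != "failed" else 1
```

Something like this could be called from the instance's startup script, so that whatever automation starts the machine only has to wait for it to finish. It would only paper over the problem, though; it would not explain why the driver fails to load in the first place.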
Thanks.