I am running Stable Diffusion on Google Compute Engine (a Debian VM with an NVIDIA T4). The code works fine on my local PC, but on the GCE instance it dies with a segmentation fault right after the diffusion pipeline loads, as soon as image generation starts. I used faulthandler to look at the stack, but I am still lost. The VM should have enough GPU and CPU power for this, yet it just crashes. My Linux knowledge is fairly weak, so maybe I am making a mistake there, or it could be related to the VM itself. I'm just looking for any advice or a direction to investigate. Thank you for reading.
I found a solution :). I reinstalled the NVIDIA drivers and CUDA. Reinstalling only CUDA didn't fix it, and the NVIDIA driver can refuse to update while it is in use. I ended up killing every process holding the NVIDIA devices open (found with lsof, then killed by PID) so the driver could update via Google's provided install command, and then reinstalled CUDA from the official site.
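For anyone retracing this, here is roughly what that step looks like; a minimal sketch only, assuming the driver exposes the standard /dev/nvidia* device nodes. The echo is a dry-run guard: review the PIDs it prints before swapping it for a real kill.

```shell
# List the PIDs holding the NVIDIA device nodes open; these are what
# block the driver from updating. -t prints bare PIDs, one per line.
pids=$(lsof -t /dev/nvidia* 2>/dev/null | sort -u)

# Dry run: print what would be killed. Once you've reviewed the list,
# replace 'echo' with 'sudo kill' (escalate to 'kill -9' only if needed).
for pid in $pids; do
    echo "would kill PID $pid"
done
```

After the processes are gone, the driver install command should run cleanly; if a PID reappears immediately, check for a service (e.g. a monitoring agent) that keeps reopening the device.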
I did this because I found that /lib/x86_64-linux-gnu/libcuda.so.1 was the library throwing the segfault. If you aren't sure where yours is coming from, run your program under GDB.
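If you want to reproduce that check, this is the kind of invocation I mean; a sketch only, with app.py as a placeholder for whatever script launches your pipeline:

```shell
# Build the gdb invocation: -q quiet, -batch exit when done,
# -ex run starts the program immediately, -ex bt prints a backtrace
# once it segfaults. 'app.py' is a placeholder for your actual script.
cmd='gdb -q -batch -ex run -ex bt --args python3 app.py'
echo "$cmd"
```

In the resulting backtrace, frames pointing into libcuda.so.1 (or libcudart) suggest the fault is in the driver stack rather than in your Python code.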
Hope my pain helped you.