We deployed a custom container to Vertex AI. It uses the prebuilt PyTorch GPU container us-docker.pkg.dev/vertex-ai/training/pytorch-gpu.1-13.py310:latest as its base image, but when we deploy it to Vertex on an n1-highmem-8 machine with a Tesla T4 GPU, the container is not able to access the GPU; the device is still CPU. Please guide.
Hi pulkitmehtawork,
Welcome to Google Cloud Community!
The Custom containers overview lists several requirements your training application must meet in order to run on Vertex AI.
While us-docker.pkg.dev/vertex-ai/training/pytorch-gpu.1-13.py310:latest is a prebuilt PyTorch GPU container, verify that your image remains compatible with Vertex AI's environment and resource allocation mechanisms after your customizations. Also check here for any specific instructions or compatibility notes on how to configure GPUs in custom containers.
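As a first diagnostic step, you can run a short script inside the container (locally with `docker run --gpus all`, or as the training entrypoint on Vertex AI) to see whether PyTorch can actually detect the GPU. This is a minimal sketch, not Vertex-specific code; it only uses standard `torch.cuda` calls:

```python
import torch

# Report the CUDA version this PyTorch build was compiled against.
# If this is None, the installed wheel is CPU-only and will never see a GPU.
print("Built with CUDA:", torch.version.cuda)

# True only when a GPU and a working NVIDIA driver are visible to the process.
print("CUDA available:", torch.cuda.is_available())

if torch.cuda.is_available():
    print("Device count:", torch.cuda.device_count())
    print("Device name:", torch.cuda.get_device_name(0))

# Typical device-selection pattern: prefer the GPU, fall back to CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print("Selected device:", device)
```

If `torch.version.cuda` is `None`, a CPU-only PyTorch wheel was likely installed over the base image's GPU build (for example by a `pip install torch` in your Dockerfile); if it prints a version but `is_available()` is `False`, the problem is more likely in how the GPU is attached to the machine or exposed to the container.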
As an alternative solution, as suggested by user @/gogasca, you may also try the Google Cloud Deep Learning Containers and pick the PyTorch version you need, which might help with your issue. For more information on how this works, you may refer to this conversation.
Hope this helps.