
torch.cuda.is_available() returns False on Vertex AI

I run a custom model on Vertex AI. The model is a simple FastAPI app that loads a Whisper model. The beginning of the app looks like this:

if torch.cuda.is_available():
    print("GPU is available =)")
    model = whisper.load_model(model_name).cuda()
else:
    print("GPU is not available =(")
    model = whisper.load_model(model_name)

 

 

When running on Vertex AI with

gcloud ai endpoints deploy-model [ENDPOINT_NAME] \
     --region=europe-west4 \
     --model=[MODEL_NAME] \
     --machine-type=n1-standard-2 \
     --accelerator=type=nvidia-tesla-t4,count=1

torch.cuda.is_available() always returns False.
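Since the same image behaves differently across environments, it can help to log at startup what the container can actually see. A minimal diagnostic sketch (my own illustration, not from the original post; the function name is made up):

```python
import shutil
import subprocess

def gpu_driver_status() -> str:
    """Describe whether the NVIDIA driver is visible inside this container."""
    nvidia_smi = shutil.which("nvidia-smi")
    if nvidia_smi is None:
        return "nvidia-smi not found on PATH: the NVIDIA driver is not visible"
    # The binary exists: run it and return its report (or its error output)
    result = subprocess.run([nvidia_smi], capture_output=True, text=True)
    return result.stdout if result.returncode == 0 else result.stderr

print(gpu_driver_status())
```

Logging this once from the FastAPI startup hook makes it easy to compare the Vertex AI container against the Compute Engine VM where the image works.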

There is also a log message prior to that:

/app/.venv/lib/python3.10/site-packages/torch/cuda/__init__.py:88: UserWarning: HIP initialization: Unexpected error from hipGetDeviceCount(). Did you run some cuda functions before calling NumHipDevices() that might have already set an error? Error 101: hipErrorInvalidDevice (Triggered internally at ../c10/hip/HIPFunctions.cpp:110.)
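The HIP reference in that warning is a hint: HIP is AMD's ROCm API, so it suggests the installed wheel is a ROCm (or CPU) build of PyTorch rather than a CUDA one. PyTorch wheels encode the target in the version's local suffix (e.g. 1.13.0+cu117 vs 1.13.0+rocm5.2), and inside the container you can inspect torch.__version__ to see which one pip resolved. A small sketch of reading that suffix (my own illustration, not from the thread):

```python
def torch_build_flavor(version: str) -> str:
    """Classify a PyTorch wheel version string by its local-version suffix."""
    if "+" not in version:
        return "default build (no suffix)"
    suffix = version.split("+", 1)[1]
    if suffix.startswith("cu"):
        return f"CUDA ({suffix})"
    if suffix.startswith("rocm"):
        return f"ROCm ({suffix})"
    if suffix == "cpu":
        return "CPU-only"
    return f"unknown ({suffix})"

# Inside the container you would pass torch.__version__ here.
print(torch_build_flavor("1.13.0+cu117"))  # CUDA (cu117)
```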

Can you advise me on a direction to look into? I'm running out of ideas on how to set up the app for GPU support.

This very same Docker image works on a Compute Engine VM and can find the NVIDIA drivers. Why can it not do so on Vertex AI?

The Docker base image is this, by the way:

FROM nvidia/cuda:11.7.0-base-ubuntu22.04

ENV PYTHON_VERSION=3.10
ENV POETRY_VENV=/app/.venv

RUN export DEBIAN_FRONTEND=noninteractive \
  && apt-get -qq update \
  && apt-get -qq install --no-install-recommends \
  python${PYTHON_VERSION} \
  python${PYTHON_VERSION}-venv \
  python3-pip \
  ffmpeg \
  && rm -rf /var/lib/apt/lists/*

RUN ln -s -f /usr/bin/python${PYTHON_VERSION} /usr/bin/python3 && \
  ln -s -f /usr/bin/python${PYTHON_VERSION} /usr/bin/python && \
  ln -s -f /usr/bin/pip3 /usr/bin/pip

RUN python3 -m venv $POETRY_VENV \
  && $POETRY_VENV/bin/pip install -U pip setuptools \
  && $POETRY_VENV/bin/pip install poetry

ENV PATH="${PATH}:${POETRY_VENV}/bin"

WORKDIR /app

COPY . /app

RUN poetry config virtualenvs.in-project true
RUN poetry install

RUN $POETRY_VENV/bin/pip install torch==1.13.0 -f https://download.pytorch.org/whl/torch

EXPOSE 8080
ENV PORT 8080

CMD exec gunicorn --bind :${PORT} --workers 1 --threads 8 --timeout 0 app.webservice:app -k uvicorn.workers.UvicornWorker


1 ACCEPTED SOLUTION

I think it's best to use the official PyTorch GPU image, e.g. this one:
https://hub.docker.com/layers/pytorch/pytorch/1.13.1-cuda11.6-cudnn8-runtime/images/sha256-1e26efd42...

Just make sure you're not running pip install torch again, as that image already comes with PyTorch pre-installed with GPU support set up; otherwise you'll override it and potentially disable the GPUs.
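As a belt-and-braces measure, the app could also check at startup whether the installed build has CUDA support at all, instead of silently falling back to CPU. A hedged sketch (the function name is mine, not from the thread):

```python
def cuda_build_available() -> bool:
    """Return True only if torch is installed and compiled with CUDA support."""
    try:
        import torch
    except ImportError:
        return False
    # torch.version.cuda is None for CPU-only and ROCm builds
    return torch.version.cuda is not None

if not cuda_build_available():
    print("Warning: serving will run on CPU; check the base image / pip installs")
```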



Thanks! The issue is indeed that I used a non-GPU PyTorch version. Here is the fix:
RUN $POETRY_VENV/bin/pip install torch==1.13.0+cu117 -f https://download.pytorch.org/whl/torch