I run a custom model on Vertex AI. The model is a simple FastAPI app that loads a Whisper model. The beginning of the app looks like this:
if torch.cuda.is_available():
    print("GPU is available =)")
    model = whisper.load_model(model_name).cuda()
else:
    print("GPU is not available =(")
    model = whisper.load_model(model_name)
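As an aside, the branching above can be collapsed into a single device choice, since openai-whisper's load_model accepts a device argument. A sketch (pick_device is a made-up helper name, not part of the original app):

```python
# Sketch: pick the device once, then pass it to whisper.load_model via its
# device= parameter instead of calling .cuda() afterwards.
# pick_device is a made-up helper, not part of the original app.
def pick_device(cuda_available: bool) -> str:
    return "cuda" if cuda_available else "cpu"

# In the app this would be driven by torch:
#   device = pick_device(torch.cuda.is_available())
#   model = whisper.load_model(model_name, device=device)
print(pick_device(False))  # cpu
```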
When deployed on Vertex AI with
gcloud ai endpoints deploy-model [ENDPOINT_NAME] \
--region=europe-west4 \
--model=[MODEL_NAME] \
--machine-type=n1-standard-2 \
--accelerator=type=nvidia-tesla-t4,count=1 \
torch.cuda.is_available() always returns False.
There is also a log message prior to that:
/app/.venv/lib/python3.10/site-packages/torch/cuda/__init__.py:88: UserWarning: HIP initialization: Unexpected error from hipGetDeviceCount(). Did you run some cuda functions before calling NumHipDevices() that might have already set an error? Error 101: hipErrorInvalidDevice (Triggered internally at ../c10/hip/HIPFunctions.cpp:110.)
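That HIP warning is itself a strong hint: HIP is AMD's ROCm runtime, so the installed torch wheel may be a ROCm build rather than a CUDA one. A quick way to check which flavor is installed is to look at torch.version.cuda and torch.version.hip (a sketch; build_flavor is my own helper, not a torch API):

```python
# Sketch of a diagnostic: torch publishes separate CUDA, ROCm and CPU-only
# wheels; torch.version.cuda / torch.version.hip reveal which one you have.
# build_flavor is a made-up helper, not a torch API.
def build_flavor(cuda_version, hip_version):
    if cuda_version is not None:
        return "cuda"
    if hip_version is not None:
        return "rocm"
    return "cpu"

try:
    import torch
    print(torch.__version__, "->", build_flavor(torch.version.cuda, torch.version.hip))
except ImportError:
    print("torch not installed here")
```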
Can you advise me on a direction to look into? I'm running out of ideas on how to set up the app for GPU support.
This very same Docker image works on a Compute Engine VM and can find the NVIDIA drivers. Why can it not do so on Vertex AI?
The Docker base image is this, btw:
FROM nvidia/cuda:11.7.0-base-ubuntu22.04
ENV PYTHON_VERSION=3.10
ENV POETRY_VENV=/app/.venv
RUN export DEBIAN_FRONTEND=noninteractive \
&& apt-get -qq update \
&& apt-get -qq install --no-install-recommends \
python${PYTHON_VERSION} \
python${PYTHON_VERSION}-venv \
python3-pip \
ffmpeg \
&& rm -rf /var/lib/apt/lists/*
RUN ln -s -f /usr/bin/python${PYTHON_VERSION} /usr/bin/python3 && \
ln -s -f /usr/bin/python${PYTHON_VERSION} /usr/bin/python && \
ln -s -f /usr/bin/pip3 /usr/bin/pip
RUN python3 -m venv $POETRY_VENV \
&& $POETRY_VENV/bin/pip install -U pip setuptools \
&& $POETRY_VENV/bin/pip install poetry
ENV PATH="${PATH}:${POETRY_VENV}/bin"
WORKDIR /app
COPY . /app
RUN poetry config virtualenvs.in-project true
RUN poetry install
RUN $POETRY_VENV/bin/pip install torch==1.13.0 -f https://download.pytorch.org/whl/torch
EXPOSE 8080
ENV PORT 8080
CMD exec gunicorn --bind :${PORT} --workers 1 --threads 8 --timeout 0 app.webservice:app -k uvicorn.workers.UvicornWorker
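One thing worth noting about the pip install torch==1.13.0 line: that index hosts CPU, CUDA and ROCm wheels side by side, and under PEP 440 a pin without a local version label (==1.13.0) also matches local variants such as 1.13.0+cpu or 1.13.0+rocm5.2, so pip is free to resolve to a non-CUDA build. A sketch of that matching rule (matches_public_pin is my own illustrative helper, not pip internals):

```python
# Sketch of PEP 440 version matching: a pin with no local label ignores the
# candidate's local label, so "==1.13.0" accepts 1.13.0+cpu, +cu117, +rocm5.2.
# matches_public_pin is an illustrative helper, not pip's actual resolver.
def matches_public_pin(pin: str, candidate: str) -> bool:
    public = candidate.split("+", 1)[0]  # drop the local label, if any
    return public == pin

for cand in ["1.13.0", "1.13.0+cpu", "1.13.0+cu117", "1.13.0+rocm5.2"]:
    print(cand, matches_public_pin("1.13.0", cand))
```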
I think it's best to use the official PyTorch GPU image, e.g. this one:
https://hub.docker.com/layers/pytorch/pytorch/1.13.1-cuda11.6-cudnn8-runtime/images/sha256-1e26efd42...
Just make sure that you're not doing pip install torch again, as that image already comes with PyTorch pre-installed with GPU support set up; otherwise you'll be overriding it and potentially disabling the GPUs.
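For reference, a minimal sketch of what the top of the Dockerfile could look like on that base image (tag taken from the link above; Python and pip already come pre-installed in it, and the rest of the original build steps would stay the same, minus the separate torch install):

```dockerfile
FROM pytorch/pytorch:1.13.1-cuda11.6-cudnn8-runtime

# torch is pre-installed with CUDA support in this image --
# do NOT pip install torch again on top of it.
RUN apt-get -qq update \
    && apt-get -qq install --no-install-recommends ffmpeg \
    && rm -rf /var/lib/apt/lists/*
```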
Thanks! The issue is indeed that I used a non-GPU PyTorch version.
Here is the fix:
RUN $POETRY_VENV/bin/pip install torch==1.13.0+cu117 -f https://download.pytorch.org/whl/torch