
Triton on Vertex AI does not support multiple models?

Currently, I want to deploy a Triton server to a Vertex AI endpoint. However, I received this error message:

"failed to start Vertex AI service: Invalid argument - Expect the model repository contains only a single model if default model is not specified"

Does this mean that a Triton server deployment only supports one model? That is different from what I have read in this document about concurrent model execution:

https://cloud.google.com/vertex-ai/docs/predictions/using-nvidia-triton


The error message suggests that you haven't specified a default model.

Hi, I have the same issue and I couldn't find how to set a default model. Could you please link a guide about it or explain how to do that? Thanks

Hi @Eduardo_Ortiz

Can you provide documentation on how we can set the default model for a Triton ensemble?

I did not see any references to this in these Vertex AI docs, and it doesn't seem like "default model" is an Nvidia Triton concept??
 

Looks like we can set the default model for Vertex AI via the `--vertex-ai-default-model` flag (source code).

I.e., 

tritonserver --model-repository $MODEL_REPO --vertex-ai-default-model={DEFAULT_MODEL}
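
I haven't verified this end to end, but since `--container-args` on `gcloud ai models upload` is passed through to `tritonserver`, something like this should set the default model when importing into the Model Registry (MY_DEFAULT_MODEL is a placeholder for one of the model directories in your repository):

gcloud ai models upload \
  --region=LOCATION \
  --display-name=DEPLOYED_MODEL_NAME \
  --container-image-uri=LOCATION-docker.pkg.dev/PROJECT_ID/getting-started-nvidia-triton/vertex-triton-inference \
  --artifact-uri=MODEL_ARTIFACTS_REPOSITORY \
  --container-args='--vertex-ai-default-model=MY_DEFAULT_MODEL'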

Cool! So I understand that you can only use one model at a time.

For information, I was able to run one model, but the way we query the Vertex AI endpoint doesn't allow us to choose a specific model. So I guess that using Triton with multiple models is not supported for now?
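
For context, the endpoint is queried with rawPredict against the endpoint resource itself, not a named model, roughly like this (sketch based on the linked doc; request.json would be a standard KServe v2 inference request, and PROJECT_ID / LOCATION / ENDPOINT_ID are placeholders):

curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/endpoints/ENDPOINT_ID:rawPredict" \
  -d @request.json

There is no model name in the URL, so the request can only be routed to the default model.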

As specified in the documentation, ensure that you provide the flag 

--container-args='--strict-model-config=false'

while importing it into the Model Registry, as follows:

gcloud ai models upload \
  --region=LOCATION \
  --display-name=DEPLOYED_MODEL_NAME \
  --container-image-uri=LOCATION-docker.pkg.dev/PROJECT_ID/getting-started-nvidia-triton/vertex-triton-inference \
  --artifact-uri=MODEL_ARTIFACTS_REPOSITORY \
  --container-args='--strict-model-config=false'
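
Once uploaded, deploying to an endpoint is a separate step; a rough sketch, assuming an endpoint already exists (MODEL_ID, ENDPOINT_ID and MACHINE_TYPE are placeholders):

gcloud ai endpoints deploy-model ENDPOINT_ID \
  --region=LOCATION \
  --model=MODEL_ID \
  --display-name=DEPLOYED_MODEL_NAME \
  --machine-type=MACHINE_TYPE \
  --traffic-split=0=100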

Is it still not possible to use an ensemble model? It doesn't work for now.

It's mainly an issue of the shared memory size not being customizable when running Vertex AI online predictions. Have you been able to customize the "shm-size" parameter?

There is an open ticket: VertexAI does not allocate enough shared memory to run Triton containers [278045294]

No I have not. Not ideal at all.

To work around it, I shrank shared memory usage via the `--backend-config` flag, i.e. --backend-config=python,shm-default-byte-size=15728640

Again, not ideal, especially given that the default shm-size is quite small.
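
In case it helps, that backend-config flag can presumably be passed the same way as --strict-model-config above, i.e. via the container args at upload time. Untested sketch; it uses gcloud's alternate-delimiter escaping (^;^) because the flag value itself contains a comma:

gcloud ai models upload \
  --region=LOCATION \
  --display-name=DEPLOYED_MODEL_NAME \
  --container-image-uri=LOCATION-docker.pkg.dev/PROJECT_ID/getting-started-nvidia-triton/vertex-triton-inference \
  --artifact-uri=MODEL_ARTIFACTS_REPOSITORY \
  --container-args='^;^--strict-model-config=false;--backend-config=python,shm-default-byte-size=15728640'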