
Triton on Vertex AI does not support multiple models?

Currently, I want to deploy a Triton server to a Vertex AI endpoint. However, I received this error message:

"failed to start Vertex AI service: Invalid argument - Expect the model repository contains only a single model if default model is not specified"

Does this mean that a Triton server deployment only supports one model? That is different from what I have read in this document about concurrent model execution:

https://cloud.google.com/vertex-ai/docs/predictions/using-nvidia-triton


The error message suggests that you haven't specified a default model.

Hi, I have the same issue and I couldn't find how to set a default model. Could you please link a guide about it or explain how to do that? Thanks

Hi @Eduardo_Ortiz

Can you provide documentation on how we can set the default model for a Triton ensemble?

I did not see any references to this in these Vertex AI docs, and it doesn't seem like "default model" is an Nvidia Triton concept??
 

Looks like we can set the default model for Vertex AI via the `--vertex-ai-default-model` flag (source code).

I.e., 

tritonserver --model-repository $MODEL_REPO --vertex-ai-default-model={DEFAULT_MODEL}
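
I haven't verified this end to end, but since `--container-args` on `gcloud ai models upload` is passed through to `tritonserver`, something like this should set the default model when importing into the Model Registry (MY_DEFAULT_MODEL is a placeholder for one of the model directories in your repository):

gcloud ai models upload \
  --region=LOCATION \
  --display-name=DEPLOYED_MODEL_NAME \
  --container-image-uri=LOCATION-docker.pkg.dev/PROJECT_ID/getting-started-nvidia-triton/vertex-triton-inference \
  --artifact-uri=MODEL_ARTIFACTS_REPOSITORY \
  --container-args='--vertex-ai-default-model=MY_DEFAULT_MODEL'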

Cool! So I understand that you can only use one model at a time.

For information, I was able to run one model, but the way we query the Vertex AI endpoint doesn't allow us to choose a specific model. So I guess that using Triton with multiple models is not supported for now?
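
For context, the endpoint is queried with rawPredict against the endpoint resource itself, not a named model, roughly like this (sketch based on the linked doc; request.json would be a standard KServe v2 inference request, and PROJECT_ID / LOCATION / ENDPOINT_ID are placeholders):

curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/endpoints/ENDPOINT_ID:rawPredict" \
  -d @request.json

There is no model name in the URL, so the request can only be routed to the default model.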

As specified in the documentation, ensure that you provide the flag 

--container-args='--strict-model-config=false'

while importing it into the Model Registry, as follows:

gcloud ai models upload \
  --region=LOCATION \
  --display-name=DEPLOYED_MODEL_NAME \
  --container-image-uri=LOCATION-docker.pkg.dev/PROJECT_ID/getting-started-nvidia-triton/vertex-triton-inference \
  --artifact-uri=MODEL_ARTIFACTS_REPOSITORY \
  --container-args='--strict-model-config=false'
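
Once uploaded, deploying to an endpoint is a separate step; a rough sketch, assuming an endpoint already exists (MODEL_ID, ENDPOINT_ID and MACHINE_TYPE are placeholders):

gcloud ai endpoints deploy-model ENDPOINT_ID \
  --region=LOCATION \
  --model=MODEL_ID \
  --display-name=DEPLOYED_MODEL_NAME \
  --machine-type=MACHINE_TYPE \
  --traffic-split=0=100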

Is it still not possible to use an ensemble model? It doesn't work for now.

It's mainly an issue of the shared memory size not being customizable when running Vertex AI online predictions. Have you been able to customize the "shm-size" parameter?

There is an open ticket: VertexAI does not allocate enough shared memory to run Triton containers [278045294]

No I have not. Not ideal at all.

To work around it, I shrank shared memory usage via the `--backend-config` flag, i.e. --backend-config=python,shm-default-byte-size=15728640

Again, not ideal, especially given that the default shm-size is quite small.
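
In case it helps, that backend-config flag can presumably be passed the same way as --strict-model-config above, i.e. via the container args at upload time. Untested sketch; it uses gcloud's alternate-delimiter escaping (^;^) because the flag value itself contains a comma:

gcloud ai models upload \
  --region=LOCATION \
  --display-name=DEPLOYED_MODEL_NAME \
  --container-image-uri=LOCATION-docker.pkg.dev/PROJECT_ID/getting-started-nvidia-triton/vertex-triton-inference \
  --artifact-uri=MODEL_ARTIFACTS_REPOSITORY \
  --container-args='^;^--strict-model-config=false;--backend-config=python,shm-default-byte-size=15728640'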