How to manually scale nodes for a deployed Vertex AI endpoint

I have a deployed endpoint on Vertex AI with autoscaling enabled, but I want to manually adjust min-replicas and max-replicas for the deployed endpoint. How can I do this?

1 ACCEPTED SOLUTION

I believe you can set the minimum and maximum replica counts when deploying the model. See the gcloud flags you will need when modifying these parameters:

gcloud ai endpoints deploy-model (ENDPOINT : --region=REGION)
    --display-name=DISPLAY_NAME
    --model=MODEL
    [--accelerator=[count=COUNT],[type=TYPE]]
    [--autoscaling-metric-specs=[METRIC-NAME=TARGET,…]]
    [--deployed-model-id=DEPLOYED_MODEL_ID]
    [--disable-container-logging]
    [--enable-access-logging]
    [--machine-type=MACHINE_TYPE]
    [--max-replica-count=MAX_REPLICA_COUNT]
    [--min-replica-count=MIN_REPLICA_COUNT]
    [--service-account=SERVICE_ACCOUNT]
    [--traffic-split=[DEPLOYED_MODEL_ID=VALUE,…]]

Reference: https://cloud.google.com/sdk/gcloud/reference/ai/endpoints/deploy-model 
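To make the synopsis above concrete, here is a minimal Python sketch that assembles a `gcloud ai endpoints deploy-model` invocation with explicit replica bounds. It only builds and prints the command; it does not run it. The helper name `build_deploy_command` and every value passed to it (endpoint ID, region, display name, model ID, replica counts) are hypothetical placeholders, and the min/max check is just a local sanity check, not a statement of the service's own validation rules.

```python
import shlex


def build_deploy_command(endpoint, region, display_name, model,
                         min_replicas, max_replicas):
    """Build the argv list for `gcloud ai endpoints deploy-model`.

    Only flags that appear in the synopsis above are used.
    """
    # Local sanity check (an assumption of this sketch, not a documented rule).
    if max_replicas < min_replicas:
        raise ValueError("max_replicas must be >= min_replicas")
    return [
        "gcloud", "ai", "endpoints", "deploy-model", endpoint,
        f"--region={region}",
        f"--display-name={display_name}",
        f"--model={model}",
        f"--min-replica-count={min_replicas}",
        f"--max-replica-count={max_replicas}",
    ]


if __name__ == "__main__":
    # All values below are placeholders for illustration.
    cmd = build_deploy_command(
        endpoint="ENDPOINT_ID",
        region="us-central1",
        display_name="my-deployment",
        model="MODEL_ID",
        min_replicas=2,
        max_replicas=10,
    )
    # Print a shell-quoted command line that could be reviewed before running.
    print(shlex.join(cmd))
```

The printed line can be reviewed and pasted into a shell, or the list can be passed to `subprocess.run` once the placeholders are replaced with real values.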

2 REPLIES

My requirement was to manually increase the min and max replica counts of an already deployed model in Vertex AI. I found this API:

Method: projects.locations.endpoints.mutateDeployedModel. However, this API only lets me scale up to a max-replicas value of 50, not beyond that.

Reference: https://cloud.google.com/vertex-ai/docs/reference/rest/v1/projects.locations.endpoints/mutateDeploye...
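For readers who want to see what a `mutateDeployedModel` call might look like, the following Python sketch builds the request URL and JSON body for changing the replica bounds of an existing deployed model. It is an illustration under assumptions, not a verified client: the helper name `build_mutate_request` and all IDs are hypothetical, and the body layout (`deployedModel`, `dedicatedResources`, `updateMask` with camelCase paths) reflects my reading of the v1 REST reference and should be checked against it. The sketch sends nothing over the network and does not work around the 50-replica ceiling reported above.

```python
import json

API_VERSION = "v1"


def build_mutate_request(project, location, endpoint_id, deployed_model_id,
                         min_replicas, max_replicas):
    """Return (url, body) for an endpoints.mutateDeployedModel request."""
    # Local sanity check (an assumption of this sketch, not a documented rule).
    if max_replicas < min_replicas:
        raise ValueError("max_replicas must be >= min_replicas")
    url = (
        f"https://{location}-aiplatform.googleapis.com/{API_VERSION}"
        f"/projects/{project}/locations/{location}"
        f"/endpoints/{endpoint_id}:mutateDeployedModel"
    )
    body = {
        "deployedModel": {
            "id": deployed_model_id,
            # Assumes the model was deployed on dedicated resources.
            "dedicatedResources": {
                "minReplicaCount": min_replicas,
                "maxReplicaCount": max_replicas,
            },
        },
        # Only the fields named in the mask are meant to be changed.
        "updateMask": (
            "dedicatedResources.minReplicaCount,"
            "dedicatedResources.maxReplicaCount"
        ),
    }
    return url, body


if __name__ == "__main__":
    # All identifiers below are placeholders for illustration.
    url, body = build_mutate_request(
        "my-project", "us-central1", "ENDPOINT_ID", "DEPLOYED_MODEL_ID", 2, 10
    )
    print("POST", url)
    print(json.dumps(body, indent=2))
```

The resulting URL and body could then be sent as an authenticated POST request with any HTTP client, for example with an access token from `gcloud auth print-access-token`.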