
How to manually scale nodes for a deployed Vertex AI endpoint

I have a deployed endpoint on Vertex AI with autoscaling enabled, but I want to manually adjust min-replicas and max-replicas for the deployed endpoint. How can I do this?

1 ACCEPTED SOLUTION

I believe you can set the minimum and maximum replica counts when deploying the model. See the gcloud flags below that you will need when modifying these parameters.

gcloud ai endpoints deploy-model (ENDPOINT : --region=REGION) --display-name=DISPLAY_NAME --model=MODEL [--accelerator=[count=COUNT],[type=TYPE]] [--autoscaling-metric-specs=[METRIC-NAME=TARGET,…]] [--deployed-model-id=DEPLOYED_MODEL_ID] [--disable-container-logging] [--enable-access-logging] [--machine-type=MACHINE_TYPE] [--max-replica-count=MAX_REPLICA_COUNT] [--min-replica-count=MIN_REPLICA_COUNT] [--service-account=SERVICE_ACCOUNT] [--traffic-split=[DEPLOYED_MODEL_ID=VALUE,…]]

Reference: https://cloud.google.com/sdk/gcloud/reference/ai/endpoints/deploy-model 
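
To make the synopsis above more concrete, here is a minimal sketch of a deploy call with explicit replica bounds. It assumes you are deploying (or redeploying) the model onto the endpoint with the new counts; every ID, name, region, and machine type below is a hypothetical placeholder, and only flags listed in the synopsis are used. The script prints the command by default and runs it only when RUN=1 is set.

```shell
#!/usr/bin/env bash
# Sketch: deploy a model to an existing Vertex AI endpoint with explicit
# replica bounds. All values are hypothetical placeholders.
set -euo pipefail

ENDPOINT_ID="1234567890"            # hypothetical endpoint ID
REGION="us-central1"                # hypothetical region
MODEL_ID="9876543210"               # hypothetical model ID
DISPLAY_NAME="my-deployed-model"    # hypothetical display name
MACHINE_TYPE="n1-standard-4"        # hypothetical machine type
MIN_REPLICAS=2
MAX_REPLICAS=5

# Build the command as an array so each flag stays a separate argument.
deploy_cmd=(
  gcloud ai endpoints deploy-model "$ENDPOINT_ID"
  --region="$REGION"
  --model="$MODEL_ID"
  --display-name="$DISPLAY_NAME"
  --machine-type="$MACHINE_TYPE"
  --min-replica-count="$MIN_REPLICAS"
  --max-replica-count="$MAX_REPLICAS"
)

# Dry run by default: print the command. Set RUN=1 to execute it.
if [[ "${RUN:-0}" == "1" ]]; then
  "${deploy_cmd[@]}"
else
  echo "${deploy_cmd[*]}"
fi
```

Setting MIN_REPLICAS and MAX_REPLICAS to the same value should, in effect, pin the number of nodes, which is probably the closest this command gets to manual scaling.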


2 REPLIES