I have a deployed endpoint on Vertex AI with autoscaling enabled, but I want to manually adjust the min-replicas and max-replicas for the deployed endpoint. How can I do this?
I believe you can set the minimum and maximum replica count when you deploy the model. These are the gcloud flags you will need to modify these parameters:
gcloud ai endpoints deploy-model (ENDPOINT : --region=REGION) --display-name=DISPLAY_NAME --model=MODEL [--accelerator=[count=COUNT],[type=TYPE]] [--autoscaling-metric-specs=[METRIC-NAME=TARGET,…]] [--deployed-model-id=DEPLOYED_MODEL_ID] [--disable-container-logging] [--enable-access-logging] [--machine-type=MACHINE_TYPE] [--max-replica-count=MAX_REPLICA_COUNT] [--min-replica-count=MIN_REPLICA_COUNT] [--service-account=SERVICE_ACCOUNT] [--traffic-split=[DEPLOYED_MODEL_ID=VALUE,…]]
Reference: https://cloud.google.com/sdk/gcloud/reference/ai/endpoints/deploy-model
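As a minimal sketch, the flags above can be assembled into a concrete command. The helper below only builds the argument list for gcloud ai endpoints deploy-model so it can be inspected before running. The function name build_deploy_model_args and the endpoint ID, model ID, display name, region and machine type in the example are hypothetical placeholders. Note that deploy-model creates a new deployment on the endpoint, so this sets the replica range at deploy time rather than changing an already deployed model.

```python
import shlex


def build_deploy_model_args(endpoint, region, model, display_name,
                            min_replicas, max_replicas, machine_type=None):
    """Return a gcloud argv list that deploys MODEL to ENDPOINT with a replica range.

    Hypothetical helper: it only builds the command, it does not run it.
    """
    # Basic sanity checks: at least one replica, and max must not be below min.
    if min_replicas < 1:
        raise ValueError("min_replicas must be at least 1")
    if max_replicas < min_replicas:
        raise ValueError("max_replicas must be >= min_replicas")
    args = [
        "gcloud", "ai", "endpoints", "deploy-model", endpoint,
        f"--region={region}",
        f"--model={model}",
        f"--display-name={display_name}",
        f"--min-replica-count={min_replicas}",
        f"--max-replica-count={max_replicas}",
    ]
    # --machine-type is optional in the synopsis, so only add it when given.
    if machine_type:
        args.append(f"--machine-type={machine_type}")
    return args


if __name__ == "__main__":
    # Hypothetical IDs; replace them with your own endpoint and model.
    argv = build_deploy_model_args("1234567890", "us-central1", "9876543210",
                                   "my-model", min_replicas=2, max_replicas=10,
                                   machine_type="n1-standard-4")
    # Print the shell-quoted command so it can be reviewed before running.
    print(shlex.join(argv))
    # To deploy for real: import subprocess; subprocess.run(argv, check=True)
```

Building an argv list instead of a single string avoids shell-quoting problems if a display name contains spaces.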
My requirement was to manually increase the min and max replicas of a model already deployed in Vertex AI. I found this API:
Method: projects.locations.endpoints.mutateDeployedModel. However, this API only lets me scale up to a maximum of 50 replicas, not beyond that.
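For reference, a call to mutateDeployedModel that changes only the replica range might look roughly like the sketch below. It assumes the v1 REST endpoint, a model deployed with dedicated resources, and an OAuth access token obtained elsewhere (for example from gcloud auth print-access-token). The function name and all IDs are hypothetical, and the request is only built, not sent. It does nothing to work around the 50-replica cap described above.

```python
import json
import urllib.request


def build_mutate_request(project, region, endpoint_id, deployed_model_id,
                         min_replicas, max_replicas, access_token):
    """Build (but do not send) a v1 mutateDeployedModel request.

    Hypothetical helper: assumes the deployed model uses dedicatedResources.
    """
    if min_replicas < 1 or max_replicas < min_replicas:
        raise ValueError("need 1 <= min_replicas <= max_replicas")
    url = (
        f"https://{region}-aiplatform.googleapis.com/v1/"
        f"projects/{project}/locations/{region}/endpoints/{endpoint_id}"
        ":mutateDeployedModel"
    )
    body = {
        "deployedModel": {
            "id": deployed_model_id,
            "dedicatedResources": {
                "minReplicaCount": min_replicas,
                "maxReplicaCount": max_replicas,
            },
        },
        # JSON FieldMask: comma-separated lowerCamelCase paths of the changed fields,
        # so only the replica counts are updated.
        "updateMask": "dedicatedResources.minReplicaCount,dedicatedResources.maxReplicaCount",
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Sending it with urllib.request.urlopen(req) should return a long-running operation rather than the updated model, so the change is not necessarily applied when the call returns.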