Get hands-on experience with 20+ free Google Cloud products and $300 in free credit for new customers.

Private endpoints number of replicas goes to zero

Hey,

We've noticed a weird behavior with our custom trained models deployed on private endpoints. In some cases the number of replicas goes to zero and it brakes our processing pipeline. We couldn't trace back the issue. In the logs it just shows that the workers are exiting.

After around 5 minutes, the replica was started again and continued working as it should.


I am attaching two screenshots from the logs and from cloud monitoring.

Any idea what would cause this issue ?

Thanks!

Screenshot from 2023-12-14 13-00-15.pngScreenshot from 2023-12-14 12-58-24.png

0 1 235
1 REPLY 1

Possibly a resource concern, I would recommend monitoring the job for more descriptive error. As workers exiting is still a general error.