
'/PredictionService.Predict' from role 'cloud-llm-api-servo-model-owner' throttled

Hey folks,

I've been sporadically getting this odd error when using the Cloud Text-to-Speech Long Audio Synthesis feature, and the message suggests the system is hitting capacity limits. My call uses exponential backoff and still fails even after 7 attempts.

The error message I get is:

400 Request '/PredictionService.Predict' from role 'cloud-llm-api-servo-model-owner' throttled: Service is overloaded (in-flight-requests) go/tr-o. 3: Request '/PredictionService.Predict' from role 'cloud-llm-api-servo-model-owner' throttled: Service is overloaded (in-flight-requests) go/tr-o.
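The post doesn't include the retry code, so what follows is only a minimal sketch of the kind of loop described (exponential backoff, 7 attempts), assuming the failing request is wrapped in a zero-argument function `fn`. The names `call_with_backoff` and `is_overload_error` are hypothetical, and detecting overload by matching the strings "throttled" and "Service is overloaded" is an assumption based on the error text above, not a documented error contract. It uses "full jitter": each wait is a random duration between 0 and the current backoff cap.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")

# Assumption: substrings copied from the error text in this post; the service
# does not document them as a stable way to detect overload.
OVERLOAD_MARKERS = ("throttled", "Service is overloaded")


def is_overload_error(exc: Exception) -> bool:
    """Return True if the exception message looks like the overload error above."""
    message = str(exc)
    return any(marker in message for marker in OVERLOAD_MARKERS)


def call_with_backoff(
    fn: Callable[[], T],
    max_attempts: int = 7,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(), retrying overload errors with capped exponential backoff and full jitter.

    Non-overload errors are re-raised immediately; the last overload error is
    re-raised once max_attempts calls have failed.
    """
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception as exc:
            if attempt == max_attempts - 1 or not is_overload_error(exc):
                raise
            # Backoff cap doubles each attempt (1s, 2s, 4s, ...) up to max_delay.
            cap = min(max_delay, base_delay * 2 ** attempt)
            # Full jitter: sleep a random duration in [0, cap].
            sleep(random.uniform(0.0, cap))
    raise ValueError("max_attempts must be at least 1")
```

With these defaults, the six waits between seven attempts add up to at most 1 + 2 + 4 + 8 + 16 + 32 = 63 seconds, so if the backend stays overloaded for longer than about a minute, retrying alone won't get a request through.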

I'm wondering whether this is actually hitting quotas on my own account instead. I've looked at the API usage graph, and it doesn't seem to show the backend throwing any 400s during the window when we had issues. But perhaps that's expected for this metric?

[Screenshot: API usage graph, 2025-02-25 11:03 AM]

Wondering if anyone has any thoughts as to what's going on here.  

Accepted solution:

Figured this out on my own: the issue was that I was using some of the older Journey voices, for which Google has quietly provisioned less capacity. As a result, requests were either taking VERY long to run or failing with the error above. Switching to the new Chirp3 HD voices helped.
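The fix here is a configuration change (which voice you request) rather than a code change. As a minimal sketch, assuming you already have a list of voice names (for example, the names returned by the client library's `list_voices` call), the hypothetical helper `pick_voice` below picks a voice from a preferred family and ignores older ones such as Journey. The family substring `Chirp3-HD` and the example names follow the `en-US-Chirp3-HD-Charon` / `en-US-Journey-D` naming pattern; check the actual names in your own voice list. This sketch does not verify that a voice is supported for Long Audio Synthesis.

```python
from typing import Iterable, Optional, Sequence


def pick_voice(
    available: Iterable[str],
    language_code: str = "en-US",
    families: Sequence[str] = ("Chirp3-HD",),
) -> Optional[str]:
    """Return a voice name from the first preferred family that has a match.

    `families` is checked in order, so a fallback family can be appended.
    Older families such as "Journey" are only used if the caller lists them.
    Returns None if no available voice matches.
    """
    prefix = f"{language_code}-"
    # Sort so the choice is deterministic regardless of API ordering.
    candidates = sorted(name for name in available if name.startswith(prefix))
    for family in families:
        for name in candidates:
            if f"-{family}-" in name:
                return name
    return None
```

For example, `pick_voice(["en-US-Journey-D", "en-US-Chirp3-HD-Charon"])` returns `"en-US-Chirp3-HD-Charon"`.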
