
What is the default timeout when a predict request is made to an LLM deployed on Vertex AI?

I am using this library to make a prediction request to a model deployed on Vertex AI. I am getting a timeout exception, and I am not sure whether I need to increase the timeout and, if so, to what value. Also, what is the default value? I can find nothing in the documentation.

Client API  : https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform_v1.services...

1 ACCEPTED SOLUTION

Hello,

As quoted in the documentation:

“Requests timeout after 60 seconds for both public and private endpoints.”

If you would like to request a timeout of more than 60 seconds, you must file a support ticket or contact your Google Cloud representative.

I hope this helps.


2 REPLIES 2

In my case, I was deploying the endpoint as a shared public endpoint, and according to the documentation, shared public endpoints have a timeout of 60 seconds. So, if you need a timeout longer than 60 seconds, you need to deploy the model to a dedicated endpoint.