
High Latency Issue with Vertex AI cURL Requests

Hi,

I'm making a cURL request to Vertex AI using the chat-bison-32k model. The context message I'm sending is quite large (around 44,000 characters), and the response time for a single request is consistently 30 to 35 seconds. How can I optimize this to achieve a response time of 4 to 5 seconds?
The use case is generating a SQL query for a question asked of the model.
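Since latency for these models tends to grow with the number of input and output tokens, one practical lever is to shrink the context before sending it, e.g. by only including schema entries relevant to the user's question instead of the full 44,000-character context. The sketch below illustrates the idea; the function name `filter_schema`, the `SCHEMA` dictionary, and the keyword-matching heuristic are all hypothetical placeholders, not part of the Vertex AI API:

```python
# Sketch: shrink the prompt by keeping only schema entries whose table name
# appears in the user's question, then cap the result at a character budget.
# filter_schema and SCHEMA are hypothetical names for illustration only.

def filter_schema(schema: dict, question: str, budget: int = 4000) -> str:
    """Return a trimmed context string for the SQL-generation prompt."""
    q = question.lower()
    kept = [ddl for table, ddl in schema.items() if table.lower() in q]
    if not kept:
        # Fall back to the full schema if no table name matched.
        kept = list(schema.values())
    return "\n".join(kept)[:budget]

SCHEMA = {
    "orders": "CREATE TABLE orders (id INT, customer_id INT, total NUMERIC);",
    "customers": "CREATE TABLE customers (id INT, name TEXT);",
    "audit_log": "CREATE TABLE audit_log (id INT, event TEXT, ts TIMESTAMP);",
}

context = filter_schema(SCHEMA, "What is the total of all orders per customer?")
```

A smaller context reduces both input-token processing time and the chance of long, rambling outputs. Separately, if perceived latency is the concern, streaming the response (so the first tokens arrive early) may help, though it doesn't reduce total generation time.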

1 Reply

I am experiencing a latency of 2 minutes. I haven't been able to capture the full output, but my input is ~36,000 tokens and the output is approximately 5,000 tokens.