Hi,
I'm making a cURL request to Vertex AI using the chat-bison-32k model. The context message I'm sending is quite large (around 44,000 characters), and the response time for a single request is consistently between 30 and 35 seconds. How can I optimize this to achieve a response time of 4 to 5 seconds?
The use case is to generate a SQL query for a question asked to the model.
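For reference, this is roughly the shape of the request I'm sending, using the standard Vertex AI `predict` endpoint for the PaLM chat models. The project ID, region, context string, and question are placeholders, not my real values:

```bash
REGION="us-central1"          # placeholder region
PROJECT_ID="my-project"       # placeholder project ID

curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  "https://${REGION}-aiplatform.googleapis.com/v1/projects/${PROJECT_ID}/locations/${REGION}/publishers/google/models/chat-bison-32k:predict" \
  -d '{
    "instances": [{
      "context": "<~44,000-character schema/context string goes here>",
      "messages": [{"author": "user", "content": "Which customers placed orders last month?"}]
    }],
    "parameters": {
      "temperature": 0.2,
      "maxOutputTokens": 1024
    }
  }'
```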
I am experiencing a latency of around 2 minutes. I haven't been able to capture the full output, but the input is ~36,000 tokens and the output is approximately 5,000 tokens.