Hi,
I'm making a cURL request to Vertex AI using the chat-bison-32k model. The context message I'm sending is quite large, around 44,000 characters, and the response time for a single request is consistently between 30 and 35 seconds. How can I optimize this to bring the response time down to 4 to 5 seconds?
The use case is to generate a SQL query for a natural-language question asked of the model.
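For reference, this is roughly the shape of the request I'm sending (a minimal sketch: PROJECT_ID, the region, the context payload, and the sample question are placeholders, and the parameter values are just examples; the endpoint and field names follow the public Vertex AI REST docs):

```bash
# Sketch of the chat-bison-32k predict call; the real context field
# holds the ~44k-character schema/context text.
curl -s -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  "https://us-central1-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/us-central1/publishers/google/models/chat-bison-32k:predict" \
  -d '{
    "instances": [{
      "context": "<~44,000-character schema/context goes here>",
      "messages": [{
        "author": "user",
        "content": "Which customers placed orders last month?"
      }]
    }],
    "parameters": {
      "temperature": 0.2,
      "maxOutputTokens": 1024,
      "topP": 0.8,
      "topK": 40
    }
  }'
```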