Clarification required for Vertex AI Batch Predictions using the text-bison base model

I was finally able to make a batch prediction request for the text-bison base model using the REST API. It got a little confusing when I tried the SDK, so I fell back to the API. However, I still have some doubts about scaling and would like answers from the community before I push to prod.
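
For reference, here is roughly the REST call I mean - a minimal sketch where the region, PROJECT_ID, and bucket paths are placeholders, not my real values:

    curl -X POST \
      -H "Authorization: Bearer $(gcloud auth print-access-token)" \
      -H "Content-Type: application/json" \
      "https://us-central1-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/us-central1/batchPredictionJobs" \
      -d '{
        "displayName": "text-bison-batch-job",
        "model": "publishers/google/models/text-bison",
        "inputConfig": {
          "instancesFormat": "jsonl",
          "gcsSource": { "uris": ["gs://MY_BUCKET/prompts.jsonl"] }
        },
        "outputConfig": {
          "predictionsFormat": "jsonl",
          "gcsDestination": { "outputUriPrefix": "gs://MY_BUCKET/output/" }
        }
      }'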

  1. I increased the batch request quota to 10,000. Does that mean I can submit 10,000 prediction instances at once in a single batch request? (See the input sketch after this list.)
  2. In this link - vertex-ai/docs/predictions/get-batch-prediction - the BATCH_SIZE parameter is described along with a warning about request timeouts. When I tried to increase MACHINE_TYPE instead, I got the error "base-model machine cannot be changed". So can the base model consistently handle a 10k batch request size every time? (See the batchSize sketch after this list.)
  3. The rate limits for base-model batch predictions are unclear. The Vertex AI docs say a Job or LRO request is limited to 60 RPM. The Generative AI on Vertex AI docs say the batch request quota is 4 - 4 what? RPM or concurrent jobs? And in the link above there is another note saying batch predictions are limited to 1 concurrent job because resources are limited:
    (screenshot attached: issue-3.PNG)
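
On question 1, my understanding is that the 10,000 figure counts input instances, i.e. lines in the JSONL file referenced by gcsSource - something like:

    {"prompt": "Summarize the following support ticket: ..."}
    {"prompt": "Classify the sentiment of this review: ..."}
    (... one instance per line, up to 10,000 lines ...)

Is that reading correct?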
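
On question 2, since the machine type can't be changed for a base model, the only knob I can find is manualBatchTuningParameters.batchSize on the job resource - a sketch, where 32 is just an illustrative value and I'm not sure this field is even honored for base models:

    "manualBatchTuningParameters": { "batchSize": 32 }

(added alongside inputConfig/outputConfig in the request body above). Clarification on whether this is the right lever would help.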

Someone from the Google team, please clarify, as this is a production-related query.
