How to parallelize prompting over text data

Say I have a dataframe with 20,000 Spanish transcripts. I want to translate them to English and then summarize the text. If I run this on Workbench it takes more than 5 hours. How can we minimize the time taken to run this prompt? I have a 4-vCPU Workbench notebook, so can we do the prompting in parallel?

3 REPLIES

One suggestion is to test on multiple machine types and compare the results: run this notebook on each machine type and see which one works best for you.

As recommended in this documentation: https://cloud.google.com/vertex-ai/docs/predictions/configure-compute

Adding a GPU generally helps, but it will also cost you more, so it is better to test these machine types first.

The bottleneck may not be the notebook instance. You could check the CPU utilization to confirm.
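To confirm this from inside the notebook, a minimal sketch using only the Python standard library (the load-average call is Unix-only, which covers Linux-based Workbench instances):

```python
import os

# Number of vCPUs visible to the notebook (4 on the Workbench instance above).
cpus = os.cpu_count()
print(f"vCPUs: {cpus}")

# 1/5/15-minute load averages (Unix only). A 1-minute load well below the
# vCPU count suggests the CPU is NOT the bottleneck -- the job is likely
# waiting on network I/O (the model API), so parallel or async requests
# should help more than a bigger machine type would.
load1, load5, load15 = os.getloadavg()
print(f"load averages: {load1:.2f} {load5:.2f} {load15:.2f}")
```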

For parallelism, did you try sending the requests in async mode? Note the model's QPM (queries per minute) quota limit. You can check your current quota on the Quotas page.
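Since each request mostly waits on the network rather than the CPU, a thread pool is often enough to overlap many requests even on 4 vCPUs. A minimal sketch: `translate_and_summarize` here is a hypothetical stand-in for the actual model call (e.g. via the Vertex AI SDK), and `MAX_WORKERS` should be sized so the request rate stays under the model's QPM quota:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for the real model call (translate + summarize).
# Replace the body with your actual LLM request; each call is I/O-bound.
def translate_and_summarize(transcript: str) -> str:
    return f"summary of: {transcript[:20]}"

# In practice this would come from the dataframe, e.g. df["text"].tolist()
transcripts = [f"transcript {i}" for i in range(100)]

# Keep MAX_WORKERS low enough that requests per minute stay under the
# model's QPM quota (check the Quotas page for your project).
MAX_WORKERS = 8

with ThreadPoolExecutor(max_workers=MAX_WORKERS) as pool:
    # pool.map preserves input order, so results align with dataframe rows
    summaries = list(pool.map(translate_and_summarize, transcripts))

print(len(summaries))  # one summary per transcript
```

The same idea works with `asyncio` and an async client, but a thread pool is the smallest change to existing sequential code.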

Thanks, I will try async mode and check the quota page.