Slow Vertex AI Batch Prediction via TextGenerationModel

vbt
Bronze 1

Hi, I'm using Python with google-cloud-aiplatform==1.39.0.

I am attempting to use TextGenerationModel.batch_predict() with text-bison@002 on JSONL files in GCS. Each file has 25k lines and is roughly 27-40 MB.
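
For reference, the single-file run looks roughly like this (simplified; the project, location, bucket, and file names are placeholders):

```python
import vertexai
from vertexai.language_models import TextGenerationModel

# Placeholder project/location/bucket names.
vertexai.init(project="my-project", location="us-central1")

model = TextGenerationModel.from_pretrained("text-bison@002")

# One JSONL file in GCS, ~25k lines (one prompt per line), ~27-40 MB.
job = model.batch_predict(
    dataset="gs://my-bucket/batch-inputs/part-000.jsonl",
    destination_uri_prefix="gs://my-bucket/batch-outputs/part-000",
)
print(job.resource_name, job.state)
```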

Unfortunately, running this on just one file took about 7 hours to complete, and there doesn't seem to be any way to tune the Vertex AI batch prediction job to improve performance.

I don't fully understand what is happening under the hood, so I then tried passing a list of smaller files instead (the dataset parameter accepts a list of GCS URIs), splitting up the 25k lines, and that run actually took longer (~9 hours).
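
That attempt looked roughly like this (again, the bucket and chunk names are placeholders):

```python
import vertexai
from vertexai.language_models import TextGenerationModel

vertexai.init(project="my-project", location="us-central1")  # placeholders
model = TextGenerationModel.from_pretrained("text-bison@002")

# The dataset parameter also accepts a list of GCS URIs, so I split the
# original 25k-line file into smaller JSONL chunks and passed them together.
chunks = [f"gs://my-bucket/batch-inputs/chunk-{i:03d}.jsonl" for i in range(5)]

job = model.batch_predict(
    dataset=chunks,
    destination_uri_prefix="gs://my-bucket/batch-outputs/chunked-run",
)
```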

Is this performance to be expected? Is there anything I can do? Considering I have millions of records to process (and want to run iterative experiments on them), these turnaround times are immediately prohibitive for my use of the product.

The only thing I hadn't tried yet was doing this work with the non-Vertex AI tooling, since it seemed like you could tune the machine types there.

Any suggestions?


vbt
Bronze 1

Just an update to this:

I do understand from https://cloud.google.com/vertex-ai/docs/predictions/configure-compute and from some other sparse references that, unless you're doing batch prediction with AutoML tabular models or custom-trained models, you can't specify a machine type; Vertex AI handles the compute for you.
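
For contrast, custom/uploaded models do expose these knobs through aiplatform.Model.batch_predict. A minimal sketch (the model ID, bucket, and machine settings below are placeholders; as far as I can tell, nothing equivalent is exposed on TextGenerationModel.batch_predict):

```python
from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-central1")  # placeholders

# Hypothetical custom/uploaded model -- NOT text-bison.
model = aiplatform.Model(
    "projects/my-project/locations/us-central1/models/1234567890"
)

job = model.batch_predict(
    job_display_name="custom-model-batch",
    gcs_source="gs://my-bucket/batch-inputs/part-000.jsonl",
    gcs_destination_prefix="gs://my-bucket/batch-outputs/",
    machine_type="n1-standard-8",   # only configurable for custom models
    starting_replica_count=4,
    max_replica_count=8,
)
```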

Also, each of these files now takes about 1.5 hours to process, as of the last day or so. I didn't change anything on my end, so I still have no idea why it runs faster or slower, or whether I can get the processing time down to what I would probably need (a few minutes per file).
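
In case it helps anyone compare timings, this is roughly how I'm measuring the per-file duration. It polls the job state, so it should work whether or not batch_predict blocks until completion; all names are placeholders:

```python
import time

import vertexai
from vertexai.language_models import TextGenerationModel
from google.cloud.aiplatform_v1.types import JobState

vertexai.init(project="my-project", location="us-central1")  # placeholders
model = TextGenerationModel.from_pretrained("text-bison@002")

start = time.time()
job = model.batch_predict(
    dataset="gs://my-bucket/batch-inputs/part-000.jsonl",
    destination_uri_prefix="gs://my-bucket/batch-outputs/part-000",
)

# Poll until the job reaches a terminal state.
terminal = {
    JobState.JOB_STATE_SUCCEEDED,
    JobState.JOB_STATE_FAILED,
    JobState.JOB_STATE_CANCELLED,
}
while job.state not in terminal:
    time.sleep(60)

hours = (time.time() - start) / 3600
print(f"{job.resource_name}: {job.state.name} after {hours:.2f} h")
```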