Hi, I'm using Python with google-cloud-aiplatform==1.39.0.
I am attempting to use TextGenerationModel.batch_predict() with text-bison@002 on JSONL files in GCS (roughly as in the sketch below). Each file has 25k lines and is anywhere from ~27-40 MB.
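For context, here is a minimal sketch of the kind of call I mean. The project, region, bucket paths, and model parameters are placeholders, not my actual values:

```python
import vertexai
from vertexai.language_models import TextGenerationModel

# Placeholder project/region -- substitute your own
vertexai.init(project="my-project", location="us-central1")

model = TextGenerationModel.from_pretrained("text-bison@002")

# Each JSONL line is of the form {"prompt": "..."}
job = model.batch_predict(
    dataset="gs://my-bucket/input/records-000.jsonl",
    destination_uri_prefix="gs://my-bucket/output/",
    # Example parameters only; values here are assumptions
    model_parameters={"temperature": 0.2, "maxOutputTokens": 256},
)
print(job.resource_name)
print(job.state)
```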
Unfortunately, when I ran this on just one file it took about 7 hours to complete, and there doesn't seem to be any way to configure Vertex AI to improve the performance.
I don't fully understand what is happening under the hood, so I then tried passing a list of smaller files instead (via the dataset parameter, as in the variant below), each containing a portion of the 25k records, and that actually took longer (~9 hours).
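The list-of-files variant I tried looked roughly like this (again with placeholder URIs):

```python
# Same call, but dataset is a list of GCS URIs instead of a single file
job = model.batch_predict(
    dataset=[
        "gs://my-bucket/input/records-000.jsonl",
        "gs://my-bucket/input/records-001.jsonl",
        "gs://my-bucket/input/records-002.jsonl",
    ],
    destination_uri_prefix="gs://my-bucket/output/",
    model_parameters={"temperature": 0.2, "maxOutputTokens": 256},
)
```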
Is this performance to be expected? Is there anything I can do? Considering I have millions of records to process (and would like to run iterative experiments on them), runtimes like this make the product prohibitive for my use case.
The only thing I haven't tried yet is using the non-Vertex AI tools for this work, since those appear to let you tune the machine types.
Any suggestions?