
Multiple batch prediction jobs using gemini-1.5

Hello Team,

While testing batch prediction with BigQuery as the data source, I found I need to run multiple batch prediction jobs at the same time. These jobs will use two different BigQuery tables. How can I achieve that? I haven't been able to find any resources on this.

Thank you
/Akshita


Hi @akshita_horizon,

Welcome to Google Cloud Community!

It looks like you are trying to run multiple batch prediction jobs concurrently on different BigQuery tables, but you’re having trouble finding the necessary resources or guidance.

To address your question, here are potential ways that might help with your use case:

  • Utilizing the Python AI Platform SDK: You can trigger batch prediction jobs with aiplatform.BatchPredictionJob.create(). When called with sync=False, the call returns immediately, so you can launch several jobs in parallel without blocking the script, and you can configure machine resources (e.g., machine_type, replica_count) to scale with the size and complexity of your data.
  • Using the gcloud CLI: You can start each job with the gcloud ai batch-predict create command. Appending ‘&’ runs each command in the background so the jobs start concurrently, and the ‘wait’ command makes the script block until all jobs have completed before exiting.
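To illustrate the SDK route, here is a rough, untested sketch. The project, dataset, and table identifiers are placeholders, MODEL_NAME must be replaced with your own model resource, and the parameter names follow the aiplatform SDK reference:

```python
MODEL_NAME = "your-model-resource-name"  # placeholder: replace with your Model resource


def bq_uri(project: str, dataset: str, table: str) -> str:
    """Build the bq:// URI that Vertex AI batch prediction expects."""
    return f"bq://{project}.{dataset}.{table}"


def submit_jobs(project: str, location: str, dataset: str, tables):
    """Submit one batch prediction job per BigQuery table without blocking."""
    # Lazy import so the helper above stays usable without the SDK installed.
    from google.cloud import aiplatform

    aiplatform.init(project=project, location=location)
    jobs = []
    for table in tables:
        job = aiplatform.BatchPredictionJob.create(
            job_display_name=f"batch-{table}",
            model_name=MODEL_NAME,
            instances_format="bigquery",
            predictions_format="bigquery",
            bigquery_source=bq_uri(project, dataset, table),
            bigquery_destination_prefix=f"bq://{project}",
            sync=False,  # return immediately so the next job can be submitted
        )
        jobs.append(job)
    return jobs
```

Because each create() call returns right away with sync=False, the loop submits all jobs back-to-back; you can later poll each returned job's state if you need to wait for completion.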

You may refer to the following documentation, which might help you understand how to implement multiple batch prediction jobs simultaneously using BigQuery as a data source, specifically executing jobs on different tables concurrently:

Was this helpful? If so, please accept this answer as “Solution”. If you need additional assistance, reply here within 2 business days and I’ll be happy to help.

 

Hi @MarvinLlamas

Thanks a lot for helping out here. I have tried custom jobs, but they require a hosted model, which is not the case in my problem.
I'm submitting batch prediction jobs against BigQuery like this:
```
batch_prediction_job = BatchPredictionJob.submit(
    source_model="gemini-1.5-flash-002",
    input_dataset=input_table_url,
    output_uri_prefix=output_uri,
)
```

and preparing the job with data:

```
job_config = bigquery.LoadJobConfig()
job_config.source_format = bigquery.SourceFormat.NEWLINE_DELIMITED_JSON
job_config.schema = table_schema
client.load_table_from_json(data, table_id, job_config=job_config)
```
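One caveat with the load step: load_table_from_json returns a LoadJob that runs asynchronously, so a batch prediction job submitted immediately afterwards can see a table that is still loading. A minimal sketch of waiting for the load to commit first (the client, table, rows, and schema arguments are placeholders):

```python
def load_rows_and_wait(client, table_id, rows, table_schema):
    """Load JSON rows into a BigQuery table and block until the load commits.

    `client` is a google.cloud.bigquery.Client; the other arguments are
    placeholders standing in for the values used in the snippet above.
    """
    from google.cloud import bigquery  # lazy import: only needed at call time

    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON,
        schema=table_schema,
    )
    load_job = client.load_table_from_json(rows, table_id, job_config=job_config)
    load_job.result()  # blocks until the rows are committed to the table
    return load_job
```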

 If I submit the two jobs in parallel, the issue I'm facing is that one job will be RUNNING while the other stays PENDING until the first one completes its execution.
So, is there any way for the n pending jobs to also run in parallel, rather than the total runtime growing with each job waiting in the queue?