Scheduling a Google Batch job to trigger on GCP (long running Python script)

By "Google Batch" I'm referring to the new service Google launched about a month or so ago.

https://cloud.google.com/batch

I have a Python script that currently takes a few minutes to execute. However, with the data it will be processing over the next few months, the execution time will go from minutes to hours. This is why I am not using Cloud Functions or Cloud Run to run this script: both have a maximum execution time of 60 minutes.

Google Batch came about recently and I wanted to explore this as a possible method to achieve what I'm looking for without just using Compute Engine.

However, documentation is sparse across the internet, and I can't find a way to "trigger" an already created Batch job using Cloud Scheduler. I've already successfully created a Batch job manually, and it runs my Docker image. Now I need something to trigger this Batch job once a day, that's it. It would be wonderful if Cloud Scheduler could serve this purpose.

I've seen one article describing how to use GCP Workflows to create a new Batch job on a cron schedule set by Cloud Scheduler. The issue is that it creates a new Batch job every time instead of simply re-running the existing one. To be honest, I can't even re-run an already executed Batch job on the GCP website itself, so I don't know if it's even possible.

https://www.intertec.io/resource/python-script-on-gcp-batch

Lastly, I've even explored the official Google Batch Python library and could not find a built-in function that lets me "call" a previously created Batch job and just re-run it.

https://github.com/googleapis/python-batch


Hello,

Today, Batch does not support re-running an already submitted job, because each job name must be unique. We appreciate you voicing this use case, as it is one we are looking to address soon by letting Cloud Scheduler trigger the Batch job directly. In the meantime, to automatically schedule a job for a certain time, you would need to have Cloud Scheduler trigger a Cloud Workflow that creates a new Batch job based on the existing job definition. Can you share why creating a new job with the same job definition does not meet your needs?

We will gradually be adding new documentation and will track this use case as one of the items to share.
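To make the "new job from the same definition" approach concrete, here is a minimal sketch in Python of how each scheduled run could get its own unique job name while reusing one definition. It only builds the request for the Batch REST `jobs.create` call and does not send it. The helper names (`unique_job_id`, `build_create_request`), the timestamp suffix, and the example project, region, and image values are assumptions for illustration, and the job ID pattern in the code is an assumed naming rule that should be checked against the Batch documentation.

```python
import re
from datetime import datetime, timezone

# Assumed naming rule for Batch job IDs (lowercase letters, digits, hyphens,
# at most 63 characters). Verify against the Batch documentation.
JOB_ID_PATTERN = re.compile(r"^[a-z]([a-z0-9-]{0,61}[a-z0-9])?$")


def unique_job_id(prefix: str, now: datetime) -> str:
    """Append a UTC timestamp so every run gets a distinct job name."""
    job_id = f"{prefix}-{now.astimezone(timezone.utc):%Y%m%d-%H%M%S}"
    if not JOB_ID_PATTERN.match(job_id):
        raise ValueError(f"invalid Batch job ID: {job_id!r}")
    return job_id


def build_create_request(project: str, region: str, image_uri: str, job_id: str) -> dict:
    """Return the URL, query parameters, and JSON body for a jobs.create call."""
    return {
        "url": (
            "https://batch.googleapis.com/v1/"
            f"projects/{project}/locations/{region}/jobs"
        ),
        # The job name is supplied separately from the job definition,
        # so the same body can be reused for every run.
        "query": {"job_id": job_id},
        "body": {
            "taskGroups": [
                {
                    "taskSpec": {
                        "runnables": [{"container": {"imageUri": image_uri}}]
                    }
                }
            ],
            "logsPolicy": {"destination": "CLOUD_LOGGING"},
        },
    }
```

A Workflow triggered by Cloud Scheduler can do the same thing: keep the job body fixed and derive a fresh `job_id` on each execution.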

Thanks @Shamel 
Cloud Scheduler triggering a Workflow to create a Batch job ended up being the route I decided to take. 

However Cloud Scheduler triggering a batch job directly would be AMAZING. I could bypass Workflows entirely once that gets implemented. 

"Can you share why creating a new job with same job definition does not meet your needs?"
To answer your question, I didn't want to create a new batch job every time I needed to run my Python code, that's all. 

Now with Cloud Scheduler triggering Workflows, I have something very important I would love to clarify with you. 

The flow I'm going for in my use case at the moment looks like this:

Cloud Scheduler -> Workflows -> Batch job -> Docker Image Executed (Python Code) -> End

After Cloud Scheduler triggers a Workflow, is that immediately considered a success for Cloud Scheduler? Is the trigger an HTTP "GET" or "POST" from Cloud Scheduler to Workflows? If it is either of those two HTTP methods, how fast does Workflows send a response back to Cloud Scheduler?

The reason that's important is that, from using Cloud Functions and Cloud Run in the past, I know Cloud Scheduler waits at most 30 minutes (the attempt deadline property) for a response to an HTTP call. If it doesn't hear anything by the 30-minute mark, Cloud Scheduler deems that run a failure. This would happen if, for instance, a Cloud Function took longer than 30 minutes to run all its code.

I want to be 100% sure Cloud Scheduler isn't waiting for, let's say, my Docker image to finish running (which could take multiple hours).

If Cloud Scheduler only cares about hearing from Workflows directly then we are good. 

@Shamel, just for reference: I set up the Cloud Scheduler job through the website UI while creating the Workflow itself (pics attached). I see no indication there of how this Workflow will be invoked from Cloud Scheduler.

However, if I go to this documentation:

https://cloud.google.com/scheduler/docs/tut-workflows

I see that if I create the Cloud Scheduler job on the Cloud Scheduler page, I have to indicate that it's an HTTP POST to the Workflow.

Is this abstracted away when I set the scheduler through the workflow? 

I would greatly appreciate it if you could provide some insight into all this. Thank you!

(Two screenshots attached: image.png, image.png)

Once Cloud Scheduler triggers the Workflow successfully, Cloud Scheduler should consider the run a "success", and based on my tests the response comes back fairly quickly.

Scheduling a Cloud Workflow within Cloud Scheduler or directly in Workflows will have the same behavior from a scheduling standpoint. Although I'm not 100% sure, it seems as though specifying the HTTP method is abstracted away and may default to POST.
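For readers who want to see what the explicit version looks like, here is a minimal Python sketch of a Cloud Scheduler job definition that targets the Workflows Executions endpoint with an HTTP POST, which is the setup the tutorial linked above describes. It only builds the job as a dictionary in the shape of the Cloud Scheduler REST resource and does not create anything. The function name `scheduler_job_for_workflow`, the `Etc/UTC` time zone, and the example values are assumptions. The POST only asks Workflows to start an execution and does not wait for the execution to finish, which fits the quick "success" observed in the tests above.

```python
import base64
import json


def scheduler_job_for_workflow(project, region, workflow, schedule,
                               service_account, argument=None):
    """Build a Cloud Scheduler job body that starts a Workflows execution."""
    uri = (
        "https://workflowexecutions.googleapis.com/v1/"
        f"projects/{project}/locations/{region}/workflows/{workflow}/executions"
    )
    # The Executions API takes runtime arguments as a JSON-encoded string
    # inside the "argument" field of the request body.
    payload = json.dumps({"argument": json.dumps(argument or {})})
    return {
        "schedule": schedule,   # cron syntax, e.g. once a day
        "timeZone": "Etc/UTC",  # assumed time zone for this sketch
        "httpTarget": {
            "uri": uri,
            "httpMethod": "POST",
            # The REST representation carries the body as base64 text.
            "body": base64.b64encode(payload.encode("utf-8")).decode("ascii"),
            # Scheduler authenticates to the Google API with an OAuth token
            # minted for this service account.
            "oauthToken": {"serviceAccountEmail": service_account},
        },
    }
```

For a daily run, `schedule` could be something like `"0 6 * * *"`. The service account needs permission to invoke the Workflow.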

Thank you. 

I did some tests and the behavior does seem to be consistent for both methods, creating a Cloud Scheduler job within the Workflow or creating the Cloud Scheduler job within Cloud Scheduler.