Hello Google Cloud Community!
I’m currently working on a project that involves managing and monitoring Cloud Batch jobs on Google Cloud. In AWS, there's a RUNNABLE state that is triggered when a job is interrupted and retried. The RUNNABLE state is useful for tracking job attempts and calculating the total time spent on each attempt.
I am trying to implement similar logic in Google Cloud to handle job retries and interruptions, and to track the duration of each attempt for Cloud Batch jobs. In AWS, a job transitioning to RUNNABLE essentially means it is being retried, so we increment a counter for the number of attempts. We also calculate the total duration spent across retries.
Is there a way to track retries or "RUNNABLE" states for jobs in Google Cloud Batch? Or is there any way to get the retry count and the duration of each retry?
Hi @harshada2828,
For retries, you can refer to https://cloud.google.com/batch/docs/automate-task-retries. For run durations, you can refer to https://cloud.google.com/batch/docs/set-timeouts. And yes, we do expose similar information: the job and task status fields are documented at https://cloud.google.com/batch/docs/reference/rest/v1/projects.locations.jobs#jobstatus and https://cloud.google.com/batch/docs/reference/rest/v1/projects.locations.jobs.taskGroups.tasks#tasks....
Thanks,
Wenyan
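
As a rough illustration of how the status fields linked above could be turned into an attempt count and per-attempt durations, here is a minimal Python sketch. It is not an official recipe. It assumes each task's status events are available as a list of dicts with `eventTime` (an RFC 3339 timestamp) and `taskState` keys, which is how I read the REST reference. The sample events are made up, and the assumption that a retried task shows up as a move back to PENDING followed by another RUNNING event should be checked against real job output. The helper names (`parse_rfc3339`, `summarize_attempts`) are hypothetical.

```python
import re
from datetime import datetime, timedelta

_FRACTION = re.compile(r"\.(\d+)")


def parse_rfc3339(ts: str) -> datetime:
    """Parse an RFC 3339 timestamp such as '2024-05-01T10:00:10.123456789Z'."""
    ts = ts.replace("Z", "+00:00")
    # Older Python versions accept at most microseconds, so normalize the
    # fractional part to exactly six digits.
    ts = _FRACTION.sub(lambda m: "." + m.group(1)[:6].ljust(6, "0"), ts)
    return datetime.fromisoformat(ts)


def summarize_attempts(status_events):
    """Return one timedelta per attempt.

    An attempt is counted from an event whose taskState is RUNNING until the
    next event that reports any other task state.
    """
    events = sorted(status_events, key=lambda e: parse_rfc3339(e["eventTime"]))
    durations = []
    started = None
    for event in events:
        state = event.get("taskState")
        if state is None:
            continue  # event carries no task state; ignore it
        when = parse_rfc3339(event["eventTime"])
        if state == "RUNNING":
            if started is None:
                started = when
        elif started is not None:
            durations.append(when - started)
            started = None
    return durations


# Hypothetical events for one task: a first attempt that is interrupted,
# then a second attempt that succeeds.
SAMPLE_EVENTS = [
    {"eventTime": "2024-05-01T10:00:00Z", "taskState": "PENDING"},
    {"eventTime": "2024-05-01T10:00:05Z", "taskState": "ASSIGNED"},
    {"eventTime": "2024-05-01T10:00:10Z", "taskState": "RUNNING"},
    {"eventTime": "2024-05-01T10:05:10Z", "taskState": "PENDING"},
    {"eventTime": "2024-05-01T10:05:40Z", "taskState": "RUNNING"},
    {"eventTime": "2024-05-01T10:08:40Z", "taskState": "SUCCEEDED"},
]

durations = summarize_attempts(SAMPLE_EVENTS)
total = sum(durations, timedelta())
print(f"attempts={len(durations)} total_seconds={total.total_seconds():.0f}")
```

An attempt that is still running when the events are read is not counted, because it has no closing event yet. If you need in-progress attempts as well, you could close the open interval with the current time.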