Basic container job failing with "task with index 0 failed"

Job state is set from RUNNING to FAILED for job projects/XXXXXXXXXXXXXX/locations/us-central1/jobs/xxxxxx-orch-202309182321. Job failed due to task failures. For example, task with index 0 failed, failed task event description is Task state is updated from PENDING to FAILED on zones/us-central1-f/instances/5883206691509808443 with exit code 127.

I am just trying to run the sample code below, but with an instance template:

https://cloud.google.com/batch/docs/samples/batch-create-container-job

Can you please help me understand what this error is about?

Thanks!

G

Hi @gkadam2011,

It seems your Batch job is using a custom image based on Red Hat Linux 8. Unfortunately, Batch does not officially support Red Hat Linux yet: https://cloud.google.com/batch/docs/vm-os-environment-overview#supported_vm_os_images.

To diagnose the exact error that triggers the job failure (for example, what exit code 127 means), you can refer to https://cloud.google.com/batch/docs/analyze-job-using-logs and https://cloud.google.com/batch/docs/samples/batch-job-logs to get more error-related information.

For example, in your case, the job fails because Batch does not support Red Hat yet, so Batch tries to install Docker with the default "apt-get" command, which results in the Docker failure for the container job.
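As a rough illustration of why a missing command shows up as exit code 127: POSIX shells conventionally return 127 when a command cannot be found, 126 when it is found but not executable, and values above 128 when the process is killed by a signal. The helper below is only a sketch of that convention, not anything Batch provides; `describe_exit_code` and `no-such-command-xyz` are hypothetical names made up for the example.

```python
import subprocess


def describe_exit_code(code: int) -> str:
    """Return a rough, convention-based reading of a process exit code."""
    if code == 0:
        return "success"
    if code == 126:
        return "command found but not executable"
    if code == 127:
        return "command not found"
    if code > 128:
        # By shell convention, 128 + N means "terminated by signal N".
        return f"terminated by signal {code - 128}"
    return "application-specific error"


if __name__ == "__main__":
    # Hypothetical command name, chosen so that it should not exist anywhere.
    result = subprocess.run(
        ["sh", "-c", "no-such-command-xyz"],
        capture_output=True,
        text=True,
    )
    print(result.returncode, describe_exit_code(result.returncode))
    print(result.stderr.strip())
```

Any other non-zero code is defined by the program itself, so it has to be looked up in that program's documentation or in the job logs.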

Out of curiosity, is Red Hat a required OS for you, or would a replacement OS such as Debian or CentOS work?

Thanks!

Thank you so much @wenyhu! My brain was about to burst trying to identify which command it was not able to find, but I never thought of OS support here.

It's good to know that Batch doesn't support Red Hat Linux yet. I can certainly try with another OS.

Regards,

G

@wenyhu 

Thanks again for your response. Based on your recommendation, I switched my instance template to Debian. After switching, I am still getting the same error, but this time with exit code 100. Have you ever seen this error, and what could it be related to? There is not much information in the logs.

Error:

Job state is set from SCHEDULED to FAILED for job projects/XXXXXXXXXXXXXXX/locations/us-central1/jobs/XXXXXXXXX-orch-202309191306. Job failed due to task failures. For example, task with index 0 failed, failed task event description is Task state is updated from PENDING to FAILED on zones/us-central1-c/instances/8693805527343590318 with exit code 100.

Thanks!

G

Hi @gkadam2011,

It seems your Batch job with the Debian 10 image failed with the error `Unable to acquire the dpkg frontend lock (/var/lib/dpkg/lock-frontend), is another process using it?`. Could you try retrying the failed tasks with https://cloud.google.com/batch/docs/automate-task-retries to see whether it helps?
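A minimal sketch of what enabling retries could look like, assuming the job is submitted from a JSON config file. It uses the `maxRetryCount` field and, optionally, a `lifecyclePolicies` entry with `RETRY_TASK` as described on the task-retries page. The helper name `build_job_config`, the busybox image, the echo command, and the choice of three retries are illustrative assumptions, not taken from your job. Whether a retry actually gets past the dpkg lock is also not guaranteed.

```python
import json


def build_job_config(max_retries: int = 3, retry_exit_codes=None) -> dict:
    """Build a Batch container job config (as a dict) with task retries enabled."""
    if not 0 <= max_retries <= 10:
        raise ValueError("maxRetryCount must be between 0 and 10")

    task_spec = {
        "runnables": [
            {
                "container": {
                    # Placeholder image and command, for illustration only.
                    "imageUri": "gcr.io/google-containers/busybox",
                    "entrypoint": "/bin/sh",
                    "commands": ["-c", "echo Hello from task ${BATCH_TASK_INDEX}"],
                }
            }
        ],
        # Without lifecycle policies, a failed task is retried up to this many times.
        "maxRetryCount": max_retries,
    }

    if retry_exit_codes:
        # Optional: only retry when the task fails with one of these exit codes.
        task_spec["lifecyclePolicies"] = [
            {
                "action": "RETRY_TASK",
                "actionCondition": {"exitCodes": list(retry_exit_codes)},
            }
        ]

    return {
        "taskGroups": [{"taskSpec": task_spec, "taskCount": 1}],
        "logsPolicy": {"destination": "CLOUD_LOGGING"},
    }


if __name__ == "__main__":
    print(json.dumps(build_job_config(), indent=2))
```

The printed JSON could be saved to a file and passed to `gcloud batch jobs submit` with the `--config` flag. An instance template would still need to be added under the allocation policy, which this sketch leaves out.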

If not, could you start by using a Batch Debian image to see whether it works: https://cloud.google.com/batch/docs/vm-os-environment-overview#supported_vm_os_images?

Thanks!