GCP Batch job: CODE_GCE_ERROR

We repeatedly ran into a CODE_GCE_ERROR when executing a Batch job. No error log was produced. The only status change event showed the following information:

VM in Managed Instance Group meets error: Batch Error: code - CODE_GCE_ERROR, description - error count is 1, 
latest message example: Instance 'wf-lora-train-hopw-daac1c13-8095-4ada0-group0-0-zgr2' creation failed:
Internal error. Please try again or contact Google Support. (Code: '-5543464961495989270')

Is there any way to address this issue?


Hello @r4ruixi,

If you're not receiving error logs in Cloud Logging and the job runs as a custom service account, make sure that service account has an IAM role that allows it to write logs. This is needed even if the job template is set to stream logs to Cloud Logging.
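
As a rough sketch of the permission check above, the snippet below builds (and optionally runs) a `gcloud projects add-iam-policy-binding` command that grants the Logs Writer role (`roles/logging.logWriter`) to a service account. The project ID and service account email are hypothetical placeholders, and the helper names are made up for illustration. It assumes the `gcloud` CLI is installed and authenticated with permission to change IAM policy on the project.

```python
import subprocess

# Role that permits writing log entries to Cloud Logging.
LOG_WRITER_ROLE = "roles/logging.logWriter"


def build_grant_command(project_id: str, service_account_email: str) -> list[str]:
    """Return the gcloud argument list that grants Logs Writer to the account."""
    return [
        "gcloud",
        "projects",
        "add-iam-policy-binding",
        project_id,
        f"--member=serviceAccount:{service_account_email}",
        f"--role={LOG_WRITER_ROLE}",
    ]


def grant_log_writer(project_id: str, service_account_email: str, dry_run: bool = True) -> list[str]:
    """Print the command (dry run) or execute it with gcloud."""
    cmd = build_grant_command(project_id, service_account_email)
    if dry_run:
        # Only show what would be run; nothing is changed.
        print(" ".join(cmd))
    else:
        # Raises CalledProcessError if gcloud exits with a non-zero status.
        subprocess.run(cmd, check=True)
    return cmd


if __name__ == "__main__":
    # Hypothetical values; replace with your own project and service account.
    grant_log_writer(
        "my-project",
        "batch-runner@my-project.iam.gserviceaccount.com",
    )
```

By default this only prints the command so it can be reviewed first; pass `dry_run=False` to apply the binding. Note that this addresses the missing logs only; the internal instance-creation error itself may still need to go to support.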

Because the message reports an internal error during instance creation, you should contact Google Cloud Support to look further into your case. Hope it helps, thanks!