Hello,
We're trying to set up GCP Batch processing with a custom instance template, but we've hit an issue.
The problem is that when the VM boots up, it stays idle for about 20 minutes and then gets shut down. I described the VM with `gcloud` and noticed it says:
- description: Job state is set from QUEUED to SCHEDULED for job projects/12345/locations/us-central1/jobs/test-5.
  eventTime: '2023-03-01T03:25:36.379504559Z'
  type: STATUS_CHANGED
- description: no VM has agent reporting correctly within the time window 1080 seconds. VM state for instance j-4a6fec51-bd73-40c3-87cc-b67614f4d882-group0-0-2vgv is 2023/03/01-03:26:38+0000,startup,51,unsupported_cos.
  eventTime: '2023-03-01T03:46:35.801343563Z'
  type: OPERATIONAL_INFO
I confirmed the permissions are OK: the service account we use has the `Batch Agent Reporter` role granted.
I also checked for networking issues. At least the Network tester says that packets are flowing as expected, and we have Cloud NAT set up in the project.
I found a Stack Overflow thread about the specific type of images that Batch uses when spinning up VMs for "script" jobs, but our job needs access to:
1. Shared VPC
2. Specific services we run
We'd like to bundle our code into a Docker image, create our own flavour of instance template, and then run it as a Batch job.
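To make the intended setup concrete, a job definition of this kind might look roughly like the sketch below. It is a small Python helper that builds the JSON body for a job that runs a container on VMs created from an instance template. The helper name `build_batch_job`, the template name, and the image URI are hypothetical placeholders, and the field names follow the Batch v1 Job resource as I understand it, so please treat them as assumptions rather than our actual config.

```python
import json


def build_batch_job(template: str, image_uri: str, task_count: int = 1) -> dict:
    """Build a Batch job body that runs a container on instance-template VMs.

    All argument values are hypothetical placeholders supplied by the caller.
    """
    return {
        "taskGroups": [
            {
                "taskCount": task_count,
                "taskSpec": {
                    # One runnable per task: start the container image.
                    "runnables": [{"container": {"imageUri": image_uri}}]
                },
            }
        ],
        # Use a custom instance template instead of an inline instance policy.
        "allocationPolicy": {"instances": [{"instanceTemplate": template}]},
        "logsPolicy": {"destination": "CLOUD_LOGGING"},
    }


if __name__ == "__main__":
    job = build_batch_job(
        template="my-batch-template",  # placeholder template name
        image_uri="us-docker.pkg.dev/my-project/my-repo/my-image:latest",  # placeholder
    )
    print(json.dumps(job, indent=2))
```

The printed JSON could be saved to a file such as `job.json` (file name assumed) and submitted with something like `gcloud batch jobs submit test-5 --location=us-central1 --config=job.json`.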
The Stack Overflow thread mentions `batch-cos-stable-official`, but I cannot find a VM image with a name even remotely close to that.
Could you please advise how we should proceed with setting up our instance template, or how to get access to Batch-specific VM images?
Thanks in advance for your support.
Regards,
Mikolaj M.