Hi,
I have a Workflow that creates a batch job task that runs up a docker image to do image processing then exits. I use VM Spot instances. All is working fine.
Now I would like to implement and test a simple retry mechanism if the VM is pre-empted and warn DevOps if the task fails.
I need an example of the Workflow Task retry syntax, the ability to manually pre-empt the VM (when using the gcloud compute instances stop command I need to have the VM ID), and a way to query the exit code in a subsequent Workflow step.
Workflow snippet (deploys but not tested):
- create_transcoding_job:
If retries exhausted, I'd like to notify DevOps
Solved! Go to Solution.
Hello,
1. The retry syntax looks good to me, 50001 is the right code for preemption.
2. To test preemption, you can simulate maintenance event for VMs. What I did previously is
gcloud compute instances set-scheduling VM_NAME --maintenance-policy=TERMINATE --zone ZONE
gcloud compute instances simulate-maintenance-event VM_NAME --zone ZONE
Btw, you can find your VM names in your Cloud Console, they should be prefixed with the same job id.
3. One thing to mention is that you can not query exit code now. Exit code is only available in the task event description, so for now when a task is failed, you have to parse the description message. For example, description like "task is failed due to Spot preemption with code 50001", you need to parse the code from the message.