I am running a Nextflow pipeline on Google Cloud. The pipeline relies on a Docker image; it works well locally and has worked well with google-lifesciences in the past. We have migrated it to google-batch, and it now gets stuck about halfway through, in one particular process.
Where it gets stuck, the VM first needs to mount two buckets. I can see both buckets under "/mnt/disks" on the VM. I can "ls" into one of them but not into the other; the command just hangs.
Checking node activity with "top", I can see that almost nothing is happening. Every now and then a "containerd" or "gcsfuse" process pops up for just a split second.
I have no way to see what is going on: the VM is not dying, the process never starts, and gcsfuse seems to run forever without ever managing to mount the second bucket.
Where should I start looking? Can anyone suggest a debugging strategy?
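For reference, a minimal sketch of the non-blocking checks I have been running from inside the VM (the mount path is from my setup; "timeout" keeps a hung FUSE mount from wedging the shell):

```shell
# /mnt/disks/gcs2 stands in for the bucket mount that hangs.
# timeout makes ls give up instead of blocking forever on a dead FUSE mount.
timeout 5 ls /mnt/disks/gcs2 || echo "ls timed out or failed"

# Check whether gcsfuse actually holds a mount entry right now.
grep -c gcsfuse /proc/mounts || echo "no gcsfuse mount present"
```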
Hi @schmat_90, I tried a job with two buckets and a Docker container, and it worked for me. Here is my job configuration:
{
  "task_groups": [
    {
      "task_spec": {
        "runnables": [
          {
            "container": {
              "imageUri": "bash",
              "commands": [
                "-c",
                "sleep 1000"
              ]
            }
          }
        ],
        "volumes": [
          {
            "gcs": {
              "remote_path": "${MyBucket1}"
            },
            "mount_path": "/mnt/disks/gcs1"
          },
          {
            "gcs": {
              "remote_path": "${MyBucket2}"
            },
            "mount_path": "/mnt/disks/gcs2"
          }
        ]
      },
      "task_count": 2
    }
  ]
}
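In case it helps you reproduce this on your side, a spec like the one above can be submitted with the gcloud CLI; the job name and region below are placeholders:

```shell
# Save the spec above as job.json, then submit it.
# "two-bucket-test" and "us-central1" are placeholders for your job name and region.
gcloud batch jobs submit two-bucket-test \
    --location=us-central1 \
    --config=job.json
```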
Could you please share more of the job spec you are using, especially the volume configuration?
Thanks!