process betaMerge {
    label 'python'
    label 'biggermem'
    maxRetries 20
    publishDir "$OUTPUT_ROOT/$sample_group/merged"
    errorStrategy {
        // give up on the last attempt or on exit codes we do not want to retry,
        // otherwise retry with more disk (see the disk directive below)
        if (task.attempt == 20 || task.exitStatus == 1 || task.exitStatus == 137) {
            s = 'ignore'
        } else {
            println "Retrying ${task.name} with more disk size"
            s = 'retry'
        }
        s
    }
    disk {
        // request disk proportional to the total size of the input CSVs,
        // scaled up on each retry attempt and capped at 3x
        base = 8.B * csvs*.size().sum()
        mult = task.attempt > 3 ? 3 : task.attempt
        us = mult * base
        println "Disk used by ${task.name}: ${us}"
        us
    }

    input:
    path csvs, arity: '1..*', stageAs: "*/*"  // comma separated
    val csv_names
    path samplesheet                          // tab separated
    val sample_group
    val output_fname                          // extension should be included
    val large_mem                             // 1 enables the extra beta_merge.py flags

    output:
    path "*.*", includeInputs: false

    shell:
    '''
    output_fname="!{output_fname}"
    sample_group="!{sample_group}"
    large_mem=!{large_mem}
    csvs="!{csvs}"
    csv_names="!{csv_names}"
    csvs_array=( $csvs )
    csv_names_array=( $csv_names )
    mkdir -p input_csvs
    # link (or copy) the staged CSVs under stable names, as the connection
    # with the machine is not to be trusted
    cd input_csvs &&
    for i in "${!csvs_array[@]}"; do
        csv="${csvs_array[i]}" &&
        fname=$(basename "${csv}") &&
        csv_name="${csv_names_array[i]}" &&
        echo "Copying $csv to input_csvs/${csv_name}.${fname} .." && {
            ln -sf ../${csv} ${csv_name}.${fname} || cp ../${csv} ${csv_name}.${fname}
        }
    done
    cd .. &&
    args=
    if [[ $large_mem -eq 1 ]]
    then
        args="$args -l -u 50000"
    fi
    samplesheet="!{samplesheet}"
    if [[ $samplesheet == "undefined" ]]
    then
        beta_merge.py -i input_csvs -o "$output_fname" $args
    else
        beta_merge.py -i input_csvs -o "$output_fname" -s "$samplesheet" $args
    fi
    '''
}
Hi all. The Nextflow process above is supposed to merge multiple CSV beta files into one large file. However, it keeps failing with the "VM reporting timeout" error 50002. The suggested solution on the Google troubleshooting page ("To resolve this issue, retry the task either by using automated task retries or manually re-running the job.") makes no sense, as I have already run the process almost 30 times and get the exact same error every time. Has anyone observed this problem, and how did you mitigate it? Huge thanks in advance!
Hi @vaslem, agreed, I did not see any error exposed on our end when the job failed either; it seems the VM just crashed. I have involved the gcsfuse team to gain more insight internally.
In the meantime, we will update our documentation to at least offer a hint that using larger machines could potentially bypass the issue.
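For anyone who wants to try the larger-machine route while the root cause is investigated, here is a minimal sketch of how the existing 'biggermem' label could be mapped to a bigger VM in nextflow.config. The executor, machine type, and resource values are placeholders to adapt to your own project and quota, not recommendations:

// nextflow.config -- sketch only; machine type and sizes are illustrative
process {
    executor = 'google-batch'
    withLabel: 'biggermem' {
        cpus        = 16
        memory      = 64.GB
        machineType = 'n2-highmem-16'   // hypothetical choice; pick one that fits your quota
    }
}

Since the disk directive in the process already scales with the total input size, only the CPU/memory side should need adjusting here.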
=========
For posterity, @vaslem managed to circumvent the issue by splitting the work further across multiple machines, exposing the steps previously done inside the Python script as separate Nextflow processes.
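For readers landing here later, below is a rough sketch of that split-and-merge idea. The process name, params.beta_csvs, and the batch size of 20 are illustrative (not @vaslem's actual code), and it assumes beta_merge.py accepts a directory of CSVs as in the process above:

// Sketch: merge the CSVs in small batches, each on its own VM, then feed the
// much smaller set of partial files into a final merge step such as betaMerge above.
workflow {
    csv_batches = Channel
        .fromPath(params.beta_csvs)   // assumed glob for the input beta CSVs
        .collate(20)                  // ~20 files merged per machine; tune to taste
    partialMerge(csv_batches)
        .collect()
        .view()                       // in the real pipeline, feed this into the final merge
}

process partialMerge {
    label 'python'

    input:
    path csvs

    output:
    path "partial_*.csv"

    script:
    """
    mkdir -p input_csvs && cp ${csvs} input_csvs/
    beta_merge.py -i input_csvs -o partial_${task.index}.csv
    """
}

Each batch then fits comfortably on a small machine, and the final merge only has to deal with a handful of partial files instead of the full set.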