
Does Google Cloud CLI retry? How to handle transient network failures for storage requests?

An analysis pipeline that I work with depends on the gcloud CLI to pipe data into a tool that expects standard input, but I'm struggling to handle what seem to be network failures between the VM in Google Cloud and Google Cloud Storage.

Specifically, I am seeing regular, consistent failures at varying points during the operation that say "(gcloud.storage.cat) Download not completed. Target size=518107929774, downloaded data=99967842155", with the target size and downloaded-data numbers varying based on the size of the file and how far the download got before failing.

I've read the API documentation at https://cloud.google.com/sdk/gcloud/reference/storage , but have not seen any information about retries. Am I missing something? Or is there something else I'm missing that would cause a VM running in Google Cloud to fail while downloading data from GCS?


Hi @AlexPetty,

Welcome to Google Cloud Community!

The error message "Download not completed. Target size=xxx, downloaded data=xxx" indicates that the download of data from Google Cloud Storage (GCS) to your Virtual Machine (VM) was interrupted or failed prematurely, likely due to a transient network issue.

You can try the troubleshooting steps and recommendations below, which could help resolve this behavior:

Troubleshooting:

  • gcloud storage cp --verbosity=debug: add the debug flag for detailed output of what's happening in the background (see the sketch after this list).
  • VM instance storage: ensure that the destination VM has sufficient disk space to store the data from GCS.
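For example, a minimal debug run might look like the following; gs://my-bucket/my-object and /tmp/my-object are placeholders for your own bucket, object, and local path:

    # re-run the failing transfer with debug-level logging to capture request and retry details
    gcloud storage cp --verbosity=debug gs://my-bucket/my-object /tmp/my-object

    # confirm the destination filesystem has enough free space for the object
    df -h /tmp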

Recommendations:

  • gcloud config set storage/max_retries xxxx: try increasing the maximum number of retries with this command. The default value is 23, so I would advise setting it to a bigger value (e.g., 500).
  • Explore --daisy-chain (-D): something to try, but not recommended for large transfers/downloads.
  • Use two separate commands: a lot faster than daisy-chain as you'll be able to take advantage of multiprocessing and multithreading.
    # download from GCS to local
    gcloud storage cp gs://<source> /<local_path>
    
    # upload from local to destination
    gcloud storage cp /<local_path> <destination>

I hope the above information is helpful.

Thanks, this is helpful info.

I cannot find any documentation on storage/max_retries. Is this documented anywhere? Also, is there another approach I should be using for large files other than gcloud storage cp and cat? We have tools that do filtering / data processing from standard input and would prefer to stream the object from Cloud Storage rather than copying it to a local filesystem.

Here’s the documentation for storage/max_retries. Also, you can set storage/multipart_chunksize to upload larger files by splitting them into smaller chunks during the upload process; the chunk size should not be larger than the file itself. Here's the full list of gcloud config set properties; use any of them that fit your use case.

While there are products like Storage Transfer Service that help move large amounts of data quickly, I think gcloud storage cat and cp are the recommended approach for your environment. Check out the comparison between these two transfer options based on specific scenarios.
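For the streaming use case you described, a minimal sketch could combine the retry setting with gcloud storage cat; gs://my-bucket/my-object and my_filter_tool below are placeholders for your own object and processing tool:

    # raise the retry limit before the transfer (the default is 23)
    gcloud config set storage/max_retries 500

    # stream the object straight into the tool's standard input, avoiding a local copy
    gcloud storage cat gs://my-bucket/my-object | my_filter_tool > filtered_output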

To summarize the above information:

  1. Set storage/max_retries to a bigger value:
    gcloud config set storage/max_retries 500

  2. Update storage/multipart_chunksize to 1G; if the file is 100G, this should yield 100 parts of size 1G:
    gcloud config set storage/multipart_chunksize 1G

     If you receive any error, try setting it to 100MB or 104857599:
    gcloud config set storage/multipart_chunksize 104857599

  3. Explore Storage Transfer Service as an alternative.
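If it helps, you can confirm that the properties took effect with gcloud config get-value; this is just a sketch, assuming the two properties were set as above:

    # print the currently configured values
    gcloud config get-value storage/max_retries
    gcloud config get-value storage/multipart_chunksize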