An analysis pipeline that I work with depends on the gcloud CLI to pipe data into a tool that expects standard input, but I'm struggling to handle what seem to be network failures between the VM in Google Cloud and Google Cloud Storage.
Specifically, I am seeing regular, consistent failures at varying points during the operation that say
"(gcloud.storage.cat) Download not completed. Target size=518107929774, downloaded data=99967842155", with varying target sizes and downloaded data numbers based on the size of the file and how far the download made it, of course.
I've read the API documentation at https://cloud.google.com/sdk/gcloud/reference/storage but have not seen any information about retries. Am I missing something? Is there something else that could be causing a VM running in Google Cloud to fail while downloading data from GCS?
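For context, the pipeline is roughly the following (bucket, object, and tool names are placeholders):

# Stream a large object from GCS straight into a tool that reads standard input.
# gs://my-bucket/large-object and my-filter-tool are hypothetical placeholders.
gcloud storage cat gs://my-bucket/large-object | my-filter-tool > filtered-output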
Hi @AlexPetty,
Welcome to Google Cloud Community!
The error message "Download not completed. Target size=xxx, downloaded data=xxx" indicates that the download of data from Google Cloud Storage (GCS) to your Virtual Machine (VM) was interrupted or failed prematurely, likely due to a transient network issue.
You can try the troubleshooting steps and recommendations below, which could help resolve this behavior:
Troubleshooting and recommendations:

Try increasing the number of max retries with the following command. The default value is 23, so I would advise setting it to some bigger value (e.g. 500):

gcloud config set storage/max_retries xxxx

# download from GCS to local
gcloud storage cp gs://<source> /<local_path>

# upload from local to destination
gcloud storage cp /<local_path> <destination>
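If the transfer still gives up after gcloud's own retries, another option is to wrap the copy in a small outer retry loop. This is only a sketch, not an official recommendation; the bucket, local path, attempt count, and sleep interval are placeholders you would tune for your pipeline:

#!/usr/bin/env bash
# Re-run the GCS download a few times before giving up.
# gs://my-bucket/large-object and ./large-object are hypothetical paths.
for attempt in 1 2 3; do
  gcloud storage cp gs://my-bucket/large-object ./large-object && break
  echo "Attempt ${attempt} failed, retrying..." >&2
  sleep 30
done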
I hope the above information is helpful.
Thanks, this is helpful info.
I cannot find any documentation on storage/max_retries. Is this documented anywhere? Also, is there another approach I should be using for large files other than gcloud storage cp and cat? We have tools that do filtering / data processing from standard input and would prefer to stream the object from Cloud Storage rather than copying it to a local filesystem.
Here’s the documentation for storage/max_retries. You can also set storage/multipart_chunksize to upload larger files by splitting them into smaller chunks during the upload process; the chunk size should not be larger than the file itself. Here's the full list of gcloud config set parameters; use any that fit your use case.
While there are products like Storage Transfer Service that help move large amounts of data quickly, I think gcloud storage cat and cp are the recommended approach for your environment. Check out the comparison between these two transfer options for specific scenarios.
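For the streaming case you described, something like the following should work once the retry setting is in place. This is only a sketch; the bucket, object, and filter tool are placeholders:

# Raise the retry count once, then stream the object straight into the filter
# without staging it on the local filesystem. Names below are hypothetical.
gcloud config set storage/max_retries 500
gcloud storage cat gs://my-bucket/large-object | my-filter-tool > filtered-output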
To summarize the above information:

gcloud config set storage/max_retries 500

The chunk size can be given either with a unit suffix or as a byte count, for example:

gcloud config set storage/multipart_chunksize 1G
gcloud config set storage/multipart_chunksize 104857599