I have a build machine that runs gsutil to copy a tarball from Google Cloud Storage, and occasionally, seemingly at random, gsutil 404s during the download. A prior existence check with `gsutil -q stat ${gsutilURI}` returns a success code. This is really weird behaviour; is there some option I'm missing that would avoid it?
Example output below:

```
xxxxxx@xxxxxxxx C:\workspace\xxxxx>gsutil -o GSUtil:parallel_composite_upload_threshold=150M cp gs://xxxxxx/svnCache/567962/trunk.tar ./trunk-567962.tar
Copying gs://xxxxxx/svnCache/567962/trunk.tar...
[0 files][    0.0 B/  7.1 GiB]
[0 files][118.4 MiB/  7.1 GiB]  118.2 MiB/s
NotFoundException: 404 gs://xxxxx/svnCache/567962/trunk.tar does not exist.
[0 files][335.0 MiB/  7.1 GiB]  167.2 MiB/s
[0 files][552.8 MiB/  7.1 GiB]  183.9 MiB/s
[0 files][768.1 MiB/  7.1 GiB]  191.8 MiB/s
[0 files][986.9 MiB/  7.1 GiB]  207.4 MiB/s
[0 files][  1.2 GiB/  7.1 GiB]  217.7 MiB/s
[0 files][  1.4 GiB/  7.1 GiB]  218.4 MiB/s
[0 files][  1.6 GiB/  7.1 GiB]  219.2 MiB/s
[0 files][  1.8 GiB/  7.1 GiB]  220.3 MiB/s
[0 files][  2.0 GiB/  7.1 GiB]  220.8 MiB/s
[0 files][  2.3 GiB/  7.1 GiB]  220.5 MiB/s
[0 files][  2.5 GiB/  7.1 GiB]  220.6 MiB/s
[0 files][  2.7 GiB/  7.1 GiB]  220.2 MiB/s
[0 files][  2.9 GiB/  7.1 GiB]  219.7 MiB/s
[0 files][  3.1 GiB/  7.1 GiB]  221.3 MiB/s
[0 files][  3.4 GiB/  7.1 GiB]  223.6 MiB/s
NotFoundException: 404 gs://xxxxxx/svnCache/567962/trunk.tar does not exist.
[0 files][  3.6 GiB/  7.1 GiB]  222.9 MiB/s
[0 files][  3.8 GiB/  7.1 GiB]  222.2 MiB/s
[0 files][  4.0 GiB/  7.1 GiB]  225.2 MiB/s
NotFoundException: 404 gs://xxxxxx/svnCache/567962/trunk.tar does not exist.
[0 files][  4.2 GiB/  7.1 GiB]  221.5 MiB/s
[0 files][  4.4 GiB/  7.1 GiB]  220.7 MiB/s
[0 files][  4.6 GiB/  7.1 GiB]  222.6 MiB/s
[0 files][  4.9 GiB/  7.1 GiB]  224.7 MiB/s
[0 files][  5.1 GiB/  7.1 GiB]  223.2 MiB/s
CommandException: Some components of .\trunk-567962.tar were not downloaded successfully. Please retry this download.
```
I also have the `-D` debug logs if needed.
Hello @alexg2,
Welcome to the Google Cloud Community!
Check out this Stack Overflow post; you may be running into the same problem.
Parallel composite uploads are a strategy gsutil uses for large files: the source file is split into component chunks that are uploaded in parallel and then composed into a single object in the bucket. This can be significantly faster when network and disk speed are not the limiting factors.
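A quick way to check whether an object was stored as a composite is `gsutil stat`: composite objects report a `Component-Count` and have no MD5 hash. A sketch using the path from your output (the values shown are illustrative):

```
gsutil stat gs://xxxxxx/svnCache/567962/trunk.tar
# For a composite object, the output includes lines like:
#   Component-Count:    48
#   Hash (crc32c):      AAAAAA==
# and no "Hash (md5)" line.
```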
The final object stored in your bucket is a composite object, which has only a crc32c checksum and no MD5 hash. You have to install crcmod so that gsutil can perform integrity checks when downloading the object. Check out this guide about CRC32C and installing crcmod. Other users have run into the same problem and were able to fix it by installing crcmod.
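A minimal sketch of checking for and installing crcmod, assuming a pip-based gsutil install (on Windows you may need the compiled module described in the guide above):

```
# Check whether gsutil is using the fast, compiled crcmod:
gsutil version -l
# Look for "compiled crcmod: True" in the output.

# If it reports False, install or upgrade crcmod:
pip install -U crcmod
```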
Another fix is to set `parallel_composite_upload_threshold` to 0, which disables all parallel composite uploads in gsutil, so each upload produces a single non-composite object (with an MD5 hash).
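You can set this per invocation with the `-o` flag, just like the `150M` value in your output, or persistently in your boto config; the paths below are illustrative:

```
# One-off, on the machine doing the upload:
gsutil -o GSUtil:parallel_composite_upload_threshold=0 cp ./trunk.tar gs://xxxxxx/svnCache/567962/trunk.tar

# Or permanently, in the [GSUtil] section of your .boto file:
# [GSUtil]
# parallel_composite_upload_threshold = 0
```

Note this only affects how new objects are uploaded; objects already stored as composites keep having only a crc32c hash.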
If you also want to know which files failed during the copy, you can pass the `-L <file>` option to `gsutil cp` to write a manifest log you can examine.
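For example (the manifest file name is arbitrary):

```
gsutil cp -L manifest.csv gs://xxxxxx/svnCache/567962/trunk.tar ./trunk-567962.tar
# manifest.csv records source, destination, bytes transferred, checksums,
# and the result for each file.
```

If I recall correctly, re-running the same command with the same log file will retry only the transfers that did not complete successfully.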
If the above options don't work, you can contact Google Cloud Support to look further into your case. Let me know if this helped, thanks!