
rsync vs gcloud storage rsync -- syncing question

Hello, I need to recursively copy many videos from a local Ubuntu system to our Cloud Storage bucket while preserving metadata, supporting resumption after interruptions, and validating the copies. I am setting up some equipment to test the following two options, but I was hoping someone could tell me which one better fits these goals.
Thank you for your interest and help!
cross-post: StackOverflow 

OPTIONS:

1) Mount the Cloud Storage bucket at a local mount point, then transfer the files through it with the rsync command below:
a) Should the local mount point have the sync attribute? I believe it should not, since rsync will perform the synchronization and validation.
b) Will this configuration be fast, or will it require many round trips between rsync and the cloud?
c) Will metadata be preserved after the rsync has completed?
d) Will an interrupted upload continue from the partially uploaded data?

sudo rsync --archive --human-readable --verbose --partial --progress --itemize-changes --stats src dst

Then, to validate the transferred files:

sudo rsync --recursive --checksum --verbose --human-readable --itemize-changes --stats --dry-run src dst
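An rsync-independent cross-check for option 1 is a checksum manifest: hash every file under the source once, then re-hash the same relative paths under the destination and compare. This is a minimal sketch assuming GNU coreutils and findutils on Ubuntu; make_manifest and verify_manifest are hypothetical helper names, and hashing the destination through a Cloud Storage FUSE mount reads every file back from the bucket.

```shell
#!/usr/bin/env bash
# Sketch: verify a copy by comparing SHA-256 digests (GNU coreutils/findutils).
# make_manifest and verify_manifest are hypothetical helpers for illustration.
set -euo pipefail

# Print "digest  ./relative/path" for every regular file under $1,
# in a stable (sorted) order so manifests are comparable between runs.
make_manifest() {
  local root=$1
  (cd "$root" && find . -type f -print0 | sort -z | xargs -0 -r sha256sum)
}

# Re-hash the same relative paths under $1 and check them against manifest $2.
# --quiet reports only mismatches; the exit status is non-zero on any failure.
# $2 must be an absolute path because the check runs after cd into $1.
verify_manifest() {
  local root=$1 manifest=$2
  (cd "$root" && sha256sum --quiet -c "$manifest")
}
```

For example, make_manifest /data/videos > /tmp/videos.sha256 followed by verify_manifest /mnt/bucket/videos /tmp/videos.sha256 (both paths are placeholders). Files added to the destination after the manifest was made are not detected; only the listed files are checked.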

2) Alternatively, use the gcloud CLI to transfer the files directly from the local Ubuntu system to the bucket:
a) Will this configuration be faster than option 1, or will it also require many round trips to the cloud?
b) Will metadata be preserved after the transfer has been completed?
c) Will an interrupted upload continue from the partially uploaded data?

sudo gcloud storage rsync --recursive --delete-unmatched-destination-objects src dst

Then, to validate the transferred files:

sudo gcloud storage rsync --recursive --dry-run src dst

ACCEPTED SOLUTION

Hello @abitofhelp,

Welcome to Google Cloud Community!

Mounting a Cloud Storage bucket as a local file system relies on Cloud Storage FUSE. I saw your question on Stack Exchange, and I agree that in your case a native tool is a better choice than Cloud Storage FUSE. Although Cloud Storage FUSE provides a file system interface, it has limitations; in particular, it is not POSIX compliant. For a POSIX file system product in Google Cloud, see Filestore.

Cloud Storage FUSE does not transfer object metadata when uploading files to Cloud Storage, with the exception of mtime and symlink targets. If you want to preserve object metadata, consider uploading files using the Google Cloud CLI, the JSON API, or the Google Cloud console. To learn more about its limitations and differences from a POSIX file system, check out the documentation.
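To check which local attributes survive a given transfer method, one option is to print them with GNU stat on the original file and on the copy you want to check. A minimal sketch, assuming GNU coreutils on Ubuntu; posix_attrs and same_posix_attrs are hypothetical helper names. The comparison deliberately skips access time, since simply reading a file can update it.

```shell
#!/usr/bin/env bash
# Sketch: inspect and compare POSIX attributes with GNU stat.
# posix_attrs and same_posix_attrs are hypothetical helpers for illustration.
set -euo pipefail

# Print: atime mtime uid gid mode
# (times as seconds since the epoch, mode as octal permissions).
posix_attrs() {
  stat -c '%X %Y %u %g %a' -- "$1"
}

# Succeed only if mtime, owner UID, group GID, and mode all match.
# atime is excluded because reading a file may change it.
same_posix_attrs() {
  [ "$(stat -c '%Y %u %g %a' -- "$1")" = "$(stat -c '%Y %u %g %a' -- "$2")" ]
}
```

For example, posix_attrs /data/videos/clip.mp4 prints five numbers for that file (the path is a placeholder).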

If you use the gcloud CLI, look into gcloud storage rsync. To answer some of your questions:

  • To preserve the object's metadata, use the --preserve-posix (-P) flag, which preserves POSIX attributes when objects are copied. gcloud storage copies several fields reported by the stat command: access time, modification time, owner UID, group GID, and the file's mode (permissions).
  • The gcloud CLI resumes uploads in the gcloud storage cp and gcloud storage rsync commands. If an upload fails or is interrupted, you can resume it by running the same command you used to start it. You can also add the --no-clobber flag to avoid re-uploading files that have already been uploaded.
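Because resuming means re-running the same command, an unattended transfer can wrap that command in a small retry loop. A minimal sketch in bash: retry is a hypothetical helper, the attempt limit and the fixed delay (RETRY_DELAY, default 30 seconds) are arbitrary assumptions, and gs://BUCKET/dst in the example comment is a placeholder.

```shell
#!/usr/bin/env bash
# Sketch: re-run a command until it succeeds or max_attempts is reached.
# retry is a hypothetical helper; RETRY_DELAY (seconds) is an assumed knob.
set -euo pipefail

retry() {
  local max_attempts=$1 attempt=1
  shift
  until "$@"; do
    if (( attempt >= max_attempts )); then
      echo "retry: giving up after $attempt attempts: $*" >&2
      return 1
    fi
    echo "retry: attempt $attempt failed; retrying in ${RETRY_DELAY:-30}s" >&2
    sleep "${RETRY_DELAY:-30}"
    attempt=$(( attempt + 1 ))
  done
}

# Example (placeholder bucket; each re-run picks up where the last one stopped):
# retry 5 gcloud storage rsync --recursive --preserve-posix src gs://BUCKET/dst
```

A fixed delay keeps the sketch short; exponential backoff would be a reasonable alternative for long outages.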

You can also contact Google Cloud Support to look into your case further. I hope this helps. Thanks!
