I'm uploading a lot of Parquet files to Google Cloud Storage. In Storage, they need to be uncompressed for easy ingest into BigQuery, but I get better than 2:1 compression with gzip on these files. I see that "gsutil cp -J <local_file> gs://<bucket>/<object>" does exactly what I want: the upload uses much less bandwidth than it does without "-J", and yet the uploaded object is uncompressed at rest.
But I can't find anything in the REST API or the C++ client library docs (google-cloud-cpp) suggesting this is even an option. It very much looks to me like gsutil ultimately handles -J by setting Content-Encoding to "gzip" and compressing the body. There are also hints that only the JSON API supports this capability, although a Content-Encoding header is not mentioned in the JSON API docs.
I attempted to mimic gsutil with the following code:
#include "google/cloud/storage/client.h"
#include <boost/iostreams/copy.hpp>
#include <boost/iostreams/filter/gzip.hpp>
#include <boost/iostreams/filtering_stream.hpp>
#include <fstream>

namespace gcs = google::cloud::storage;
namespace bio = boost::iostreams;

// Upload with Content-Encoding: gzip, streaming gzip-compressed bytes.
gcs::ObjectWriteStream gcs_out = _gcs_client->WriteObject(
    bucket_name, object_name, gcs::ContentEncoding("gzip"),
    gcs::ContentType("application/octet-stream"));
std::ifstream file_in(filepath, std::ios_base::in | std::ios_base::binary);
bio::filtering_ostream compressed_out;
compressed_out.push(bio::gzip_compressor());
compressed_out.push(gcs_out);
bio::copy(file_in, compressed_out);
gcs_out.Close();  // bio::copy will close file_in and compressed_out, but not gcs_out.
That works, but it leaves the objects stored compressed in Cloud Storage. I think it's the equivalent of "gsutil cp -Z ...".
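For reference, one way to check what actually ends up at rest (a minimal sketch, assuming the same _gcs_client, bucket_name, and object_name as above): the object metadata reports the stored content encoding and size, which is how I can tell the object is still compressed.

#include <iostream>

// Inspect the uploaded object's metadata.
auto meta = _gcs_client->GetObjectMetadata(bucket_name, object_name);
if (meta) {
  std::cout << "content-encoding: " << meta->content_encoding() << "\n"
            << "size at rest: " << meta->size() << "\n";
}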
Is there a way to compress the data on the wire only, using the C++ client library?
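To be concrete about what I'm looking for, something along these lines would be ideal. Note that gcs::GzipTransportEncoding is a made-up option name, used only to illustrate the desired behavior (compress in transit, store uncompressed):

// Hypothetical sketch: gcs::GzipTransportEncoding() is not a real option as
// far as I can tell; it only illustrates what I'm after.
gcs::ObjectWriteStream out = _gcs_client->WriteObject(
    bucket_name, object_name, gcs::GzipTransportEncoding(),
    gcs::ContentType("application/octet-stream"));
std::ifstream in(filepath, std::ios_base::in | std::ios_base::binary);
out << in.rdbuf();
out.Close();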