
Use cases to share tabular data via GCS instead of BigQuery?

Do you know of use cases where analytical data product teams need to grant access to GCS bucket objects instead of BigQuery views?

One could argue that everything should be shared via BigQuery views and that this should be the preferred approach.

Example: team A grants the role "roles/storage.objectViewer" on a bucket in GCP project A to a service account in GCP project B, owned by team B.
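For instance, with the gcloud CLI that grant would look roughly like this (the bucket name, project ID, and service account are placeholders):

```shell
# Hypothetical example: team A shares a bucket in project A with
# a service account owned by team B in project B.
gcloud storage buckets add-iam-policy-binding gs://team-a-shared-data \
  --member="serviceAccount:team-b-reader@project-b.iam.gserviceaccount.com" \
  --role="roles/storage.objectViewer"
```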

I exclude media such as images, videos, etc.


Hi @Jbbqqf,

Welcome to Google Cloud Community!

Yes, there are specific scenarios where granting access to GCS bucket objects is preferable:

  • Cost Efficiency: Reading large datasets directly from GCS can be cheaper than repeatedly querying them in BigQuery.
  • Processing of Raw Data: GCS suits ETL (extract, transform, load) pipelines and tools that need raw data files in CSV or JSON format.
  • Tool Compatibility: Many data processing tools work natively with raw files rather than through a SQL interface.
  • Granular Access Control: IAM on buckets and objects gives you precise control over who can access which files or prefixes, which can be essential for security and compliance.
  • Real-Time Access: Files written to GCS are readable by consumers immediately, without waiting for a load job or refresh cycle into BigQuery.

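Programmatically, the kind of grant described in the question comes down to appending a binding to the bucket's IAM policy. Here is a minimal standard-library sketch of just the binding structure; actually applying it to a bucket would require the google-cloud-storage client, and the service account name below is hypothetical:

```python
def object_viewer_binding(service_account_email: str) -> dict:
    """Build a GCS IAM policy binding granting read-only object access."""
    return {
        "role": "roles/storage.objectViewer",
        "members": [f"serviceAccount:{service_account_email}"],
    }

# Hypothetical consumer service account owned by team B.
binding = object_viewer_binding("team-b-reader@project-b.iam.gserviceaccount.com")
```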
GCS is often more suitable for scenarios involving raw data, cost management, and real-time updates, while BigQuery is best for structured data analysis. Using GCS can streamline workflows and reduce costs in specific use cases.

Also, feel free to refer to the documentation on setting and managing IAM policies on buckets.

I hope the above information is helpful.


Here are a few cases I've come across: 

Dirty or unstructured data is often best handled outside of BigQuery. Think images, videos, JSON, or dirty CSVs with inconsistencies between rows.

Archiving data for long-term backup is more cost-effective in GCS than in BigQuery.

Rarely used data that you don't frequently work with is cheaper to keep in GCS, since its storage classes (Nearline, Coldline, Archive) cost less than BigQuery storage.
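For the archiving and rarely-used-data cases, a lifecycle rule can move objects to a cheaper storage class automatically. A rough sketch, in which the bucket name and the one-year threshold are placeholders:

```shell
# Hypothetical lifecycle config: transition objects to the Archive
# storage class once they are a year old.
cat > lifecycle.json <<'EOF'
{
  "rule": [
    {
      "action": {"type": "SetStorageClass", "storageClass": "ARCHIVE"},
      "condition": {"age": 365}
    }
  ]
}
EOF
gcloud storage buckets update gs://team-a-archive --lifecycle-file=lifecycle.json
```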

Temporary data that needs scrubbing or processing elsewhere before being brought into, or out of, BigQuery.
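That scrubbing step might look something like this minimal standard-library sketch, which normalizes a dirty CSV with inconsistent row lengths to a fixed column count before loading it anywhere. The sample data and column count are illustrative assumptions:

```python
import csv
import io

def scrub_rows(raw_csv: str, expected_cols: int) -> list[list[str]]:
    """Pad short rows and truncate long ones so every row matches the schema."""
    cleaned = []
    for row in csv.reader(io.StringIO(raw_csv)):
        if not row:
            continue  # drop blank lines
        # Truncate extra fields, then pad missing ones with empty strings.
        row = row[:expected_cols] + [""] * (expected_cols - len(row))
        cleaned.append(row)
    return cleaned

# Dirty input: one row too short, one row too long.
dirty = "id,name,city\n1,alice\n2,bob,paris,extra\n"
rows = scrub_rows(dirty, expected_cols=3)
# Every row now has exactly three fields.
```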

Very large datasets that don't need to be queried often are a good fit for GCS because of its lower storage costs.

Large binary objects like logs, backups, or media files aren't well suited to BigQuery; GCS is built for exactly this kind of blob storage.

Interoperability with web applications is one I've come across. Depending on the data and use case, it can be much more efficient to store data as flat files in GCS, especially if you plan to update rows frequently.