Get hands-on experience with 20+ free Google Cloud products and $300 in free credit for new customers.

Partition data from Datastream

Hello!

A bit of context. We plan to use Datastream and load events in GCS in AVRO format from Mysql. After tests we realized that AVRO files are stored with the prefix /year/month/date/hour/minute/some_file.avro in GCS. We want to create an external tables in BigQuery to load these AVRO file.

Is it possible to create external tables with partition from Datastream's data?

2 1 484
1 REPLY 1

BigQuery does not support partitioning on external tables based on the file path or the structure of the Cloud Storage URIs. The partitioning in BigQuery is based on a specific column in the table, not the structure of the data source.

If your AVRO data includes a TIMESTAMP or DATE column, you can create a partitioned table in BigQuery based on that. If your AVRO data doesn't include this timestamp information, but it's included in the structure of your Cloud Storage URIs, you would need to add an additional processing step to add this timestamp information to your data before it can be partitioned in BigQuery. This could be done using a data processing tool like Dataflow or Dataproc.