I am attempting to load externally partitioned data into a BigQuery table using Python. The table in question has one REQUIRED field, which corresponds to the custom partition key schema in the GCS URI.
For example, my custom URI, which I use as the `source_uri_prefix`, looks like this:

```
gs://my_bucket/my_table/{dt:DATE}
```
And my table schema is like this:

```python
[
    bigquery.SchemaField("field_a", "STRING"),
    bigquery.SchemaField("field_b", "STRING"),
    bigquery.SchemaField("dt", "DATE", mode="REQUIRED"),
]
```
However, whenever I attempt to load, the job fails with an error.
Is there a way to set the mode of the custom keys (in this case `dt`) in the `source_uri_prefix` to REQUIRED? Or is it just a given that any custom key in the `source_uri_prefix` is REQUIRED, because it is automatically added during the load?
When using `source_uri_prefix` in BigQuery to load externally partitioned data, BigQuery automatically extracts the partition key from the specified path and adds it to the table. The partition key (in this case, `dt`) is treated as required during the load, because it must be present in the URI for the load operation to succeed.
However, if you're encountering an error, it might be due to other reasons, such as mismatched schema definitions, data type issues, or the way the `source_uri_prefix` is structured.
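For reference, here is a minimal sketch of such a load with the Python client. The bucket, prefix, and schema are taken from the question; the project/dataset/table ID, the wildcard source URI, and the `PARQUET` source format are placeholder assumptions.

```python
from google.cloud import bigquery

client = bigquery.Client()

# Declare the custom partition key schema encoded in the URI layout.
hive_opts = bigquery.HivePartitioningOptions()
hive_opts.mode = "CUSTOM"
hive_opts.source_uri_prefix = "gs://my_bucket/my_table/{dt:DATE}"

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.PARQUET,  # assumption; use your actual format
    hive_partitioning=hive_opts,
    schema=[
        bigquery.SchemaField("field_a", "STRING"),
        bigquery.SchemaField("field_b", "STRING"),
        # The mode of this URI-derived key is the point of contention above.
        bigquery.SchemaField("dt", "DATE", mode="REQUIRED"),
    ],
)

# Data files are expected under partition paths such as:
#   gs://my_bucket/my_table/dt=2024-01-01/part-000.parquet
load_job = client.load_table_from_uri(
    "gs://my_bucket/my_table/*",       # placeholder wildcard URI
    "my_project.my_dataset.my_table",  # placeholder table ID
    job_config=job_config,
)
load_job.result()  # wait for completion; raises on failure
```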
Here are a few things to check:

1. **URI and table schema alignment:** Make sure the key name and type in the `source_uri_prefix` (here, `{dt:DATE}`) exactly match the corresponding column in the BigQuery table schema. If the date format or the naming convention differs, the load operation can fail.
2. **Data in GCS:** Ensure the data files in GCS sit under partition paths carrying the `dt` key, with values in the format BigQuery expects for a `DATE`.
3. **Table configuration:** Ensure the table is configured to accept externally partitioned data and that the partition key (`dt`) is defined correctly in the schema; the diagnostic sketch after this list shows one way to inspect it.
4. **Loading method:** Make sure you are using the appropriate method for loading externally partitioned data, e.g. `bigquery.LoadJobConfig` with `source_format` set to match your files (`PARQUET`, `CSV`, etc.), as in the sketch above.
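As a quick diagnostic for points 1 and 3, you can dump the target table's current schema and compare each field's name, type, and mode against the key schema in the `source_uri_prefix` (a sketch; the table ID is a placeholder):

```python
from google.cloud import bigquery

client = bigquery.Client()
table = client.get_table("my_project.my_dataset.my_table")  # placeholder table ID

# Compare against the {dt:DATE} key declared in the source_uri_prefix.
for field in table.schema:
    print(field.name, field.field_type, field.mode)
```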
Hi, thanks for the response!
Regarding the points you mentioned to check during the load: is there something that I'm missing, or should I set `dt` to NULLABLE mode in the schema file?