We have a date-partitioned (DD/MM/YYYY) BigQuery table. We want to overwrite the data of one specific partition using PySpark. To do this, I set 'spark.sql.sources.partitionOverwriteMode' to 'DYNAMIC' at the session level, as per the Spark BigQuery connector documentation. But it still deleted the data in the other partitions, which should not happen.
df_with_partition.write.format("bigquery") \
.option("table", f"{bq_table_full}") \
.option("partitionField", f"{partition_date}") \
.option("partitionType", f"{bq_partition_type}") \
.option("temporaryGcsBucket", f"{temp_gcs_bucket}") \
.option("spark.sql.sources.partitionOverwriteMode", "DYNAMIC") \
.option("writeMethod", "indirect") \
.mode("overwrite") \
.save()
Can anyone please suggest what I am doing wrong, or how to implement dynamic partitionOverwriteMode correctly? Many thanks.
Hi @soumiknow,
Welcome to the Google Cloud Community!
You're on the right track in setting 'partitionOverwriteMode' to 'DYNAMIC', but the problem you're running into is where you're setting it. This property should be configured at the Spark session level, not passed as a write option, as shown in the sketch below.
To address your question, here are a couple of pointers that might help with your use case:
You may refer to this documentation, which explains how to use the BigQuery connector together with the Spark session configuration.
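As a concrete illustration, here is a minimal sketch of moving the setting to the session configuration. It reuses the variable names from your snippet (spark, df_with_partition, bq_table_full, partition_date, bq_partition_type, temp_gcs_bucket) and assumes a connector version that supports dynamic partition overwrite with the indirect write method:

# Set dynamic partition overwrite on the session config, not on the writer.
spark.conf.set("spark.sql.sources.partitionOverwriteMode", "DYNAMIC")

# The write itself stays the same, minus the misplaced option.
df_with_partition.write.format("bigquery") \
    .option("table", bq_table_full) \
    .option("partitionField", partition_date) \
    .option("partitionType", bq_partition_type) \
    .option("temporaryGcsBucket", temp_gcs_bucket) \
    .option("writeMethod", "indirect") \
    .mode("overwrite") \
    .save()

Alternatively, you can set it once when building the session, e.g. SparkSession.builder.config("spark.sql.sources.partitionOverwriteMode", "DYNAMIC"), which has the same effect for all writes in that session.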
Was this helpful? If so, please accept this answer as “Solution”. If you need additional assistance, reply here within 2 business days and I’ll be happy to help.