I want to use the BigQuery Storage Write API in pending mode, and when I call BatchCommitWriteStreams I want to truncate the partition/table and replace its contents with the new data atomically. With normal BigQuery load jobs I can set writeDisposition to WRITE_TRUNCATE [ref] and get this behaviour. Is there some way to get this behaviour with the BigQuery Storage Write API?
The BigQuery Storage Write API doesn't support any kind of WriteDisposition configuration; that is a Query/Load/Copy job configuration available through the BigQuery v2 API.
If you want to write with the BigQuery Storage Write API but still replace the target table atomically, one way to achieve that is to write the data to a temporary (staging) table using the Storage Write API, and then run a COPY job that overwrites the target table with the contents of the temporary table. The COPY job can be configured with writeDisposition WRITE_TRUNCATE, which atomically replaces the target table's contents.
The Spark BigQuery connector (Dataproc) follows a similar pattern, but uses a QUERY job with a MERGE SQL statement instead of a COPY job. For reference: https://github.com/GoogleCloudDataproc/spark-bigquery-connector/blob/14fa9b879b62a535c37c906ffb76386...