Get hands-on experience with 20+ free Google Cloud products and $300 in free credit for new customers.

BigQuery Streaming Buffer Manipulation

Hi All,

 

I am using BigQuery as a target for CDC implementation where I am using Debezium to pull the CDC records from a source database. For all inserts we are using the BigQuery streaming insert feature as it is very fast in comparison to the standard insert methods.

I found out that Streaming Inserts load records into a Streaming Buffer where the records reside until the time the buffer is flushed into the regular BQ storage (Buffer lifetime is around 90 minutes). This poses issues with our CDC record ingestion process in case a change is made on records which have been recently created/inserted.

Is there a way to manually flush or force flush the streaming buffer into the BQ storage?

Any other options to manage record changes (Updates/Deletes) for records which are sitting in the streaming buffer?

 

Thanks in advance for your help !

 

Cheers !

1 3 10.5K
3 REPLIES 3

You can change the filters so they omit data that could be in the current streaming buffer.

If for some reason you are using a partitioned table, when you update the data you can add a WHERE clause where the timestamp has an interval of 40 to 90 minutes like the example below:

WHERE Partition_time < TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 40 MINUTE).

Also, you can create a job as mentioned in the next documentation, with a job you could specify at which hours you want it to run.

Hi Eduardo,

What if we are required to perform Update operations on the recently inserted records currently in Stream Buffer. Is there any alternative to perform DML operations on inserted data through Stream Inserts ? What is the Scope and limitations pertaining to the above use case? Could you please enlighten us.

I would be grateful for your prompt response. Looking forward to your reply.

Cheers!!

Hi Eduardo,

What if my table has no date or time info in it?

I tried the trick on partition time but it reported error "

Unrecognized name: Partition_time at [12:5]"