
Streaming Buffer in BigQuery Taking Too Long to Flush After InsertAll Operation

Hi everyone,
I’m experiencing an issue with the streaming buffer in BigQuery. I am using the Java SDK (Java 8) and the insertAll method to insert records. Recently, the streaming buffer has been taking an unusually long time to flush: up to 1 hour, even for just two rows of data.
Here are the key details:
SDK Version: Google BigQuery client library version 2.24.4
Java Version: Java 8
Behavior: The streaming buffer takes approximately 1 hour to clear, even for minimal data volumes.

I’d like to know:

  1. Is there a way to calculate or estimate how long data will remain in the streaming buffer before being flushed?
  2. Other than batch data loading, are there any alternative methods to insert data efficiently without the long wait caused by the streaming buffer?

Any advice or suggestions would be greatly appreciated. Thanks in advance!
Accepted solution

Hi @ankitjitterbit,

Welcome to the Google Cloud Community!

It seems the streaming buffer in BigQuery is taking too long to flush after your ‘insertAll’ operation. Currently, there is no way to predict or calculate the exact flush time, because BigQuery’s flush logic is dynamic: it can vary with data size, system load, and internal processes.
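Since the flush time cannot be calculated up front, one option is to observe the buffer instead of estimating it. The snippet below is only a rough sketch of that idea, not an official recommendation: `StreamingBufferPoller` and `waitUntilFlushed` are hypothetical names, and the check is passed in as a plain `BooleanSupplier` so the example runs without any Google Cloud dependency. In real code the check would presumably wrap the client library's `StandardTableDefinition.getStreamingBuffer()`, which, as far as I know, returns null when the table has no streaming buffer; please verify that against the version you are using.

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;

// Hypothetical helper for illustration; not part of the BigQuery client library.
class StreamingBufferPoller {

    /**
     * Calls the check up to maxAttempts times, pausing between calls.
     * Returns true as soon as the check reports an empty buffer,
     * or false if every attempt still saw buffered rows.
     */
    static boolean waitUntilFlushed(BooleanSupplier bufferIsEmpty, int maxAttempts, Duration pause)
            throws InterruptedException {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (bufferIsEmpty.getAsBoolean()) {
                return true;
            }
            if (attempt < maxAttempts) {
                Thread.sleep(pause.toMillis()); // no sleep after the final attempt
            }
        }
        return false;
    }

    public static void main(String[] args) throws InterruptedException {
        // Stand-in check (assumption): reports "empty" on the third call.
        // With the real client library the check might look like:
        //   StandardTableDefinition def = bigquery.getTable(tableId).getDefinition();
        //   return def.getStreamingBuffer() == null;
        int[] calls = {0};
        boolean flushed = waitUntilFlushed(() -> ++calls[0] >= 3, 5, Duration.ofMillis(10));
        System.out.println("flushed=" + flushed + " after " + calls[0] + " checks");
    }
}
```

This does not make the flush any faster. It only gives your code a bounded way to wait, and a `false` result tells you when to stop waiting and fall back to something else.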

Here are some alternative methods that might help your use case:

  • BigQuery Storage Write API: You may consider using the BigQuery Storage Write API instead of the insertAll method. This newer API is designed to be more efficient and offers features like ‘exactly-once delivery’, which might help reduce the time data stays in your buffer.
  • Pub/Sub and Dataflow: For continuous data streams, you may use Pub/Sub as a message queue and Dataflow for transformation and batch loading into BigQuery. This solution is more robust and scalable.

Was this helpful? If so, please accept this answer as “Solution”. If you need additional assistance, reply here within 2 business days and I’ll be happy to help.
