
Load ~3 billion records into Bigtable

Hi All

We have implemented a working Dataflow pipeline that sends data from BigQuery to Bigtable. The volume is roughly 3 billion records per day, and we load into Bigtable as a batch process because the source data in BigQuery is updated once a day. At the moment we use 16 Bigtable nodes so that the Dataflow job finishes those 3 billion records in a few hours. Based on your experience, can you recommend or suggest another approach for loading that volume of data into Bigtable?
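For reference, one thing we already watch in this kind of bulk load is row-key design, since sequential keys concentrate writes on a few tablets and create hotspots. A minimal sketch of a salted key (the field names `customer_id` and `event_date` are made up for illustration):

```python
import hashlib

def build_row_key(customer_id: str, event_date: str) -> bytes:
    """Build a Bigtable row key with a short hash prefix so that
    sequential IDs and dates do not all land on the same tablet
    during a bulk load (avoids write hotspots)."""
    # 4-hex-char salt derived from the ID spreads rows across tablets
    salt = hashlib.md5(customer_id.encode()).hexdigest()[:4]
    return f"{salt}#{customer_id}#{event_date}".encode()

# Two adjacent IDs get very different key prefixes,
# while all data for one customer still sorts together.
print(build_row_key("user-0001", "2023-06-01"))
print(build_row_key("user-0002", "2023-06-01"))
```

The salt makes scans by key range harder, so it is a trade-off: it helps write-heavy batch loads, but you must fan out reads across the salt prefixes afterward.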


Thanks in advance

Solved
1 ACCEPTED SOLUTION

Hello @fbermeoq 

Give the recommendations below a try to optimize the data load into Bigtable.

- Use Cloud Storage as an intermediate data storage layer between BigQuery and Bigtable. This approach can provide a more efficient way to transfer data in bulk and can reduce the load on the Bigtable cluster. Using Cloud Storage also opens up additional options for data processing, such as using Cloud Functions or Cloud Run.

- Optimize your current implementation by tweaking the number of nodes in your Bigtable cluster or adjusting the configuration of your Dataflow pipeline. Experiment with different settings to see which configuration works best for your use case.
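As a rough sanity check on the second point, Bigtable's planning guidance is on the order of ~10,000 simple writes per second per SSD node (for ~1 KB rows), so you can estimate how node count trades off against load duration. A back-of-envelope sketch; the per-node figure is a planning estimate, not a guarantee, and real throughput depends on row size and key design:

```python
# Back-of-envelope: batch-load duration vs. Bigtable node count.
# Assumes ~10,000 writes/sec per SSD node (published planning figure).
WRITES_PER_NODE_PER_SEC = 10_000

def load_hours(total_rows: int, nodes: int) -> float:
    """Estimated hours to write total_rows with the given node count."""
    return total_rows / (nodes * WRITES_PER_NODE_PER_SEC) / 3600

rows = 3_000_000_000  # ~3 billion rows per day, as in the question

print(round(load_hours(rows, 16), 1))  # → 5.2 hours with 16 nodes
print(round(load_hours(rows, 32), 1))  # → 2.6 hours; doubling nodes halves the time
```

Since write throughput scales close to linearly with node count, one common pattern is to scale the cluster up just before the daily batch load and back down afterward, paying for the extra nodes only during the load window.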

