
Effective way to transfer data from BigQuery to Cloud SQL

Hello team, 

We already have an Apache Beam Dataflow pipeline implemented to transfer records from BigQuery to Cloud SQL. It works correctly; the only problem we're facing is that the Cloud SQL CPU utilization stays high during the full transfer. We have around 95 million records to transfer, and the Beam pipeline currently generates a single INSERT query per row. I have tried multiple approaches to address the CPU utilization, but nothing is helping. Could you please suggest any other way of transferring this volume of data effectively?

I even tried the Cloud SQL CSV import API. It keeps the CPU utilization low, but since the files can only be processed sequentially, the time required increases multifold.

We have PostgreSQL as our database, with 4 vCPUs and 26 GB of memory.

Any help will be much appreciated. Thank you.


Hi @aniketkhadke,

Thanks for sharing the details. Based on your scenario, the high CPU utilization on your Cloud SQL instance is most likely due to the overhead of issuing one INSERT query per row. You could try a few of the suggestions below to optimize the data transfer:

  1. Batch Inserts: Adjust your pipeline to insert data in batches (e.g., 10,000 rows per statement) instead of one row at a time; see the first sketch after this list.
  2. Utilize Cloud SQL Import: Export the data as CSV files to Cloud Storage and use the Cloud SQL import API. It is significantly more efficient for large datasets; see the second sketch below.
  3. Scale Up Temporarily: Increase the vCPUs and memory on your Cloud SQL instance for the duration of the transfer, then scale back down once it completes.
  4. Concurrent/Parallel Processing: Divide the dataset into segments and process them simultaneously in Apache Beam; see the third sketch below.
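
For option 1, here is a minimal sketch of what batching could look like with the Beam Python SDK, assuming psycopg2 for the PostgreSQL connection. The table name, column list, connection details, and batch sizes are placeholders for illustration, not your actual pipeline:

```python
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions


class WriteBatchToPostgres(beam.DoFn):
    """Writes each batch of rows with a single multi-row INSERT."""

    def __init__(self, host, dbname, user, password):
        self._conn_params = dict(host=host, dbname=dbname, user=user, password=password)
        self._conn = None

    def setup(self):
        import psycopg2  # imported here so it resolves on the Dataflow workers
        self._conn = psycopg2.connect(**self._conn_params)

    def process(self, batch):
        from psycopg2.extras import execute_values
        with self._conn.cursor() as cur:
            # One INSERT statement per batch instead of one per row.
            execute_values(
                cur,
                "INSERT INTO my_table (id, name, value) VALUES %s",  # placeholder table/columns
                [(row["id"], row["name"], row["value"]) for row in batch],
            )
        self._conn.commit()

    def teardown(self):
        if self._conn is not None:
            self._conn.close()


with beam.Pipeline(options=PipelineOptions()) as p:
    (
        p
        | "ReadFromBQ" >> beam.io.ReadFromBigQuery(table="my_project:my_dataset.my_table")
        | "Batch" >> beam.BatchElements(min_batch_size=1000, max_batch_size=10000)
        | "WriteBatches" >> beam.ParDo(
            WriteBatchToPostgres(host="10.0.0.3", dbname="mydb", user="writer", password="***")
        )
    )
```

Sending one multi-row statement per batch cuts the per-statement parsing and transaction overhead on the instance, which is usually what drives the CPU with row-by-row inserts.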
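
For option 2, a rough sketch of the export side, assuming the google-cloud-bigquery client library; the project, dataset, table, and bucket names are made up for illustration. The load into Cloud SQL is then triggered per file, for example with `gcloud sql import csv`:

```python
from google.cloud import bigquery

client = bigquery.Client(project="my-project")  # placeholder project

# Export the table to sharded CSV files in Cloud Storage. The wildcard lets
# BigQuery split the ~95M rows across many smaller files instead of one huge CSV.
extract_job = client.extract_table(
    source="my-project.my_dataset.my_table",              # placeholder table
    destination_uris="gs://my-bucket/export/part-*.csv",  # placeholder bucket
    job_config=bigquery.ExtractJobConfig(
        destination_format="CSV",
        print_header=False,  # Cloud SQL CSV import expects data rows only
    ),
)
extract_job.result()  # wait for the export to finish

# Each file is then imported with something like:
#   gcloud sql import csv INSTANCE gs://my-bucket/export/part-000000000000.csv \
#       --database=mydb --table=my_table
# Imports on a single instance run one at a time, so this trades speed for low CPU
# usage, which matches the behavior you already observed.
```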
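
For option 4, one possible way to split the work is to shard the BigQuery read on a fingerprint of a key column and flatten the branches, so each segment is read and written in parallel. This is only a sketch and assumes your table has an `id` column to shard on; the write side would reuse the batching DoFn from the first sketch:

```python
import apache_beam as beam

NUM_SHARDS = 8  # illustrative shard count; tune to what the instance can absorb


def shard_query(shard):
    # Each shard reads a disjoint slice of the table, so shards can run in parallel.
    return (
        "SELECT * FROM `my-project.my_dataset.my_table` "  # placeholder table
        f"WHERE MOD(ABS(FARM_FINGERPRINT(CAST(id AS STRING))), {NUM_SHARDS}) = {shard}"
    )


with beam.Pipeline() as p:
    shards = [
        p | f"ReadShard{i}" >> beam.io.ReadFromBigQuery(
            query=shard_query(i), use_standard_sql=True)
        for i in range(NUM_SHARDS)
    ]
    rows = shards | "FlattenShards" >> beam.Flatten()
    # ...then batch and write as in the first sketch:
    # rows | beam.BatchElements(...) | beam.ParDo(WriteBatchToPostgres(...))
```

Keep in mind that more parallel writers means more load on the database, so this works best combined with the temporary scale-up in option 3.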

These should help you reduce CPU usage and speed up the transfer. Hope this helps!