Hello Team,
I have an Apache Beam Dataflow job that transfers approximately 100 million records from BigQuery (BQ) to Cloud SQL. Because the job runs one INSERT query per record, the CPU of the Cloud SQL instance gets fully utilized. I want to fix this. We cannot permanently increase the Cloud SQL CPU size, since actual app usage is low (5-10%); the problem only occurs during the daily batch run.
One approach suggested by our DBA is to export the BQ data to GCS and then import it using the Cloud SQL API. But this creates another problem: we can only import a single file at a time.
I want to understand what the best possible solution for this would be.
1. BQ to GCS to Cloud SQL as stated above.
2. Increase Cloud SQL CPU just during the batch run.
3. Is there any way to optimize/reduce the number of SQL INSERT statements triggered in Dataflow? I tried setting a batch size, but that only controls the commit size; the number of INSERT statements stays the same. (See the sketch after this list for the kind of batching I'm after.)
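For option 3, to illustrate what I mean by reducing the insert count: the idea would be to group rows in the pipeline and emit one multi-row INSERT per batch instead of one statement per record. Below is only a rough sketch, not our actual pipeline. It assumes a PostgreSQL Cloud SQL instance reachable from the workers, the psycopg2 driver, and a made-up staging_records(id, payload) table.

```python
# Rough sketch: batch elements in the pipeline and issue one multi-row INSERT
# per batch. Connection string, table, and column names are placeholders.
import apache_beam as beam
import psycopg2
from psycopg2.extras import execute_values


class BatchedInsertFn(beam.DoFn):
    def __init__(self, dsn, batch_rows=1000):
        self.dsn = dsn
        self.batch_rows = batch_rows

    def setup(self):
        # One connection per DoFn instance, reused across bundles.
        self.conn = psycopg2.connect(self.dsn)

    def process(self, batch):
        # `batch` is a list of (id, payload) tuples produced by BatchElements.
        with self.conn.cursor() as cur:
            # execute_values expands this into a single multi-row INSERT.
            execute_values(
                cur,
                "INSERT INTO staging_records (id, payload) VALUES %s",
                batch,
                page_size=self.batch_rows,
            )
        self.conn.commit()

    def teardown(self):
        self.conn.close()


def run(rows, dsn):
    with beam.Pipeline() as p:
        (
            p
            # Stand-in for the real source (e.g. beam.io.ReadFromBigQuery).
            | "ReadRows" >> beam.Create(rows)
            | "Batch" >> beam.BatchElements(min_batch_size=500, max_batch_size=1000)
            | "BulkInsert" >> beam.ParDo(BatchedInsertFn(dsn))
        )
```

The point is that BatchElements would control how many rows land in a single INSERT statement, which is different from the commit/batch size I mentioned above.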
Hi @aniketkhadke,
This is a pretty common bottleneck when using Dataflow to write large volumes into Cloud SQL—especially if you're doing one insert per row. At 100M records, that’s going to crush the SQL CPU during the batch window.
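One way around it is to take the per-row inserts out of the picture: export from BigQuery into GCS and load the files through Cloud SQL's import path. If you export with a wildcard destination URI, BigQuery shards the output into multiple files for you. A rough sketch (project, dataset, table, and bucket names are placeholders; assumes the google-cloud-bigquery client library):

```python
# Rough sketch: shard the BigQuery export into many CSV files in GCS by using
# a wildcard destination URI.
from google.cloud import bigquery

client = bigquery.Client(project="my-project")

extract_job = client.extract_table(
    "my-project.my_dataset.my_table",
    # The '*' makes BigQuery split the output into multiple numbered files.
    "gs://my-export-bucket/exports/my_table-*.csv",
    job_config=bigquery.ExtractJobConfig(destination_format="CSV"),
)
extract_job.result()  # wait for the export to finish
```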
Was this helpful? If so, please accept this answer as “Solution”. If you need additional assistance, reply here within 2 business days and I’ll be happy to help.
@mcbsalceda wrote: "To scale this, you'd need to split the export into multiple files, then trigger parallel imports into staging tables or handle them sequentially with automation."
Are you saying one staging table per CSV file, with the imports running simultaneously? Is that supported in Cloud SQL?
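Just to check my understanding of the "handle them sequentially with automation" part, is this roughly what you mean? A rough sketch only: instance, database, bucket, and table names are placeholders, I'm assuming the Cloud SQL Admin API via the discovery client, and auth/error handling is omitted.

```python
# Rough sketch: import exported CSV files into Cloud SQL one after another,
# since only a single import can run at a time.
import time
from googleapiclient import discovery

sqladmin = discovery.build("sqladmin", "v1beta4")

PROJECT = "my-project"
INSTANCE = "my-cloudsql-instance"


def import_csv(gcs_uri, table):
    body = {
        "importContext": {
            "fileType": "CSV",
            "uri": gcs_uri,
            "database": "my_database",
            "csvImportOptions": {"table": table},
        }
    }
    op = sqladmin.instances().import_(
        project=PROJECT, instance=INSTANCE, body=body
    ).execute()
    # Only one import runs per instance at a time (as noted above), so poll
    # until this one finishes before starting the next file.
    while True:
        status = sqladmin.operations().get(
            project=PROJECT, operation=op["name"]
        ).execute()
        if status.get("status") == "DONE":
            break
        time.sleep(30)


for uri in [
    "gs://my-export-bucket/exports/my_table-000000000000.csv",
    "gs://my-export-bucket/exports/my_table-000000000001.csv",
]:
    import_csv(uri, "staging_records")
```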