
Reusable Cluster between pipelines not working

I have two Data Fusion pipelines that replicate data from MySQL tables to BigQuery tables. The second pipeline runs right after the first one completes. To save time and improve scalability, I want to reuse the Dataproc cluster provisioned for the first pipeline. I have created an autoscaling Dataproc compute profile with the settings for cluster reuse as specified here: https://cloud.google.com/data-fusion/docs/how-to/reuse-clusters

When I trigger the pipelines, Data Fusion always provisions a second cluster to run the second pipeline instead of reusing the first one. I have made sure both pipelines have exactly the same customizable compute config settings.


Hi @DiegoMoralesFCA,

Welcome to Google Cloud Community!

You are right: both pipelines must use the same profile settings for the cluster to be reused between them. As a workaround, try increasing your Max Idle Time to 30 or 45 minutes, so the cluster is still available when the second pipeline starts. Alternatively, you could upgrade your Cloud Data Fusion instance to the latest version. Note that it is best practice to back up your system before initiating the upgrade.
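To double-check that the two pipelines really do share identical settings, it can help to compare them key by key rather than by eye. The snippet below is only a minimal sketch: it assumes you have copied each pipeline's compute settings into a plain Python dict by hand, and the keys shown (`maxIdleTime`, `skipClusterDelete`, `workerNumNodes`) are hypothetical placeholders, not confirmed Data Fusion property names.

```python
UNSET = "<unset>"  # marker for a key that is present in only one of the two dicts


def diff_settings(first, second):
    """Return {key: (first_value, second_value)} for every key whose values differ."""
    diffs = {}
    for key in sorted(set(first) | set(second)):
        a = first.get(key, UNSET)
        b = second.get(key, UNSET)
        if a != b:
            diffs[key] = (a, b)
    return diffs


# Hypothetical settings copied by hand from each pipeline's compute config.
pipeline_1 = {"maxIdleTime": "30", "skipClusterDelete": "true", "workerNumNodes": "2"}
pipeline_2 = {"maxIdleTime": "30", "skipClusterDelete": "true", "workerNumNodes": "3"}

for key, (a, b) in diff_settings(pipeline_1, pipeline_2).items():
    print(f"{key}: pipeline 1 = {a}, pipeline 2 = {b}")
```

An empty result means the two dicts match. Any key that is printed is a candidate reason for a second cluster being provisioned.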

If the issue persists, please contact Google Cloud Support. When reaching out, provide detailed information and relevant screenshots. This will assist them in diagnosing and resolving your issue more efficiently.

Was this helpful? If so, please accept this answer as “Solution”. If you need additional assistance, reply here within 2 business days and I’ll be happy to help.