Data Fusion - Secret Manager integration

I am setting up replication jobs in Google Data Fusion and I want to integrate Google Secret Manager. While I am aware of Data Fusion's native Secure Storage capabilities, my specific interest lies in leveraging Google Secret Manager for secure management of sensitive information. Using GSM to store usernames and passwords is mandatory in my organization.

I would greatly appreciate any guidance or best practices from the forum community on how to seamlessly integrate Google Secret Manager into the configuration of replication jobs in Google Data Fusion.


Integrating Secret Manager (GSM) with Data Fusion, particularly for replication jobs, involves a series of steps to ensure the secure management of sensitive information such as database usernames, passwords, and API keys. Here is an approach you might consider:

1. Enable Secret Manager API

First, ensure the Secret Manager API is enabled in your Google Cloud project. This can be done through the Cloud Console or by using the gcloud command-line tool, setting the foundation for secret management.
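For example, with the gcloud CLI (assuming your target project is the active configuration):

```shell
# Enable the Secret Manager API in the current project
gcloud services enable secretmanager.googleapis.com
```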

2. Create Secrets in Secret Manager

  • Navigate to the Secret Manager section in the Cloud Console.

  • Use the “Create Secret” option to securely store each piece of sensitive information (e.g., database passwords, API keys), assigning each secret a unique, descriptive name for easy identification.
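The same can be done from the command line; the secret name `db-password` below is just an example — substitute your own naming convention:

```shell
# Create a secret and store its first version from stdin
printf 's3cr3t-value' | gcloud secrets create db-password \
    --replication-policy="automatic" \
    --data-file=-
```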

3. Grant Data Fusion Access to Secrets

  • Identify the Data Fusion service account, typically formatted as service-PROJECT_NUMBER@gcp-sa-datafusion.iam.gserviceaccount.com. Note that pipelines execute on Dataproc, so the execution service account (by default the Compute Engine default service account, unless you have configured a custom one) also needs access to the secrets.

  • In the IAM & Admin section of the Cloud Console, assign the "Secret Manager Secret Accessor" role to this service account, ensuring it has the necessary permissions to access the secrets.
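Granting the role on individual secrets rather than project-wide keeps the scope tight. A sketch, using the example secret `db-password` from above (replace PROJECT_NUMBER with your project's number):

```shell
# Grant the Data Fusion service account read access to one secret
gcloud secrets add-iam-policy-binding db-password \
    --member="serviceAccount:service-PROJECT_NUMBER@gcp-sa-datafusion.iam.gserviceaccount.com" \
    --role="roles/secretmanager.secretAccessor"
```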

4. Access Secrets in Data Fusion Pipelines

Due to the absence of a built-in UI element in Data Fusion for directly selecting secrets from GSM, a custom solution is required to retrieve secrets at runtime:

  • Custom Plugin/Script: Develop a custom Data Fusion plugin or script that leverages the Secret Manager API to fetch secret values dynamically during pipeline execution.

  • External Script: Alternatively, create a script outside of Data Fusion that retrieves secrets prior to pipeline startup. This script could set environment variables or populate a configuration file that your pipeline can subsequently read.
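As a sketch of the external-script approach, a wrapper that resolves secrets into environment variables before triggering the pipeline might look like this (the secret names `db-user` and `db-password` are illustrative):

```shell
#!/usr/bin/env bash
set -euo pipefail

# Fetch the latest version of each secret and export it for the pipeline
export DB_USER="$(gcloud secrets versions access latest --secret=db-user)"
export DB_PASSWORD="$(gcloud secrets versions access latest --secret=db-password)"

# ...then start or trigger the Data Fusion pipeline, e.g. via the CDAP REST API
```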

5. Security and Best Practices

  • Least Privilege: Ensure the Data Fusion service account is granted only the permissions necessary to access the secrets it requires, minimizing potential security risks.

  • Audit Logging: Enable audit logging for both GSM and Data Fusion to maintain comprehensive records of secret access and pipeline activities.

  • Secret Rotation: Regularly rotate secrets within Secret Manager and promptly update the references in your Data Fusion jobs to align with these changes.

  • Workflow Orchestration IAM: For those utilizing orchestration tools like Cloud Composer or Workflows, verify that their service accounts are also equipped with the appropriate Secret Manager permissions.
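Rotation in Secret Manager amounts to adding a new version; any consumer that reads the `latest` alias picks it up on its next fetch. A sketch, again using the illustrative `db-password` secret:

```shell
# Rotate by adding a new version; readers of "latest" get it on next access
printf 'new-s3cr3t-value' | gcloud secrets versions add db-password --data-file=-

# Disable the old version once nothing references it anymore
gcloud secrets versions disable 1 --secret=db-password
```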

Additional Considerations

  • Versioning: Utilize Secret Manager's versioning capabilities for enhanced control and the ability to roll back secrets if necessary. Explicitly reference secret versions in your integration code to leverage this feature.

  • Broad Applicability: While this guide focuses on replication jobs, the outlined techniques are applicable across various Data Fusion scenarios involving sensitive data.

  • Reusable Secrets 'Wrapper': Consider developing a reusable "secrets fetcher" script or custom code that can accept secret names as parameters. This approach increases flexibility and efficiency across different Data Fusion jobs.
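A minimal sketch of such a fetcher as a shell function, parameterized by secret name and an optional version (pinning a version ties in with the versioning point above):

```shell
# Reusable fetcher: takes a secret name and an optional version (default "latest")
fetch_secret() {
  local secret_id="$1"
  local version="${2:-latest}"
  gcloud secrets versions access "$version" --secret="$secret_id"
}

# Example usage (secret names are illustrative):
#   DB_PASSWORD="$(fetch_secret db-password 3)"   # pinned to version 3
#   API_KEY="$(fetch_secret api-key)"             # tracks latest
```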

Integrating Secret Manager with Data Fusion necessitates some custom development to facilitate secure secret retrieval. By adhering to the outlined steps and incorporating these security best practices, you can establish a robust and secure method for managing sensitive information within your Data Fusion workflows, thereby enhancing overall data security and operational efficiency.