
Cross Project Data Sources Lineage in Dataplex

Hi Team,

I would like to understand whether it is possible to use Dataplex to create lineage for BQ tables that sit in two separate projects. If yes, how can we achieve that?

For example, I have a destination table in Proj A, and this table is created by running SQL data manipulation on a source table that resides in Proj B. How can I see this lineage in Dataplex?


I posted your question to Gemini on GCP. Here is what it said:

Dataplex lineage tracking currently does not span across projects. It's designed to track lineage within a single Google Cloud project.

So, in your scenario where the source table is in Project B and the destination table is in Project A, Dataplex will only track lineage within each project individually. You won't see a direct lineage connection between the source and destination tables.

Here's why this limitation exists:

  • Security and Data Isolation: Projects are designed to provide data isolation and security. Allowing lineage tracking across projects could potentially expose data from one project to users in another project, which might not be intended.
  • Complexity: Tracking lineage across projects would introduce significant complexity in terms of data access, permissions, and data governance.

Workarounds and Alternatives:

While Dataplex doesn't directly support cross-project lineage, here are some workarounds and alternatives you can consider:

  1. Data Catalog Metadata: You can manually create entries in Data Catalog for both the source and destination tables, even though they are in different projects. This will allow you to see them together in Data Catalog, but it won't provide a direct lineage connection.
  2. Custom Lineage Tracking: You can implement your own lineage tracking system using tools like Apache Airflow or other workflow management systems. This would involve logging the data flow and transformations manually, but it would give you more control over the lineage information.
  3. Data Transfer Service: If you are moving data between projects frequently, consider using Google Cloud's Data Transfer Service. This service can help you move data between projects while maintaining some level of metadata tracking.
  4. Data Federation: Explore data federation solutions like BigQuery Federation or other data virtualization tools. These tools can allow you to query data across projects as if it were in a single location, potentially simplifying lineage tracking.
  5. Project Consolidation: If possible, consider consolidating the source and destination tables into the same project. This would be the most straightforward way to enable Dataplex lineage tracking between them.
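As a rough illustration of option 2 (custom lineage tracking), here is a minimal in-memory sketch in Python. It is not a Dataplex or Airflow API. The names `LineageEdge`, `LineageLog`, `project_of`, and the table IDs `proj-b.sales.orders_raw` and `proj-a.reporting.orders_daily` are all hypothetical, and it assumes tables are identified as `project.dataset.table`.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class LineageEdge:
    """One manually logged data flow (hypothetical structure)."""
    source: str   # assumed format: "project.dataset.table"
    target: str   # assumed format: "project.dataset.table"
    process: str  # free-text label for the SQL job or workflow task


class LineageLog:
    """Keeps edges indexed by target table so upstream lookups are cheap."""

    def __init__(self):
        self._upstream = defaultdict(set)

    def record(self, source, target, process):
        """Log that `process` read `source` and wrote `target`."""
        edge = LineageEdge(source, target, process)
        self._upstream[target].add(edge)
        return edge

    def upstream_tables(self, table):
        """Return every table feeding `table`, directly or transitively."""
        seen = set()
        stack = [table]
        while stack:
            current = stack.pop()
            for edge in self._upstream.get(current, ()):
                if edge.source not in seen:
                    seen.add(edge.source)
                    stack.append(edge.source)
        return seen


def project_of(table_id):
    """Extract the project part of a "project.dataset.table" ID."""
    return table_id.split(".", 1)[0]


# Example: destination in Proj A built from a source in Proj B.
log = LineageLog()
log.record(
    "proj-b.sales.orders_raw",
    "proj-a.reporting.orders_daily",
    "daily_orders_sql",
)
sources = log.upstream_tables("proj-a.reporting.orders_daily")
# Keep only the upstream tables that live outside the destination project.
cross_project = {s for s in sources if project_of(s) != "proj-a"}
```

In a real workflow, the `record` call would sit in whatever task runs the SQL, and the log would be persisted somewhere durable rather than held in memory.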

It is possible. Please refer to the official documentation rather than an LLM.