We are sharing some BQ datasets with a third party (outside our org) using Analytics Hub, so the subscriber can create a linked dataset in their project and any queries they run are billed to their project.
Now we are exploring using Dataplex to enrich the metadata for some of these shared datasets. Are there any use cases/best practices for how this metadata can be shared? Two options come to mind:
Dataplex:
Dataplex is designed to manage data across various storage mediums, organizing it into lakes and zones for structured access. It is not primarily a metadata management tool, but it does maintain some metadata about these structures.
Data Catalog:
Data Catalog acts as a centralized metadata repository that enables search and discovery across data assets in Google Cloud. It does not manage data directly, but rather the metadata that describes data assets, such as those in BigQuery.
Analytics Hub:
Analytics Hub is primarily used for sharing datasets; it cannot directly share raw metadata stored in Dataplex or Data Catalog unless that metadata is first converted into a structured dataset.
Recommended Approach: A Hybrid Solution
1. Curate Essential Metadata: Identify the most valuable metadata elements to share with the third party, which might include entity names, column names, data types, descriptions, and sensitivity indicators such as PII flags (see the example below).
2. Structured Metadata Export: Export the curated metadata from Dataplex/Data Catalog into a structured, tabular format.
3. Create a Metadata Dataset: Load the exported metadata into a dedicated BigQuery dataset.
4. Share Metadata via Analytics Hub: Publish that metadata dataset as a listing so the third party can link it alongside the data itself (a rough sketch follows this list).
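If it helps, here is a rough, illustrative sketch of that last step using the google-cloud-bigquery-analyticshub Python client. The project, location, data exchange, listing, and dataset names are placeholders I made up; please verify the exact classes and fields against the current client library documentation.

```python
# Illustrative sketch only: publish an existing BigQuery dataset that holds the
# curated metadata as an Analytics Hub listing. All resource names are placeholders.
from google.cloud import bigquery_analyticshub_v1

client = bigquery_analyticshub_v1.AnalyticsHubServiceClient()

# Parent data exchange (assumed to already exist).
exchange_name = "projects/my-project/locations/us/dataExchanges/shared_data_exchange"

listing = bigquery_analyticshub_v1.Listing(
    display_name="Curated Dataplex metadata",
    description="Table/column descriptions, data types, and PII flags for shared datasets.",
    bigquery_dataset=bigquery_analyticshub_v1.Listing.BigQueryDatasetSource(
        dataset="projects/my-project/datasets/shared_metadata"  # placeholder metadata dataset
    ),
)

created = client.create_listing(
    request=bigquery_analyticshub_v1.CreateListingRequest(
        parent=exchange_name,
        listing_id="dataplex_metadata",
        listing=listing,
    )
)
print(f"Created listing: {created.name}")
```

The subscriber would then link this metadata dataset in their project the same way they link the data datasets today.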
Additional Considerations:
Example (Illustrative):
Imagine a Dataplex setup with a lake named "customer_data" and a table "customer_transactions". You intend to share this metadata:
Your metadata dataset structure could look like this:
Entity Type | Entity Name            | Column Name        | Data Type | Description               | PII Flag
----------- | ---------------------- | ------------------ | --------- | ------------------------- | --------
Lake        | customer_data          | customer_id        | STRING    | Customer's unique ID      | Yes
Table       | customer_transactions  | transaction_amount | FLOAT64   | Transaction amount in USD | No
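As a rough illustration only, a metadata table like this could be created and populated with the BigQuery Python client; the project, dataset, and table names below are placeholders:

```python
# Illustrative sketch: create the metadata table shown above and load the two
# example rows. Project/dataset/table names are placeholders.
from google.cloud import bigquery

client = bigquery.Client(project="my-project")  # placeholder project
table_id = "my-project.shared_metadata.entity_metadata"  # placeholder table

schema = [
    bigquery.SchemaField("entity_type", "STRING", description="Dataplex entity type (Lake, Table, ...)"),
    bigquery.SchemaField("entity_name", "STRING"),
    bigquery.SchemaField("column_name", "STRING"),
    bigquery.SchemaField("data_type", "STRING"),
    bigquery.SchemaField("description", "STRING"),
    bigquery.SchemaField("pii_flag", "BOOLEAN"),
]
client.create_table(bigquery.Table(table_id, schema=schema), exists_ok=True)

rows = [
    {"entity_type": "Lake", "entity_name": "customer_data", "column_name": "customer_id",
     "data_type": "STRING", "description": "Customer's unique ID", "pii_flag": True},
    {"entity_type": "Table", "entity_name": "customer_transactions", "column_name": "transaction_amount",
     "data_type": "FLOAT64", "description": "Transaction amount in USD", "pii_flag": False},
]
errors = client.insert_rows_json(table_id, rows)
print(errors or f"Loaded {len(rows)} rows into {table_id}")
```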
This hybrid solution leverages the strengths of Dataplex, Data Catalog, and Analytics Hub. It gives full control over which metadata elements are shared and provides a structured, easily consumable method for third parties to access your metadata.
Sorry for the confusion. The "Metadata" tab mentioned in the older Google Cloud Community post is inaccurate. There is no reference to this "Metadata" tab in the current Dataplex documentation, and many users, including yourself, have reported being unable to locate it.
As of now, the recommended approach for pushing Dataplex metadata to BigQuery involves using the Dataplex REST API to programmatically export metadata and then load it into BigQuery using your preferred method (e.g., Python scripts, Cloud Functions).
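For example, a minimal sketch along those lines, assuming the google-cloud-dataplex Python client (which wraps the Dataplex API) and placeholder project/lake/zone/table names, might look like this:

```python
# Illustrative sketch: list Dataplex table entities in one zone, pull their column
# metadata, and load it into a BigQuery table. All names below are placeholders.
from google.cloud import bigquery
from google.cloud import dataplex_v1

PROJECT_ID = "my-project"          # placeholder
LOCATION = "us-central1"           # placeholder
LAKE_ID = "customer_data"          # lake from the example above
ZONE_ID = "curated"                # placeholder zone
METADATA_TABLE = "my-project.shared_metadata.entity_metadata"  # placeholder

metadata_client = dataplex_v1.MetadataServiceClient()
bq_client = bigquery.Client(project=PROJECT_ID)

zone_name = f"projects/{PROJECT_ID}/locations/{LOCATION}/lakes/{LAKE_ID}/zones/{ZONE_ID}"

rows = []
# List table entities in the zone, then fetch each with FULL view to get its schema.
for entity in metadata_client.list_entities(
    request=dataplex_v1.ListEntitiesRequest(
        parent=zone_name,
        view=dataplex_v1.ListEntitiesRequest.EntityView.TABLES,
    )
):
    full_entity = metadata_client.get_entity(
        request=dataplex_v1.GetEntityRequest(
            name=entity.name,
            view=dataplex_v1.GetEntityRequest.EntityView.FULL,
        )
    )
    for field in full_entity.schema.fields:
        rows.append(
            {
                "entity_type": "Table",
                "entity_name": full_entity.id,
                "column_name": field.name,
                "data_type": field.type_.name,
                "description": field.description,
            }
        )

# Load the curated metadata rows into the BigQuery dataset that will be shared.
load_job = bq_client.load_table_from_json(rows, METADATA_TABLE)
load_job.result()
print(f"Loaded {len(rows)} metadata rows into {METADATA_TABLE}")
```

You could run something like this on a schedule (e.g., from a Cloud Function or Cloud Run job) to keep the shared metadata dataset in sync with Dataplex.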
Alternative to Consider:
While the direct "Metadata" tab option seems unavailable, you might want to explore a Dataplex metadata export feature if one becomes available. Please note that, as of the latest documentation, Dataplex does not explicitly mention an automatic metadata export to Google Cloud Storage in Avro format for direct use; you would typically need to implement a custom solution for exporting metadata. Check the Dataplex documentation for the most current capabilities.