Get hands-on experience with 20+ free Google Cloud products and $300 in free credit for new customers.

Dataplex metadata in BigQuery

I need to read Dataplex metadata in BigQuery to perform data analysis on the Catalog, is it possible?

Solved Solved
0 12 4,345
2 ACCEPTED SOLUTIONS

Yes , Dataplex metadata can be analyzed in BigQuery, but first, you need to set up Dataplex to publish this metadata to BigQuery. This can be achieved either through the Dataplex UI or its API.

Dataplex Configuration

Before diving into the analysis, ensure the following configuration steps are completed in Dataplex:

  • Designate the specific BigQuery dataset where the metadata will be stored.
  • Define any essential data filters or transformations.

Querying Dataplex Metadata in BigQuery

Once the metadata is available in BigQuery, follow these steps for analysis:

  1. Pinpoint the BigQuery dataset housing the Dataplex metadata. This can be located in the Dataplex UI or retrieved via the Dataplex API.
  2. Construct a BigQuery query targeting the metadata of interest. Remember to reference the specific Dataplex metadata tables, not the generic INFORMATION_SCHEMA views.
  3. Execute the query to obtain your results.

For instance, to list all tables within a Dataplex data zone, you might use:

SELECT
table_name,
schema_name,
data_type,
location
FROM
`YOUR_PROJECT_ID.YOUR_DATASET_NAME.ENTITY_TABLES`

Deep Dive into Dataplex Metadata Analysis

The depth and type of analysis you can perform on Dataplex metadata in BigQuery largely depend on the specific columns and data in the metadata tables. It's beneficial to acquaint yourself with the structure of these tables to maximize the insights you can extract.

Potential analyses include:

  • Pinpointing the most frequented tables in your Dataplex lake.
  • Identifying underutilized tables.
  • Highlighting tables lacking crucial metadata.
  • Monitoring Dataplex metadata modifications over time.
  • Tracing data lineage connections between tables.
  • Conducting data quality assessments.

Enhancing Data Lineage and Quality Checks

While BigQuery offers robust capabilities for analyzing Dataplex metadata, certain analyses, like visual data lineage or automated quality checks, might necessitate supplementary tools or integrations. Here are some tools that can augment your analysis:

  • Data Catalog: A robust metadata management tool that aids in data discovery and understanding. It's excellent for visualizing data lineage and conducting quality checks.
  • Data Fusion: A cloud-native data integration service ideal for crafting and overseeing data pipelines. It's adept at automating data lineage tracking and quality checks.
  • Dataflow: Streamlines data transformation in both streaming and batch modes. It's suitable for crafting custom data quality checks.
  • Cloud Data Studio: A user-friendly tool for crafting interactive dashboards and reports, perfect for visualizing data lineage and quality metrics.
  • Looker: A BI and data analytics platform that's great for data exploration and visualization.

Leveraging BigQuery to scrutinize Dataplex metadata can yield invaluable insights, enhancing your data understanding and refining your data governance strategies.

View solution in original post

Hi @ms4446 

I am running into the same issue of not being able to see the metadata tab after selecting a lake. I have attached a BQ dataset to the zone in the lake and want to publish the metadata for its table into BQ (my assumption is that it is possible to publish all Dataplex metadata for all asset types, eg: BQ tables, cloud storage into BQ). Please see screenshot below:

shekhar0413_0-1714763297588.png

I have the "editor" role, and looking at the list of permissions for the editor role, I see that it contains the "dataplex.lakes.get" and the "bigquery.datasets.get" permissions. Also, I am able to view the lakes in dataplex (I was the one who created the lake) and view/query/create tables and datasets in BQ, so permission does not seem to be an issue. Additionally, when I search for for the BQ table in "Search" in dataplex, I see two results for the table, one with "System" as "BIGQUERY" and another as "DATAPLEX", so the metadata entry seems to be present.

Can you please provide some inputs.

Also, we are sharing some BQ datasets with a third party using analytics hub. So, the ultimate goal is to share the rich dataplex metadata for these tables with the 3rd party as well. My approach was to publish the dataplex metadata to BQ and then share the metadata as well through analytics hub. That way, the third party can simply subscribe to the dataset containing the metadata and create a linked dataset in their project. If there are other approaches  or best practices regarding how to share this rich dataplex metadata similar to sharing BQ data in analytics hub, I would be interested in those as well.

Thank you!

 

View solution in original post

12 REPLIES 12