We are looking to build a search interface on top of the Dataplex metadata. Dataplex Data Catalog ingests all the metadata from BigQuery assets and can take additional metadata through tag templates, business glossary, etc., so I am looking for a search interface similar to Data Catalog. I searched for the Data Catalog APIs and found the two below. If I search with any tag template name or business term name that is attached to a table, I need to get the dataset name with its description. I do not need to see the data; just the dataset name and the full description will do. How can I achieve this?
https://datacatalog.googleapis.com/v1beta1/catalog:search
https://datacatalog.googleapis.com/v1beta1/entries:lookup
Below is a strategy that combines the power of Google Cloud Data Catalog and its APIs to achieve what you're looking for:
catalog:search: Finds entries matching your search (e.g., tag name, business term).
entries:lookup: Gets the full details of each found entry, including dataset name and description.
Implementation Steps
Use catalog:search to find matching entries.
Use entries:lookup to get the full details.
Here's a Python code snippet demonstrating how to use the APIs to perform the search and retrieve the dataset details:
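from google.cloud import datacatalog_v1beta1

def search_dataplex_metadata(query):
    client = datacatalog_v1beta1.DataCatalogClient()

    # Restrict the search to the projects you want to cover.
    scope = datacatalog_v1beta1.types.SearchCatalogRequest.Scope()
    scope.include_project_ids = ["your-project-id"]

    request = datacatalog_v1beta1.types.SearchCatalogRequest(
        query=query,
        scope=scope,
    )
    search_results = client.search_catalog(request=request)

    for result in search_results:
        # Some results (e.g. tag templates) have no linked resource to look up.
        if not result.linked_resource:
            continue
        # lookup_entry takes a request object; linked_resource is a field of it.
        entry = client.lookup_entry(request={"linked_resource": result.linked_resource})
        # entry.name is the Data Catalog entry name; linked_resource identifies the BigQuery resource.
        print(f"Entry: {entry.name}, Resource: {entry.linked_resource}, Description: {entry.description}")

# Example usage
search_dataplex_metadata("your_tag_template_name OR your_business_term")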
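Since the goal is the dataset name rather than the matched table, one option is to derive it from the linked_resource of each search result. The following is a minimal sketch, assuming BigQuery linked resources follow the form //bigquery.googleapis.com/projects/<project>/datasets/<dataset>/tables/<table>. The names BigQueryRef, parse_bigquery_resource and dataset_resource are hypothetical helpers, not part of the Data Catalog client library.

```python
import re
from typing import NamedTuple, Optional

# Assumed shape of a BigQuery linked_resource in Data Catalog search results:
#   //bigquery.googleapis.com/projects/<project>/datasets/<dataset>[/tables/<table>]
_BQ_RESOURCE = re.compile(
    r"^//bigquery\.googleapis\.com"
    r"/projects/(?P<project>[^/]+)"
    r"/datasets/(?P<dataset>[^/]+)"
    r"(?:/tables/(?P<table>[^/]+))?$"
)


class BigQueryRef(NamedTuple):
    """Hypothetical container for the parts of a BigQuery resource name."""
    project: str
    dataset: str
    table: Optional[str]  # None when the resource is the dataset itself


def parse_bigquery_resource(linked_resource: str) -> Optional[BigQueryRef]:
    """Split a linked_resource into its parts; return None if it is not BigQuery."""
    match = _BQ_RESOURCE.match(linked_resource)
    if match is None:
        return None
    return BigQueryRef(match["project"], match["dataset"], match["table"])


def dataset_resource(ref: BigQueryRef) -> str:
    """Build the linked_resource of the dataset that contains the table."""
    return f"//bigquery.googleapis.com/projects/{ref.project}/datasets/{ref.dataset}"
```

Passing the value returned by dataset_resource to lookup_entry should then return the entry for the dataset itself, whose description field holds the dataset description, assuming the dataset has one set in BigQuery.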
Important Considerations
I know in Python you can get assets, entities, and datasets fairly easily. I am going to have to build out a UI on top of Dataplex to accomplish automation and possibly even replace data discovery, as we are having issues with schemas not being deleted along with their tables and rebuilt tables getting lost in no man's land somewhere. https://cloud.google.com/data-catalog/docs/concepts/metadata
Building a UI on top of Dataplex for managing and automating metadata tasks, especially to address issues with schema synchronization and data discovery, is a great idea. Here’s a plan on how to accomplish this, incorporating your need to handle assets, entities, and datasets, along with leveraging the Data Catalog API for metadata management:
Authentication and Setup:
Ensure that your application can authenticate with Google Cloud services using service accounts with appropriate permissions.
Enable necessary APIs, including Data Catalog API and Dataplex API.
Fetching Metadata:
Use the Dataplex API to fetch assets and entities, and the Data Catalog API to search for and look up entries such as BigQuery datasets and tables.
Implement functions to search and retrieve metadata entries.
Handling Metadata Updates:
Automate the synchronization of metadata to ensure schemas are correctly handled when tables are deleted or rebuilt.
Use hooks or triggers in your data pipeline to update Data Catalog when changes occur.
Building the UI:
Develop a user interface that allows users to search, view, and manage metadata.
Integrate Data Catalog API to allow users to perform searches and view detailed information about datasets, assets, and entities.
Automation and Maintenance:
Implement scripts or background jobs that periodically check for inconsistencies in metadata and update accordingly.
Provide tools within the UI for users to manually trigger metadata synchronization or corrections.
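The periodic consistency check mentioned above could be sketched as a plain set comparison between what the catalog knows and what actually exists. This is only an illustration: diff_metadata and MetadataDiff are hypothetical names, and the two input lists are assumed to be collected elsewhere (for example, linked resources from catalog search results and table names listed from BigQuery), using the same resource-name format on both sides.

```python
from typing import Iterable, NamedTuple, Set


class MetadataDiff(NamedTuple):
    """Hypothetical result of comparing catalog entries with live resources."""
    stale: Set[str]    # in the catalog, but the underlying resource is gone
    missing: Set[str]  # resource exists, but the catalog has no entry for it


def diff_metadata(
    catalog_resources: Iterable[str],
    live_resources: Iterable[str],
) -> MetadataDiff:
    """Compare resource names known to the catalog with those that exist.

    Both inputs must use the same naming scheme, otherwise every item
    will be reported as both stale and missing.
    """
    catalog = set(catalog_resources)
    live = set(live_resources)
    return MetadataDiff(stale=catalog - live, missing=live - catalog)
```

A background job could run this on a schedule and surface the stale and missing sets in the UI, leaving the decision to delete or re-sync an entry to the user.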
Some Key Considerations
Hi @ms4446,
Thank you so much for the detailed plan for the UI. I am trying to build a UI and am in the initial stages.
I have a couple of questions:
Authentication and Setup: What options do I have? Can a valid user with GCP credentials log in? Can I integrate with IAM or a simple authentication service? Is there a GCP managed service for this?
UI: Can you suggest a simple UI framework? I am more of a backend guy and not a UI person 😊