
Data glossary creation and update in Dataplex

Hey Experts,

Is there a way to create a glossary in Dataplex through a script and insert all the terms with their respective descriptions automatically? How should I do it?

I'm asking because I would like to import a large business glossary, currently stored in a .docx file, into Dataplex. I was planning to do it through the interface, since I didn't find any documentation on how to create categories and terms or update term descriptions through code. Can anyone help me?


As of the date of this post, the Business Glossary component is labeled "pre-generally available" (pre-GA). I read that as "beta". This also means the service may gain more features and changes before it is formally released as a generally available service. At present, there are indeed no published APIs for importing or exporting glossary data. I would expect these APIs to be available at general availability. My suggestion is to contact your Google Cloud representative. They may be able to share a date for general availability, or give you a preview of additional features (such as import) that might be useful to you.

Hi AlexHeringer,

Welcome to the Google Cloud Community!

Currently, the Dataplex API does not offer a bulk import endpoint that creates categories, adds terms, and updates descriptions in a single call. However, here are some alternative approaches that might help with your use case:

  • Understanding the Dataplex API and concepts: You can use the Dataplex API to interact with your Dataplex resources, including glossaries, categories, and terms. The API provides individual endpoints for creating, reading, updating, and deleting these resources. You'll need a service account with the appropriate permissions to authenticate with the API.
  • Python script (using the Google Cloud Client Library): You can write a Python script using the Google Cloud Client Library for Dataplex (pip install google-cloud-dataplex google-auth). The script makes one API call per operation: creating categories, creating terms, associating terms with categories, and updating descriptions.
  • Error handling: Add robust error logging so you can identify and address any API issues.
  • Data validation: Before calling the API, validate the data extracted from your document to confirm it conforms to the expected format and avoid API failures.
  • Logging: Implement logging to track the execution of your script, identify problems, and monitor the creation and update of the glossary.
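Here is a minimal sketch of the steps above, assuming the .docx glossary has first been exported to a CSV file with hypothetical `category`, `term` and `description` columns (for example by copying its table into a spreadsheet). It validates rows, derives resource IDs with a hypothetical naming scheme, and sends one request per category and per term, logging each result. The REST paths (`.../glossaries/{glossary}/categories?categoryId=...`, `.../terms?termId=...`), the `displayName`/`description`/`parent` body fields, the `updateMask=description` patch and the ID rules are all assumptions modeled on the Dataplex v1 REST style; check them against the current API reference before running anything, since the glossary API was still pre-GA when this thread started. It calls REST through `google-auth`'s `AuthorizedSession` instead of the `google-cloud-dataplex` client to avoid guessing client-library method names, and it assumes the glossary itself already exists (created in the console).

```python
import csv
import io
import logging
import re
from dataclasses import dataclass

logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
log = logging.getLogger("glossary_import")

API = "https://dataplex.googleapis.com/v1"  # assumed REST base URL


@dataclass(frozen=True)
class Entry:
    category: str
    term: str
    description: str


def to_id(name: str) -> str:
    """Hypothetical naming scheme: 'Net Revenue (EUR)' -> 'net-revenue-eur'."""
    slug = re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")
    if not slug[:1].isalpha():
        slug = "x-" + slug  # assumption: IDs must start with a letter
    return slug[:63].rstrip("-")


def load_entries(csv_text: str) -> list[Entry]:
    """Parse CSV rows (category,term,description); log and skip invalid rows."""
    reader = csv.DictReader(io.StringIO(csv_text))
    entries, seen = [], set()
    for row in reader:
        cat = (row.get("category") or "").strip()
        term = (row.get("term") or "").strip()
        desc = (row.get("description") or "").strip()
        if not cat or not term:
            log.warning("line %d: category and term are required, skipped", reader.line_num)
            continue
        if to_id(term) in seen:
            log.warning("line %d: duplicate term %r, skipped", reader.line_num, term)
            continue
        seen.add(to_id(term))
        entries.append(Entry(cat, term, desc))
    return entries


def build_requests(project: str, location: str, glossary_id: str,
                   entries: list[Entry]) -> list[tuple[str, str, dict]]:
    """Return (method, url, body) tuples: each category once, then every term."""
    glossary = f"projects/{project}/locations/{location}/glossaries/{glossary_id}"
    reqs, cats = [], set()
    for e in entries:
        cid = to_id(e.category)
        if cid not in cats:
            cats.add(cid)
            reqs.append(("POST", f"{API}/{glossary}/categories?categoryId={cid}",
                         {"displayName": e.category, "parent": glossary}))
    for e in entries:
        reqs.append(("POST", f"{API}/{glossary}/terms?termId={to_id(e.term)}",
                     {"displayName": e.term, "description": e.description,
                      "parent": f"{glossary}/categories/{to_id(e.category)}"}))
    return reqs


def description_patch(term_name: str, description: str) -> tuple[str, str, dict]:
    """Request that updates only the description of an existing term."""
    return ("PATCH", f"{API}/{term_name}?updateMask=description",
            {"description": description})


def send_all(reqs: list[tuple[str, str, dict]]) -> int:
    """Send requests one by one with Application Default Credentials; return failures."""
    import google.auth  # pip install google-auth requests
    from google.auth.transport.requests import AuthorizedSession

    creds, _ = google.auth.default(
        scopes=["https://www.googleapis.com/auth/cloud-platform"])
    session = AuthorizedSession(creds)
    failures = 0
    for method, url, body in reqs:
        resp = session.request(method, url, json=body)
        if resp.ok:
            log.info("%s %s -> OK", method, url)
        else:
            failures += 1
            log.error("%s %s -> HTTP %d: %s", method, url, resp.status_code, resp.text)
    return failures


if __name__ == "__main__":
    # Dry run with hypothetical project/location/glossary IDs: print, don't send.
    sample = "category,term,description\nFinance,Net Revenue,Revenue after returns\n"
    for req in build_requests("my-project", "my-location", "business-glossary",
                              load_entries(sample)):
        print(*req)
```

Printing the requests before calling `send_all` lets you review the derived IDs and parents on a few rows first. Because categories are sent before terms, a failed category will show up in the log ahead of the term errors it causes.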

You may also refer to this documentation for guidance on creating and managing Dataplex business glossaries.

Was this helpful? If so, please accept this answer as “Solution”. If you need additional assistance, reply here within 2 business days and I’ll be happy to help.