Data Catalog Entry Group Best Practices

I am doing some research and planning around ingesting non-GCP data assets into Google Data Catalog. I understand that these entries will need to be associated with a new entry group, but I'd like to know what best practices people follow for entry group granularity, that is, how many entry groups they create.

For instance, should you:

  1. Create one entry group to represent all non-GCP RDBMS?
  2. Create one entry group per non-GCP database technology (MSSQL, MySQL, Oracle, etc.)?
  3. Create one entry group per non-GCP database instance?

The documentation doesn't really spell out the intended approach, and I would hate to go down the wrong path and hit limitations in the future.

Accepted solution

Entry groups are essentially categories for your data. A good explanation can be found in [1]. You can organize entry groups in whatever way suits you, but the most general approach is to name them after the data source (e.g. bigquery, pubsub, mysql). This lets you identify where the data came from.
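As an illustration of naming entry groups after the source, here is a minimal Python sketch. It assumes the entry group ID rules are "start with a letter or underscore, only ASCII letters, digits and underscores, at most 64 characters"; check that against the current API reference before relying on it. The `onprem_` prefix and both function names are hypothetical. The creation helper assumes the `google-cloud-datacatalog` package and application default credentials, and uses the client's `create_entry_group` call; it is not executed in the usage example.

```python
import re

# Assumed entry group ID rules: starts with a letter or underscore, contains
# only ASCII letters, digits and underscores, and is at most 64 characters.
_ENTRY_GROUP_ID = re.compile(r"^[A-Za-z_][A-Za-z0-9_]{0,63}$")


def entry_group_id_for_source(source: str, prefix: str = "onprem") -> str:
    """Derive an entry group ID from a source technology name (hypothetical helper).

    The "onprem" prefix is an illustrative naming convention, not a requirement.
    """
    # Lowercase, collapse any run of non-alphanumeric characters to "_".
    slug = re.sub(r"[^A-Za-z0-9]+", "_", source.strip().lower()).strip("_")
    group_id = f"{prefix}_{slug}" if prefix else slug
    if not _ENTRY_GROUP_ID.match(group_id):
        raise ValueError(f"invalid entry group ID: {group_id!r}")
    return group_id


def create_source_entry_groups(project: str, location: str, sources: list) -> None:
    """Create one entry group per source technology (hypothetical helper).

    Requires the google-cloud-datacatalog package and valid credentials.
    """
    from google.cloud import datacatalog_v1  # imported lazily so the helper above works standalone

    client = datacatalog_v1.DataCatalogClient()
    parent = client.common_location_path(project, location)
    for source in sources:
        client.create_entry_group(
            parent=parent,
            entry_group_id=entry_group_id_for_source(source),
            entry_group=datacatalog_v1.EntryGroup(
                display_name=source,
                description=f"Entries ingested from {source} (non-GCP)",
            ),
        )


print(entry_group_id_for_source("MSSQL"))       # onprem_mssql
print(entry_group_id_for_source("SQL Server"))  # onprem_sql_server
```

Keeping the ID derivation separate from the API call means the naming convention can be checked offline before anything is created in the catalog.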

Therefore option #2 is probably the best choice, although options #1 and #2 are both valid. Option #3 might cause issues, because the number of entry groups grows with the number of database instances and can become very large.
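To make the growth argument concrete, the short Python sketch below counts the entry groups each option would create for a small, entirely hypothetical inventory of non-GCP databases. The `onprem_` ID scheme is also illustrative only.

```python
from collections import namedtuple

# Hypothetical inventory of non-GCP databases: (technology, instance name).
Database = namedtuple("Database", ["technology", "instance"])

INVENTORY = [
    Database("mssql", "sales-prod"),
    Database("mssql", "sales-dev"),
    Database("mysql", "web-01"),
    Database("mysql", "web-02"),
    Database("mysql", "web-03"),
    Database("oracle", "erp"),
]


def entry_groups(inventory, option):
    """Return the set of entry group IDs each option from the question would create."""
    if option == 1:  # one group for all non-GCP RDBMS
        return {"onprem_rdbms"}
    if option == 2:  # one group per database technology
        return {f"onprem_{db.technology}" for db in inventory}
    if option == 3:  # one group per database instance
        return {f"onprem_{db.technology}_{db.instance.replace('-', '_')}" for db in inventory}
    raise ValueError(f"unknown option: {option}")


for option in (1, 2, 3):
    print(option, len(entry_groups(INVENTORY, option)))
```

Options #1 and #2 stay fixed as instances are added (#2 only grows when a new technology appears), while option #3 adds an entry group for every new instance.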

We have also filed a documentation improvement request [2] on your behalf, and additional details should be published in the near future.

[1] https://cloud.google.com/data-catalog/docs/entries-and-entry-groups

[2] https://issuetracker.google.com/205624534
