
Dataplex Attribute Store tagging to BigQuery

Hi,

 

I have created a lake that contains a dataset with 2 tables. Next, I created a taxonomy in the Attribute Store so that I could attach tags to BigQuery columns as policy tags. When I try to do that, the Attribute Store attaches the attribute only to its own "column attribute" column, not to the policy tag column in BigQuery. I am also not able to see those tags in BigQuery. Below is the link to the Google doc that I am following.

Link: https://cloud.google.com/dataplex/docs/attribute-store

Can you please help with how I can attach tags from Dataplex to the BigQuery policy tags column, and how they will be visible in both BigQuery and Dataplex?


Hi @Vishesh1998 ,

Here’s how you can attach Dataplex Attribute Store tags to BigQuery policy tags:

Attaching Dataplex Attribute Store Tags to BigQuery Policy Tags

  1. Create Lake and Data Zone in Dataplex
  • Create a Lake:
    • Go to the Dataplex Console.
    • Click "Create Lake", provide the necessary details, and follow the setup instructions.
  • Create a Data Zone:
    • Within the lake, click "Create Data Zone".
    • Assign datasets and tables you want to manage.
  2. Create a Taxonomy and Tags in the Attribute Store
  • Create a Taxonomy:
    • Access the Attribute Store.
    • Click "Create Taxonomy" and define categories that fit your governance needs.
  • Create Attributes:
    • Within the taxonomy, click "Create Attribute" to define the specific attributes.
  3. Attach Attributes to Tables and Columns
  • Attach Attributes:
    • Navigate to the Dataplex Console and select your Data Zone.
    • Click the "Tables" tab, find the table, and select it.
    • Use the "Tags" section to attach attributes to specific columns.
  4. Create Policy Tags in BigQuery (if not existing)
  • Create Policy Tag Taxonomy:
    • Go to the BigQuery Console.
    • Under "Data Governance", create a new policy tag taxonomy, if not existing.
    • Define policy tags that match the Dataplex attributes.
  5. Synchronize Attributes with BigQuery Policy Tags

Note: Direct synchronization between Dataplex attributes and BigQuery policy tags is not available out-of-the-box. This requires a custom solution.

Conceptual Approach:

  • Fetch Dataplex Attributes:
    • Use the Dataplex API to retrieve attached attributes.
    • Example gcloud command (attached attributes are stored as data attribute bindings):

    gcloud dataplex data-attribute-bindings list --location=<LOCATION> --project=<PROJECT_ID>
    
  • Apply Policy Tags in BigQuery:
    • Use the BigQuery API to apply policy tags based on Dataplex attributes.
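
The "fetch" half of this approach can be sketched without any client library. This is a minimal sketch, assuming the bindings have already been retrieved as JSON and that each one follows the shape of a Dataplex DataAttributeBinding resource (a `paths` list whose entries carry a column `name` and a list of `attributes`). The function name and the sample resource names are hypothetical.

```python
from collections import defaultdict

def columns_to_attributes(bindings):
    """Collect, per column name, the attribute resource names bound to it."""
    result = defaultdict(list)
    for binding in bindings:
        for path in binding.get("paths", []):
            for attribute in path.get("attributes", []):
                # Skip duplicates if the same attribute appears in two bindings
                if attribute not in result[path["name"]]:
                    result[path["name"]].append(attribute)
    return dict(result)

# Hypothetical binding; the field names are assumed to match the API resource.
PII_ATTRIBUTE = (
    "projects/my-project/locations/us-central1/"
    "dataTaxonomies/data-sensitivity/attributes/pii"
)
sample_bindings = [
    {
        "paths": [
            {"name": "email", "attributes": [PII_ATTRIBUTE]},
            {"name": "signup_date", "attributes": []},
        ],
    }
]

print(columns_to_attributes(sample_bindings))
```

Columns without any attribute are left out of the result, so the output can feed directly into whatever logic chooses a policy tag per column.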

Python Code for Synchronization

Here's a Python script to apply BigQuery policy tags based on a column mapping derived from Dataplex attributes:

 
from google.cloud import bigquery

# Initialize the BigQuery client. Only this client is needed to set policy tags
# on a schema; PolicyTagManagerClient (in google.cloud.datacatalog_v1) is only
# required for creating or listing taxonomies.
bq_client = bigquery.Client()

# Fetch Dataplex attributes logic
# Replace this with actual logic to fetch attributes from Dataplex
# Example dictionary mapping column names to policy tag resource names
column_to_policy_tag_map = {
    'email': 'projects/project_id/locations/location/taxonomies/taxonomy_id/policyTags/tag_id'
}

def apply_policy_tags(dataset_id, table_id, column_to_policy_tag_map):
    # "dataset.table" is resolved against the client's default project
    table = bq_client.get_table(f"{dataset_id}.{table_id}")
    new_schema = []

    for field in table.schema:
        policy_tag_id = column_to_policy_tag_map.get(field.name)
        if policy_tag_id:
            # SchemaField objects are read-only, so rebuild the field with the tag
            field_repr = field.to_api_repr()
            field_repr['policyTags'] = {'names': [policy_tag_id]}
            field = bigquery.SchemaField.from_api_repr(field_repr)
        new_schema.append(field)

    # Send the whole schema back in a single update call
    table.schema = new_schema
    bq_client.update_table(table, ['schema'])

# Apply policy tags to the specified table
apply_policy_tags('my_dataset', 'customer_data', column_to_policy_tag_map)

Notes on the Script:

  • Efficient Schema Update: Builds the modified schema locally and sends it to BigQuery in a single API call.
  • Flexibility: Takes a mapping dictionary as input, making it easier to apply policy tags to several columns at once.
  6. Verify Tags in BigQuery and Dataplex
  • Check in BigQuery:
    • Navigate to the dataset and table in BigQuery console.
    • Verify the policy tags are applied to the relevant columns.
  • Check in Dataplex:
    • Go to the table's details view in Dataplex.
    • Confirm that the attributes are correctly displayed and synchronized with BigQuery policy tags.
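
The BigQuery check can also be scripted. This is a minimal sketch, assuming you have the table schema as JSON (for example from `bq show --schema --format=prettyjson my_dataset.customer_data`), where tagged columns carry a `policyTags.names` list. The function name and the sample values are hypothetical.

```python
def columns_with_policy_tags(schema_fields, prefix=""):
    """Return {column_path: [policy tag names]} for tagged columns, nested ones included."""
    tagged = {}
    for field in schema_fields:
        path = prefix + field["name"]
        names = field.get("policyTags", {}).get("names", [])
        if names:
            tagged[path] = list(names)
        # RECORD columns carry their child columns under "fields"
        tagged.update(columns_with_policy_tags(field.get("fields", []), path + "."))
    return tagged

# Hypothetical schema in BigQuery's JSON representation
TAG = "projects/project_id/locations/location/taxonomies/taxonomy_id/policyTags/tag_id"
sample_schema = [
    {"name": "id", "type": "INTEGER"},
    {"name": "email", "type": "STRING", "policyTags": {"names": [TAG]}},
    {"name": "address", "type": "RECORD", "fields": [
        {"name": "street", "type": "STRING", "policyTags": {"names": [TAG]}},
    ]},
]

print(columns_with_policy_tags(sample_schema))
```

Comparing this result with the mapping you intended to apply shows quickly whether a column was missed.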

Additional Tips

  • Error Handling: Implement error handling in your synchronization script to manage cases such as missing attributes or duplicate policy tags.
  • Example:
 
try:
    # Fetch attributes and apply policy tags
    apply_policy_tags('my_dataset', 'customer_data', column_to_policy_tag_map)
except Exception as e:
    print(f"Error applying policy tags: {e}")
  • Scheduling: If tags are frequently added or updated, schedule your synchronization script using tools like Cloud Scheduler to ensure consistent updates.
  • Example:
    • Set up a Cloud Scheduler job to run your Python script periodically.
  • Data Lineage: Use Dataplex's data lineage features to track the usage of your BigQuery tables and how tags propagate through your data pipelines.
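
Beyond catching exceptions, the mapping can be checked before any API call is made. This is a minimal sketch, assuming the mapping has the same shape as `column_to_policy_tag_map` above and that you already know the table's column names. The helper name and the regular expression are illustrative, not part of any Google library.

```python
import re

# Full policy tag resource name, as used in the mapping above
POLICY_TAG_PATTERN = re.compile(
    r"^projects/[^/]+/locations/[^/]+/taxonomies/[^/]+/policyTags/[^/]+$"
)

def validate_mapping(column_to_policy_tag_map, table_columns):
    """Return a list of problems; an empty list means the mapping looks usable."""
    problems = []
    known = set(table_columns)
    for column, tag in column_to_policy_tag_map.items():
        if column not in known:
            problems.append(f"column '{column}' is not in the table schema")
        if not POLICY_TAG_PATTERN.match(tag):
            problems.append(f"policy tag for '{column}' is not a full resource name: {tag}")
    return problems

good_tag = "projects/project_id/locations/location/taxonomies/taxonomy_id/policyTags/tag_id"
for problem in validate_mapping({"email": good_tag, "phone": "pii"}, ["id", "email"]):
    print(problem)
```

Running the validation first lets the script fail with a readable message instead of a partial schema update.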

Important Considerations

  • One Policy Tag per Column: BigQuery enforces a limit of one policy tag per column. Plan your synchronization logic carefully if dealing with complex attributes.
  • Alternative Approaches: Depending on your needs, consider using the Dataplex REST API or the Dataplex Java client library for more comprehensive solutions.
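
Because a column can hold only one policy tag, a column with several Dataplex attributes needs a tie-breaking rule. This is a minimal sketch of one possible rule, assuming you maintain your own priority order of attributes (most restrictive first). All names and values here are hypothetical.

```python
def pick_policy_tag(attributes, attribute_to_policy_tag, priority):
    """Pick the policy tag of the highest-priority attribute that has a mapping."""
    candidates = [a for a in attributes if a in attribute_to_policy_tag]
    if not candidates:
        return None
    rank = {name: position for position, name in enumerate(priority)}
    # Attributes missing from the priority list sort last
    best = min(candidates, key=lambda a: rank.get(a, len(priority)))
    return attribute_to_policy_tag[best]

# Hypothetical attribute names and policy tag resource names
attribute_to_policy_tag = {
    "pii": "projects/p/locations/us/taxonomies/1/policyTags/10",
    "internal": "projects/p/locations/us/taxonomies/1/policyTags/20",
}
priority = ["pii", "confidential", "internal"]

print(pick_policy_tag(["internal", "pii"], attribute_to_policy_tag, priority))
```

Here the column tagged both "internal" and "pii" receives the "pii" policy tag, since "pii" comes first in the priority list.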

Hi,

Thanks for the reply! I don't see anywhere to attach tags to the policy tags column. The only column I can attach to is the column attribute. I also wanted to confirm: is there a way to attach tags directly to the BigQuery policy tag column through Dataplex? Or do we have to synchronize both taxonomies first, and only then will we be able to attach tags to the policy tag column in BigQuery and see them in both places?

Next question: what would be a good approach for adding security classification tags, through policy tags in BigQuery or through Dataplex using the Attribute Store?

Keep in mind that we need to implement lifecycle policies on those and control it all through Dataplex.

 

 

[Attached screenshot: ex.PNG]

Currently, Dataplex does not provide a direct integration to attach its Attribute Store tags directly to BigQuery policy tags. Instead, you need to synchronize Dataplex taxonomies and BigQuery policy tags using custom scripts or tools. Here’s a recap and elaboration on the process:

Synchronization Steps:

  1. Create Taxonomies and Tags:

    • Dataplex Attribute Store: Create taxonomies and attributes in Dataplex.

    • BigQuery Policy Tags: Manually create corresponding policy tags in BigQuery.

  2. Attach Attributes in Dataplex:

    • Use the Dataplex console to attach attributes to columns in your tables.

  3. Synchronize Attributes with Policy Tags:

    • Develop a script or tool to fetch attached attributes from Dataplex and apply corresponding policy tags in BigQuery.

Example Synchronization Workflow

Here's a sample workflow for synchronizing Dataplex attributes with BigQuery policy tags:

  1. Fetch Attributes from Dataplex:

    • Use the Dataplex API to list attributes attached to tables/columns.

    • Example gcloud command (attached attributes are stored as data attribute bindings):

      gcloud dataplex data-attribute-bindings list --location=<LOCATION> --project=<PROJECT_ID>
  2. Map Attributes to Policy Tags:

    • Develop a script that maps Dataplex attributes to corresponding BigQuery policy tags based on a predefined mapping.

  3. Apply Policy Tags in BigQuery:

    • Use the BigQuery API to update the table schema and apply policy tags.

 

from google.cloud import bigquery

# Only the BigQuery client is needed to set policy tags on a schema;
# PolicyTagManagerClient (in google.cloud.datacatalog_v1) is only required
# for creating or listing taxonomies.
bq_client = bigquery.Client()

# Example mapping of column names to policy tag resource names
column_to_policy_tag_map = {
    'email': 'projects/project_id/locations/location/taxonomies/taxonomy_id/policyTags/tag_id'
}

def apply_policy_tags(dataset_id, table_id, column_to_policy_tag_map):
    # "dataset.table" is resolved against the client's default project
    table = bq_client.get_table(f"{dataset_id}.{table_id}")
    new_schema = []

    for field in table.schema:
        policy_tag_id = column_to_policy_tag_map.get(field.name)
        if policy_tag_id:
            # SchemaField objects are read-only, so rebuild the field with the tag
            field_repr = field.to_api_repr()
            field_repr['policyTags'] = {'names': [policy_tag_id]}
            field = bigquery.SchemaField.from_api_repr(field_repr)
        new_schema.append(field)

    # Send the whole schema back in a single update call
    table.schema = new_schema
    bq_client.update_table(table, ['schema'])

apply_policy_tags('my_dataset', 'customer_data', column_to_policy_tag_map)
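
Step 2 of the workflow (mapping attributes to policy tags) can be sketched as a plain lookup. This is a minimal sketch, assuming one attribute per column and a predefined attribute-to-policy-tag table that you maintain yourself. The function name and sample values are hypothetical.

```python
def build_column_policy_tag_map(column_attributes, attribute_to_policy_tag):
    """Translate {column: attribute} into {column: policy_tag}; report what has no mapping."""
    mapped, unmapped = {}, {}
    for column, attribute in column_attributes.items():
        tag = attribute_to_policy_tag.get(attribute)
        if tag is None:
            unmapped[column] = attribute
        else:
            mapped[column] = tag
    return mapped, unmapped

# Hypothetical inputs
column_attributes = {"email": "pii", "notes": "free_text"}
attribute_to_policy_tag = {
    "pii": "projects/project_id/locations/location/taxonomies/taxonomy_id/policyTags/tag_id"
}

mapped, unmapped = build_column_policy_tag_map(column_attributes, attribute_to_policy_tag)
print(mapped)
print(unmapped)
```

The `mapped` result has the shape expected by `apply_policy_tags`, while `unmapped` lists attributes that still need a corresponding policy tag in BigQuery.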

Security Classification Tags: Policy Tags vs. Dataplex Attribute Store

Policy Tags in BigQuery:

Pros:

  • Native Support: BigQuery policy tags are designed for column-level security and integrate natively with BigQuery’s data access controls.

  • Fine-Grained Access Control: Use policy tags to implement column-level access policies that are enforced directly by BigQuery.

  • Ease of Use: Integrated with IAM, making it easy to manage permissions.

Cons:

  • Limited Flexibility: Each column can have only one policy tag, which might not fit complex classification requirements.

Use Case:

  • Sensitive Data: Ideal for marking sensitive data and controlling access at a granular level within BigQuery.

Dataplex Attribute Store:

Pros:

  • Comprehensive Management: Dataplex offers a broader set of metadata management capabilities, including data classification, lifecycle management, and governance.

  • Centralized Governance: Allows managing data across multiple systems, not just BigQuery.

  • Lifecycle Policies: Supports defining lifecycle policies that can be used to automate data management tasks.

Cons:

  • Indirect Control: Requires synchronization with BigQuery policy tags for column-level access control, adding complexity.

Use Case:

  • Data Governance: Suitable for a broader data governance strategy across different storage systems and data lakes.

Recommended Approach

Combining Both:

  • Primary Use Case: Use BigQuery policy tags for implementing immediate column-level security within BigQuery.

  • Governance and Lifecycle Management: Use Dataplex Attribute Store to manage and govern security classification tags across your data ecosystem. Implement lifecycle policies in Dataplex for comprehensive data management.

Implementation Workflow:

  1. Define Classifications:

    • Create a unified taxonomy of security classifications in both Dataplex and BigQuery.

  2. Tag Data:

    • Attach classification tags using Dataplex attributes for broader governance.

    • Apply policy tags to sensitive columns in BigQuery.

  3. Synchronize:

    • Develop or use existing tools to keep Dataplex attributes and BigQuery policy tags in sync.

  4. Manage Lifecycle Policies:

    • Use Dataplex to define and enforce lifecycle policies for your data based on the attached attributes.
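
The "Synchronize" step is easier to reason about if the script first computes what would change. This is a minimal sketch, assuming both sides have been reduced to simple `{column: policy_tag}` dictionaries (the desired state derived from Dataplex, the actual state read from the BigQuery schema). The function name and the short tag values are hypothetical.

```python
def plan_sync(desired, actual):
    """Compare desired and actual {column: policy_tag} maps and list the changes needed."""
    plan = {"add": {}, "change": {}, "remove": []}
    for column, tag in desired.items():
        if column not in actual:
            plan["add"][column] = tag
        elif actual[column] != tag:
            plan["change"][column] = tag
    # Columns tagged in BigQuery but no longer classified in Dataplex
    plan["remove"] = sorted(column for column in actual if column not in desired)
    return plan

desired = {"email": "tags/pii", "phone": "tags/pii"}
actual = {"email": "tags/internal", "ssn": "tags/pii"}

print(plan_sync(desired, actual))
```

Logging the plan before applying it gives a simple audit trail and makes a dry-run mode straightforward.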

Implementation Example

Here’s a consolidated implementation flow for adding security classification tags:

  1. Setup Dataplex and BigQuery Taxonomies:

    • Define your security classifications in Dataplex and BigQuery.

  2. Attach Tags in Dataplex:

    • Use Dataplex to attach security attributes to your data.

  3. Sync Tags with BigQuery:

    • Implement a synchronization process that maps Dataplex security attributes to BigQuery policy tags.

  4. Enforce Policies:

    • Use BigQuery policy tags for immediate access control.

    • Use Dataplex for broader data governance and lifecycle policies.

Regarding the Python example, I get:

ImportError: cannot import name 'policytagmanager_v1' from 'google.cloud.bigquery'

The only PolicyTagManagerClient I can find is under Data Catalog, which we are trying to avoid in favour of Dataplex. Can you help shed some light on this?

 

Hi,

Thanks for the quick reply!

Can you please advise me on the process for adding tags to BigQuery tables using Dataplex?

What is the preferred method in Dataplex for adding tags: the Attribute Store or tag templates? Also, can you add steps for each of them?

Adding tags to BigQuery tables using Dataplex can be done through either the Attribute Store or Tag Templates.

Method 1: Adding Tags Using Dataplex Attribute Store

Attribute Store allows you to create custom taxonomies and attach them as tags to BigQuery tables. This method is useful for managing and governing metadata across different systems.

Steps:

  1. Create a Taxonomy in Attribute Store:

    • Go to the Dataplex Console.

    • Navigate to Attribute Store.

    • Click "Create Taxonomy" and provide a name and description.

    • Click "Save".

  2. Create Attributes in the Taxonomy:

    • Select the taxonomy you just created.

    • Click "Create Attribute".

    • Define the attribute name, description, and possible values (if applicable).

    • Click "Save".

  3. Attach Attributes to BigQuery Tables:

    • In the Dataplex Console, navigate to "Manage" and select your Data Zone.

    • Go to the "Tables" tab, find the desired table, and select it.

    • In the table details, go to the "Tags" section.

    • Click "Attach Attribute".

    • Choose the taxonomy and attribute, then click "Attach".

  4. Verify the Tags:

    • Ensure the tags are correctly attached by checking the "Tags" section in the table's details in Dataplex.
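
If you prefer to script step 3 rather than use the console, the attachment is expressed as a data attribute binding. This is a minimal sketch that only builds the request body, assuming the field names (`resource`, `paths`, `name`, `attributes`) follow the Dataplex DataAttributeBinding resource and that the table is referenced by its Dataplex entity name. The helper name and every resource name below are hypothetical.

```python
def build_binding_body(entity, column_attributes):
    """Build a request body that binds attributes to columns of one entity."""
    return {
        "resource": entity,
        "paths": [
            {"name": column, "attributes": list(attributes)}
            for column, attributes in sorted(column_attributes.items())
        ],
    }

# Hypothetical resource names
ENTITY = (
    "projects/my-project/locations/us-central1/"
    "lakes/my-lake/zones/my-zone/entities/customer_data"
)
PII = (
    "projects/my-project/locations/us-central1/"
    "dataTaxonomies/data-sensitivity/attributes/pii"
)

body = build_binding_body(ENTITY, {"email": [PII]})
print(body)
```

The resulting dictionary would be sent to the Dataplex API by whichever client you use; that call is left out here because it depends on your authentication setup.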

Pros of Using Attribute Store:

  • Allows centralized metadata management across different storage systems.

  • Supports complex data governance requirements.

Cons:

  • Requires manual synchronization if you also need policy tags in BigQuery.

Method 2: Adding Tags Using Tag Templates

Tag Templates allow you to create a schema for metadata tags and apply them to BigQuery tables. This method integrates well with BigQuery and Dataplex, offering flexibility and visibility.

Steps:

  1. Create a Tag Template:

    • Go to the Dataplex Console.

    • Navigate to Metadata Management > Tag Templates.

    • Click "Create Tag Template" and provide the necessary information like name and fields.

    • Define the fields that will be used as tags, including name, type, and description.

    • Click "Create".

  2. Attach Tag Template to BigQuery Tables:

    • Navigate to Metadata Management > Tags.

    • Click "Create Tag" and choose the Tag Template you created.

    • Select the resource type "BigQuery Table" and provide the table details.

    • Fill in the tag fields with the desired values.

    • Click "Create" to attach the tag to the table.

  3. Verify the Tags:

    • Check the Tags section in the Dataplex console and verify the tags attached to your BigQuery table.

Pros of Using Tag Templates:

  • Easy to apply and manage within both Dataplex and BigQuery.

  • Provides a structured way to handle metadata tagging.

Cons:

  • Slightly less flexible for complex governance needs compared to Attribute Store.

Example of Each Method

Using Attribute Store Example

  1. Create a Taxonomy:

    • Name: Data Sensitivity

    • Description: Tags for classifying data sensitivity levels

  2. Create Attributes:

    • Name: PII

    • Description: Personally Identifiable Information

  3. Attach Attribute:

    • Table: my_dataset.customer_data

    • Attribute: Data Sensitivity -> PII

Using Tag Templates Example

  1. Create Tag Template:

    • Name: Data Classification

    • Fields:

      • confidentiality_level (STRING): Confidentiality level of the data

  2. Attach Tag Template:

    • Table: my_dataset.customer_data

    • Tag: Data Classification

    • Field Value: confidentiality_level = "High"
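
The tag template example above can also be written down as data. This is a minimal sketch that builds the two JSON payloads, assuming the field layout of the Data Catalog REST resources for tag templates and tags (`displayName`, `fields`, `type.primitiveType`, `stringValue`). The helper names and the template resource name are hypothetical, and only string fields are handled.

```python
def build_tag_template(display_name, fields):
    """fields maps field_id -> (primitive_type, description)."""
    return {
        "displayName": display_name,
        "fields": {
            field_id: {
                "displayName": field_id,
                "type": {"primitiveType": primitive_type},
                "description": description,
            }
            for field_id, (primitive_type, description) in fields.items()
        },
    }

def build_tag(template_name, string_values):
    """Build a tag that fills string fields of the given template."""
    return {
        "template": template_name,
        "fields": {
            field_id: {"stringValue": value}
            for field_id, value in string_values.items()
        },
    }

template = build_tag_template(
    "Data Classification",
    {"confidentiality_level": ("STRING", "Confidentiality level of the data")},
)
# Hypothetical template resource name
tag = build_tag(
    "projects/my-project/locations/us-central1/tagTemplates/data_classification",
    {"confidentiality_level": "High"},
)
print(template)
print(tag)
```

Keeping the template definition in code makes it easier to create the same classification fields consistently across projects.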

Additional Tips and Best Practices

  1. Choosing Between Methods:

    • Attribute Store: Use when you need centralized governance across multiple systems and complex data governance rules.

    • Tag Templates: Use for straightforward tagging within BigQuery and Dataplex, especially if you require integration with BigQuery metadata.

  2. Lifecycle Management:

    • If you need to implement lifecycle policies based on tags, Attribute Store might offer more flexibility.

  3. Automation:

    • Consider automating the synchronization of tags between Dataplex and BigQuery using custom scripts if using Attribute Store.

  4. Consistency:

    • Ensure that tags and attributes are consistently defined and applied across your datasets for effective governance.