Data Lifecycle Policies on Policy Tags (Security Classification)

Hi,

Is there a way to enforce lifecycle policies based on the security classification "policy tags" applied to BigQuery tables, using Dataplex or any other method?

Also, how can data lifecycle policies be imposed at the dataset level using tags?


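Yes, you can enforce lifecycle policies in BigQuery based on security classifications applied to tables, but not through policy tags alone. Policy tags are attached to columns and control access; BigQuery has no built-in setting that ties expiration to a policy tag or a label. The approach below combines BigQuery's native expiration settings with custom logic, and uses table labels to carry the classification and the lifecycle policy: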

Enforcing Lifecycle Policies with Policy Tags

  1. Apply Policy Tags

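    • Use Dataplex or BigQuery Directly:
      • Dataplex: Use Dataplex to create and manage the policy tags that classify your BigQuery data.
      • BigQuery: You can also create policy tags and attach them to table columns directly in BigQuery.
      • Labels: Because policy tags attach to columns rather than whole tables, record the table-level classification and lifecycle policy as labels.

    from google.cloud import bigquery
    client = bigquery.Client()
    
    # Example: label a BigQuery table with its classification and lifecycle policy.
    # Labels are not policy tags; they are used here because they apply to whole tables.
    table_id = "your-project.your_dataset.your_table"
    table = client.get_table(table_id)  # Make an API request.
    # Merge with the existing labels instead of overwriting them
    table.labels = {**table.labels, "security_level": "high", "lifecycle_policy": "delete_after_30_days"}
    table = client.update_table(table, ["labels"])  # Make an API request.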
    
     
     
     
  2. Define Lifecycle Policies in BigQuery

    • Utilize BigQuery’s Native Lifecycle Management Features:
      • Table Level: Set expiration times (time-to-live) for tables or specific partitions.
      • Dataset Level: Define default table expiration policies for all tables within a dataset.
     
    import datetime
    
    # Continues from the step 1 example: `client` and `table` are already defined.
    # Set table expiration time (must be a timezone-aware datetime)
    table.expires = datetime.datetime.now(datetime.timezone.utc) + datetime.timedelta(days=30)
    table = client.update_table(table, ["expires"])  # Make an API request.
    
    # Set dataset default table expiration policy (applies to tables created afterwards)
    dataset_id = "your-project.your_dataset"
    dataset = client.get_dataset(dataset_id)  # Make an API request.
    dataset.default_table_expiration_ms = 30 * 24 * 60 * 60 * 1000  # 30 days in milliseconds
    dataset = client.update_dataset(dataset, ["default_table_expiration_ms"])  # Make an API request.
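The two expiration settings above take different units: `table.expires` is a timezone-aware datetime, while `default_table_expiration_ms` is a duration in milliseconds. A minimal sketch of deriving both from a lifecycle label value, assuming the `delete_after_<N>_days` naming convention used in this thread (the helper names are hypothetical, not part of any Google library):

```python
import datetime
import re

# Assumed label convention from this thread: "delete_after_<N>_days"
_LABEL_PATTERN = re.compile(r"^delete_after_(\d+)_days$")


def retention_days(label_value):
    """Return N for a 'delete_after_N_days' label value, or None if it does not match."""
    match = _LABEL_PATTERN.match(label_value or "")
    return int(match.group(1)) if match else None


def expiration_ms(days):
    """Convert days to the milliseconds expected by default_table_expiration_ms."""
    return days * 24 * 60 * 60 * 1000


def expiration_time(days, now=None):
    """Return the timezone-aware datetime to assign to table.expires."""
    now = now or datetime.datetime.now(datetime.timezone.utc)
    return now + datetime.timedelta(days=days)
```

With these helpers, a label value that does not follow the convention yields `None`, so a script can skip the table instead of guessing a retention period.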
    
     
     
     
  3. Automate Enforcement with Custom Logic

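    • Develop Scripts or Cloud Functions:
      • Read the Labels: Query BigQuery's INFORMATION_SCHEMA.TABLE_OPTIONS view to identify tables that carry a specific lifecycle label.
      • Apply Lifecycle Actions: Based on the labels found, use the BigQuery API or DDL statements (ALTER TABLE ... SET OPTIONS) to change table or dataset properties such as expiration times.
     
    import datetime
    from google.cloud import bigquery
    
    def enforce_lifecycle_policy(request):
        client = bigquery.Client()
    
        # Labels are exposed through TABLE_OPTIONS (option_name = 'labels'),
        # not as a column of INFORMATION_SCHEMA.TABLES. Check the option_value
        # format in your project before relying on the LIKE pattern.
        query = """
        SELECT table_catalog, table_schema, table_name
        FROM `your-project.your_dataset.INFORMATION_SCHEMA.TABLE_OPTIONS`
        WHERE option_name = 'labels'
          AND option_value LIKE '%STRUCT("lifecycle_policy", "delete_after_30_days")%'
        """
        results = client.query(query).result()
    
        for row in results:
            table_id = f"{row.table_catalog}.{row.table_schema}.{row.table_name}"
            table = client.get_table(table_id)
            # Skip tables that already expire, so repeated runs do not keep extending the deadline
            if table.expires is None:
                table.expires = datetime.datetime.now(datetime.timezone.utc) + datetime.timedelta(days=30)
                client.update_table(table, ["expires"])
                print(f"Updated expiration for table {table_id}")
    
        return "OK"  # HTTP-triggered functions must return a response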
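BigQuery exposes table labels through the INFORMATION_SCHEMA.TABLE_OPTIONS view as a single string in `option_value`. The sketch below parses that string into a dictionary so a script can match a label exactly instead of by substring. It assumes the value looks like `[STRUCT("key", "value"), ...]`; confirm the format in your own project, and treat the function names as hypothetical.

```python
import re

# Matches one STRUCT("key", "value") pair inside the option_value string
_STRUCT_PAIR = re.compile(r'STRUCT\("([^"]*)",\s*"([^"]*)"\)')


def parse_labels_option(option_value):
    """Turn '[STRUCT("k", "v"), ...]' into a {'k': 'v', ...} dict."""
    return dict(_STRUCT_PAIR.findall(option_value or ""))


def has_label(option_value, key, value):
    """Exact key/value match, avoiding accidental substring hits."""
    return parse_labels_option(option_value).get(key) == value
```

An exact match matters once several policies share a prefix, for example `delete_after_30_days` and `delete_after_300_days`.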
    
     
     
     

Imposing Data Lifecycle Policies at the Dataset Level

  1. Use Dataset Labels

    • Apply Labels to Datasets:
     
    # Merge with the existing labels instead of overwriting them
    dataset.labels = {**dataset.labels, "lifecycle_policy": "delete_after_30_days"}
    dataset = client.update_dataset(dataset, ["labels"])  # Make an API request.
    
     
     
     
  2. Define Lifecycle Rules

    • Create Rules Based on Dataset Labels:
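      • Decide which action each label value maps to, such as setting a default table expiration, exporting tables to Cloud Storage for archival, or deleting the dataset after a specific period. BigQuery has no storage classes to choose between; data that goes unmodified for 90 consecutive days is billed at the long-term storage rate automatically.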
  3. Automate Policy Enforcement

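    • Use Cloud Functions with Cloud Scheduler:
     
    import datetime
    from google.cloud import bigquery
    
    def enforce_lifecycle_policy(request):
        client = bigquery.Client()
        now = datetime.datetime.now(datetime.timezone.utc)
    
        for item in client.list_datasets():
            if item.labels.get("lifecycle_policy") != "delete_after_30_days":
                continue
            # The list response has no creation time, so fetch the full dataset
            dataset = client.get_dataset(item.reference)
            # Delete only once the dataset is actually 30 days old
            if now - dataset.created >= datetime.timedelta(days=30):
                print(f"Deleting dataset: {dataset.dataset_id}")
                client.delete_dataset(dataset.reference, delete_contents=True, not_found_ok=True)
    
        return "OK"  # HTTP-triggered functions must return a response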
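Because deleting a dataset with its contents is destructive, it can help to separate the decision from the deletion and review the list first. A minimal sketch of that decision step, assuming each object exposes `dataset_id`, `labels`, and `created` the way a dataset fetched with `client.get_dataset` does (the function name is hypothetical, and the stand-in objects exist only for a local check):

```python
import datetime
from types import SimpleNamespace


def datasets_due_for_deletion(datasets, retention_days=30, now=None):
    """Return IDs of datasets labelled for deletion that are older than the retention period."""
    now = now or datetime.datetime.now(datetime.timezone.utc)
    label_value = f"delete_after_{retention_days}_days"
    cutoff = now - datetime.timedelta(days=retention_days)
    return [
        ds.dataset_id
        for ds in datasets
        if (ds.labels or {}).get("lifecycle_policy") == label_value and ds.created <= cutoff
    ]


# Local check with stand-in objects instead of real BigQuery datasets
utc = datetime.timezone.utc
now = datetime.datetime(2024, 3, 1, tzinfo=utc)
policy = {"lifecycle_policy": "delete_after_30_days"}
candidates = [
    SimpleNamespace(dataset_id="old_labelled", labels=policy, created=datetime.datetime(2024, 1, 1, tzinfo=utc)),
    SimpleNamespace(dataset_id="new_labelled", labels=policy, created=datetime.datetime(2024, 2, 20, tzinfo=utc)),
    SimpleNamespace(dataset_id="old_unlabelled", labels={}, created=datetime.datetime(2023, 1, 1, tzinfo=utc)),
]
print(datasets_due_for_deletion(candidates, now=now))
```

Only the labelled dataset older than the cutoff is returned; logging this list before calling `delete_dataset` gives a simple dry run.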
    
     
     
     
    • Deploy the Cloud Function:
     
    gcloud functions deploy enforce_lifecycle_policy --runtime python312 --trigger-http
    
    • Schedule the Function: Use Cloud Scheduler to run this function at regular intervals.

Should we use tags or labels to provide security classification at the table level?

For example, if we want to add a security classification like "Business Confidential" at the table level, should we use tags or labels? Also, can we use the above solution to add lifecycle policies to a table by attaching the policies to the security classification?

In BigQuery, you can use both policy tags and labels to record security classifications, but they serve different purposes and attach at different levels. Here is how to use each and what that means for lifecycle policies:

Tags vs. Labels

  • Labels:
    • Purpose: Labels are key-value pairs that you can attach to resources such as tables, datasets, and projects. They are mainly used for organizing, grouping, and filtering resources for cost management, billing, and automation.
    • Use Case: For a table-level security classification like "Business Confidential," labels are appropriate because they let you categorize and manage whole tables. Label values may contain only lowercase letters, digits, underscores, and dashes, so the value is written as business_confidential.
  • Tags (Policy Tags):
    • Purpose: Policy tags are organized in taxonomies and managed through Data Catalog's policy tag manager. They are attached to individual columns, not to whole tables, and are used for column-level access control based on data classification.
    • Use Case: If your primary goal is to control access and enforce security policies based on data classification, policy tags are more suitable.

Adding Security Classifications Using Labels

To add a security classification such as "Business Confidential" at the table level using labels:

 
from google.cloud import bigquery
client = bigquery.Client()

# Define your table ID
table_id = "your-project.your_dataset.your_table"

# Fetch the table
table = client.get_table(table_id)

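# Set the label, merging with any existing labels.
# Label values allow only lowercase letters, digits, underscores, and dashes.
table.labels = {**table.labels, "security_classification": "business_confidential"}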

# Update the table with the new label
table = client.update_table(table, ["labels"])

print(f"Updated table {table_id} with labels {table.labels}")
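Classification names usually come from a policy document in free text, while BigQuery label values are limited to 63 characters and may contain only lowercase letters, digits, underscores, and dashes (international characters are also accepted, but this sketch sticks to ASCII). A small helper, with a hypothetical name, that normalizes a name before it is used as a label value:

```python
import re


def to_label_value(text):
    """Normalize free text into an ASCII-only BigQuery label value."""
    value = text.strip().lower()
    # Replace each run of disallowed characters with a single underscore
    value = re.sub(r"[^a-z0-9_-]+", "_", value)
    # Drop leading/trailing underscores and enforce the 63-character limit
    return value.strip("_")[:63]


print(to_label_value("Business Confidential"))
```

Using one normalization function everywhere keeps the value written by the tagging script identical to the value the enforcement script looks for.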

Using Labels for Lifecycle Policies

You can use the labels to enforce lifecycle policies by developing custom scripts or Cloud Functions that read these labels and apply the corresponding lifecycle actions.

Adding Security Classifications Using Policy Tags

To add a security classification using policy tags:

  1. Create the Policy Tag:
    • In the Google Cloud console, open the Policy tags page in BigQuery.
    • Create a taxonomy and add a policy tag named "Business Confidential" to it.
  2. Attach the Policy Tag to Table Columns:
 
from google.cloud import bigquery

# Policy tags are attached through the table schema, so the BigQuery client is used
client = bigquery.Client()

# Full resource name of the policy tag (this is not a table ID)
policy_tag_name = "projects/your-project/locations/us-central1/taxonomies/your-taxonomy/policyTags/your-policy-tag"
table_id = "your-project.your_dataset.your_table"
column_name = "your_column"  # placeholder: policy tags attach to columns, not whole tables

table = client.get_table(table_id)

# Rebuild the schema, adding the policy tag to the chosen column
new_schema = []
for field in table.schema:
    if field.name == column_name:
        field = bigquery.SchemaField(
            name=field.name,
            field_type=field.field_type,
            mode=field.mode,
            description=field.description,
            fields=field.fields,
            policy_tags=bigquery.PolicyTagList(names=[policy_tag_name]),
        )
    new_schema.append(field)

# Update the table with the new schema
table.schema = new_schema
table = client.update_table(table, ["schema"])
print(f"Added policy tag {policy_tag_name} to column {column_name} of {table_id}")

Using Policy Tags for Lifecycle Policies

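  • Develop Custom Logic: Write a Cloud Function or script that reads each table's schema to find columns carrying specific policy tags and applies lifecycle policies accordingly.
 
from google.cloud import bigquery

def enforce_lifecycle_policy(request):
    bigquery_client = bigquery.Client()

    # Policy tags live on schema fields, so no Data Catalog client is needed to read them
    for item in bigquery_client.list_tables("your-project.your_dataset"):
        table = bigquery_client.get_table(item.reference)
        tagged = any(f.policy_tags and f.policy_tags.names for f in table.schema)
        # ... if tagged, apply the lifecycle action as in the label-based examples in this thread
    return "OK"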
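To turn policy tags into a lifecycle decision, a script needs to collect the tags from a table's columns and map them to a retention period. The sketch below does that on schema-like objects; it assumes each field exposes `policy_tags.names` and `fields` as `bigquery.SchemaField` does, and that the shortest retention wins when several tags apply, which is a policy choice rather than anything BigQuery prescribes. The function names and the tag-to-days mapping are hypothetical.

```python
from types import SimpleNamespace


def collect_policy_tags(schema):
    """Return the set of policy tag resource names on any column, including nested fields."""
    names = set()
    for field in schema:
        tags = getattr(field, "policy_tags", None)
        if tags is not None:
            names.update(tags.names)
        # Recurse into RECORD columns
        names.update(collect_policy_tags(getattr(field, "fields", ()) or ()))
    return names


def retention_for_table(schema, retention_days_by_tag):
    """Return the shortest retention among the table's policy tags, or None if none apply."""
    tags = collect_policy_tags(schema)
    days = [retention_days_by_tag[t] for t in tags if t in retention_days_by_tag]
    return min(days) if days else None


# Stand-in objects mimicking bigquery.SchemaField for a quick local check
TAG = "projects/your-project/locations/us-central1/taxonomies/your-taxonomy/policyTags/your-policy-tag"
schema = [
    SimpleNamespace(name="id", policy_tags=None, fields=()),
    SimpleNamespace(name="email", policy_tags=SimpleNamespace(names=(TAG,)), fields=()),
]
print(retention_for_table(schema, {TAG: 30}))
```

In a real function, `schema` would be `table.schema` from `client.get_table`, and the returned number of days would feed the table's `expires` value.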

In summary:

  • Labels: Suitable for simple categorization and management tasks, including lifecycle policies based on classifications.
  • Policy Tags: Ideal for enforcing fine-grained access control and security policies. They can also be used for lifecycle management with custom logic.