Hi,
Is there a way to enforce lifecycle policies based on the security classification ("policy tags") applied to BigQuery tables, using Dataplex or any other method?
Also, how can we impose data lifecycle policies at the dataset level using tags?
Yes, though not directly: BigQuery has no built-in lifecycle rules driven by policy tags. You can get the same effect by combining table and dataset labels (or policy tags read through Data Catalog) with BigQuery's expiration settings and a small amount of custom automation. Here is one way to do it:
Enforcing Lifecycle Policies with Policy Tags
Apply Classification Labels

from google.cloud import bigquery

client = bigquery.Client()

# Example: apply classification and lifecycle labels to a BigQuery table.
# (These are labels, not Data Catalog policy tags; policy tags are covered below.)
table_id = "your-project.your_dataset.your_table"
table = client.get_table(table_id)  # Make an API request.
table.labels = {"security_level": "high", "lifecycle_policy": "delete_after_30_days"}
table = client.update_table(table, ["labels"])  # Make an API request.
Define Lifecycle Policies in BigQuery
import datetime

# Set an expiration time on the table itself.
table.expires = datetime.datetime.now(datetime.timezone.utc) + datetime.timedelta(days=30)
table = client.update_table(table, ["expires"])  # Make an API request.

# Set a default table expiration on the dataset.
dataset_id = "your-project.your_dataset"
dataset = client.get_dataset(dataset_id)  # Make an API request.
dataset.default_table_expiration_ms = 30 * 24 * 60 * 60 * 1000  # 30 days in milliseconds
dataset = client.update_dataset(dataset, ["default_table_expiration_ms"])  # Make an API request.
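The `30 * 24 * 60 * 60 * 1000` arithmetic is easy to get wrong; a tiny helper of my own (not part of the BigQuery API) makes the unit conversion explicit:

```python
def days_to_ms(days: int) -> int:
    """Convert a retention period in days to the milliseconds BigQuery expects."""
    return days * 24 * 60 * 60 * 1000

# Usage: dataset.default_table_expiration_ms = days_to_ms(30)
```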
Automate Enforcement with Custom Logic
import datetime
from google.cloud import bigquery

def enforce_lifecycle_policy(request):
    client = bigquery.Client()
    # Labels are not a column on INFORMATION_SCHEMA.TABLES; they are exposed
    # through INFORMATION_SCHEMA.TABLE_OPTIONS as the 'labels' option.
    query = """
        SELECT table_catalog, table_schema, table_name
        FROM `your-project.your_dataset.INFORMATION_SCHEMA.TABLE_OPTIONS`
        WHERE option_name = 'labels'
          AND option_value LIKE '%delete_after_30_days%'
    """
    results = client.query(query).result()
    for row in results:
        table_id = f"{row.table_catalog}.{row.table_schema}.{row.table_name}"
        table = client.get_table(table_id)
        table.expires = datetime.datetime.now(datetime.timezone.utc) + datetime.timedelta(days=30)
        client.update_table(table, ["expires"])
        print(f"Updated expiration for table {table_id}")
    return "Done"
Imposing Data Lifecycle Policies at the Dataset Level
Use Dataset Labels
client = bigquery.Client()
dataset = client.get_dataset("your-project.your_dataset")  # Make an API request.
dataset.labels = {"lifecycle_policy": "delete_after_30_days"}
dataset = client.update_dataset(dataset, ["labels"])  # Make an API request.
Define Lifecycle Rules

BigQuery itself offers only table and partition expiration; richer rules have to be encoded in label values and interpreted by your own automation.
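One way to express lifecycle rules is a simple mapping from label value to retention period; the label values below are just examples, not a BigQuery convention:

```python
import datetime
from typing import Optional

# Map lifecycle_policy label values to retention periods.
RETENTION_RULES = {
    "delete_after_30_days": datetime.timedelta(days=30),
    "delete_after_90_days": datetime.timedelta(days=90),
}

def retention_for(label_value: str) -> Optional[datetime.timedelta]:
    """Return the retention period for a lifecycle_policy label, or None."""
    return RETENTION_RULES.get(label_value)
```

Your enforcement function can then compute `table.expires` from the label instead of hard-coding 30 days.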
Automate Policy Enforcement
from google.cloud import bigquery

def enforce_lifecycle_policy(request):
    client = bigquery.Client()
    for dataset_item in client.list_datasets():
        # Warning: this permanently deletes the dataset and all its tables.
        if dataset_item.labels.get("lifecycle_policy") == "delete_after_30_days":
            print(f"Deleting dataset: {dataset_item.dataset_id}")
            client.delete_dataset(dataset_item.dataset_id, delete_contents=True, not_found_ok=True)
    return "Done"
gcloud functions deploy enforce_lifecycle_policy --runtime python39 --trigger-http
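Deploying the function this way exposes it over HTTP; to run enforcement on a schedule instead of on demand, you could pair it with Cloud Scheduler. The job name, cron schedule, and URI below are placeholders:

```shell
# Run the enforcement function every day at 03:00.
gcloud scheduler jobs create http enforce-lifecycle-policy-job \
  --schedule="0 3 * * *" \
  --uri="https://REGION-PROJECT_ID.cloudfunctions.net/enforce_lifecycle_policy" \
  --http-method=POST
```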
Should we add tags or labels to provide security classification at the table level?
That is, if we want to add a security classification like "Business Confidential" at the table level, should we use tags or labels? And can we use the above solution to add lifecycle policies to the table by attaching the policies to the security classification?
In BigQuery, you can use both tags and labels for security classification at the table level, but they serve different purposes. Here is how to use them and what each implies for lifecycle policies:
Tags vs. Labels

Labels are simple key:value metadata on tables and datasets: they are visible in the console, queryable through INFORMATION_SCHEMA, and useful for organization, billing breakdowns, and custom automation, but they enforce nothing by themselves. Policy tags come from a Data Catalog (Dataplex) taxonomy, attach to individual columns, and enforce column-level access control through IAM.
Adding Security Classifications Using Labels
To add a security classification such as "Business Confidential" at the table level using labels:
from google.cloud import bigquery

client = bigquery.Client()

# Define your table ID
table_id = "your-project.your_dataset.your_table"

# Fetch the table
table = client.get_table(table_id)

# Label values may contain only lowercase letters, digits, underscores, and
# hyphens, so "Business Confidential" becomes "business_confidential".
table.labels = {"security_classification": "business_confidential"}

# Update the table with the new label
table = client.update_table(table, ["labels"])
print(f"Updated table {table_id} with labels {table.labels}")
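BigQuery label values may contain only lowercase letters, digits, underscores, and hyphens, up to 63 characters, so a human-readable classification like "Business Confidential" has to be normalized first. A small helper of my own (not a Google API) could look like:

```python
import re

def to_label_value(classification: str) -> str:
    """Normalize a human-readable classification into a valid BigQuery
    label value (lowercase letters, digits, underscores, hyphens, <= 63 chars)."""
    value = classification.strip().lower()
    value = re.sub(r"[^a-z0-9_-]+", "_", value)
    return value[:63]

print(to_label_value("Business Confidential"))  # business_confidential
```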
Using Labels for Lifecycle Policies
You can use the labels to enforce lifecycle policies by developing custom scripts or Cloud Functions that read these labels and apply the corresponding lifecycle actions.
Adding Security Classifications Using Policy Tags
To add a security classification using policy tags from a Data Catalog/Dataplex taxonomy, attach the tag to individual columns through the table schema (policy tags apply per column, not per table):

from google.cloud import bigquery

client = bigquery.Client()
table = client.get_table("your-project.your_dataset.your_table")

# Fully qualified resource name of the policy tag in your taxonomy.
policy_tag_name = (
    "projects/your-project/locations/us-central1/"
    "taxonomies/your-taxonomy/policyTags/your-policy-tag"
)

# Rebuild the schema, attaching the policy tag to the sensitive column.
new_schema = []
for field in table.schema:
    if field.name == "your_sensitive_column":
        field = bigquery.SchemaField(
            name=field.name,
            field_type=field.field_type,
            mode=field.mode,
            policy_tags=bigquery.PolicyTagList(names=[policy_tag_name]),
        )
    new_schema.append(field)

table.schema = new_schema
table = client.update_table(table, ["schema"])  # Make an API request.
print(f"Attached policy tag to {table.full_table_id}")
Using Policy Tags for Lifecycle Policies
from google.cloud import bigquery, datacatalog_v1

def enforce_lifecycle_policy(request):
    bigquery_client = bigquery.Client()
    datacatalog_client = datacatalog_v1.PolicyTagManagerClient()
    # ... rest of the code (see full example above)
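When automation keys off policy tags, it usually matches on the tag's resource name. A small parser (a sketch of my own, not part of any Google client library) keeps that matching explicit:

```python
import re

# Matches names like:
# projects/p/locations/l/taxonomies/t/policyTags/pt
_POLICY_TAG_RE = re.compile(
    r"^projects/(?P<project>[^/]+)/locations/(?P<location>[^/]+)/"
    r"taxonomies/(?P<taxonomy>[^/]+)/policyTags/(?P<policy_tag>[^/]+)$"
)

def parse_policy_tag_name(name: str) -> dict:
    """Split a policy tag resource name into its components."""
    match = _POLICY_TAG_RE.match(name)
    if not match:
        raise ValueError(f"Not a policy tag resource name: {name!r}")
    return match.groupdict()

parts = parse_policy_tag_name(
    "projects/your-project/locations/us-central1/"
    "taxonomies/your-taxonomy/policyTags/your-policy-tag"
)
print(parts["policy_tag"])  # your-policy-tag
```

Your enforcement function can then branch on the taxonomy or tag id to decide which lifecycle rule applies.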