Hi,
How can we build a Cloud Function that reads all entries from Data Loss Prevention (DLP) data profile scans and tags the PII columns with policy tags from a data taxonomy?
Do we need to use the DLP API for that?
Hi @Vishesh1998 ,
Automating the addition of policy tags to BigQuery tables based on DLP data profile scans is a great way to strengthen security and compliance. You can indeed build a Cloud Function to handle this process. Here's how you can do it:
Overall Workflow
1. Configure the DLP scan (data profile or inspection job) to publish a notification to a Pub/Sub topic when it completes.
2. Deploy a Cloud Function triggered by that Pub/Sub topic.
3. In the function, call the DLP API to fetch the findings for the scan.
4. Map each detected infoType (e.g., US_SOCIAL_SECURITY_NUMBER) to a policy tag in your Data Catalog taxonomy.
5. Update the BigQuery table schema to attach the policy tag to the affected columns.
Here is a sample code snippet to get you started:
from google.cloud import bigquery
from google.cloud import dlp_v2

def tag_pii_columns(event, context):
    # 1. Get the DLP job name from the Pub/Sub message attributes
    scan_name = event["attributes"]["DlpJobName"]

    # 2. Initialize DLP and BigQuery clients
    dlp_client = dlp_v2.DlpServiceClient()
    bq_client = bigquery.Client()

    # 3. Fetch the scan results from DLP
    scan_results = dlp_client.get_dlp_job(name=scan_name)
    inspect_details = scan_results.inspect_details

    # 4. Load the scanned table once (replace with your own project, dataset and table)
    table = bq_client.get_table("your_project.your_dataset.your_table")

    # 5. Work out which columns need which policy tag. Note: info_type_stats only
    #    carries aggregate counts per infoType, not the column name; in practice read
    #    the column from findings saved to BigQuery or from column data profiles
    #    (see the sketch further below). "ssn" here is a placeholder.
    tags_by_column = {}
    for finding in inspect_details.result.info_type_stats:
        info_type_name = finding.info_type.name  # e.g. "US_SOCIAL_SECURITY_NUMBER"
        column_name = "ssn"  # placeholder for the column the infoType was found in
        policy_tag = map_info_type_to_tag(info_type_name)
        if policy_tag:
            tags_by_column[column_name] = policy_tag

    # 6. Rebuild the schema: SchemaField objects are immutable, so create new ones
    #    for tagged columns instead of assigning to field.policy_tags.
    new_schema = []
    for field in table.schema:
        if field.name in tags_by_column:
            new_schema.append(
                bigquery.SchemaField(
                    name=field.name,
                    field_type=field.field_type,
                    mode=field.mode,
                    description=field.description,
                    policy_tags=bigquery.PolicyTagList(names=[tags_by_column[field.name]]),
                )
            )
        else:
            new_schema.append(field)
    table.schema = new_schema
    bq_client.update_table(table, ["schema"])  # Push the updated schema

def map_info_type_to_tag(info_type_name):
    # Map an infoType to the full resource name of a policy tag in your taxonomy.
    # These are placeholders: replace them with your own taxonomy and tag IDs.
    mapping = {
        "US_SOCIAL_SECURITY_NUMBER": "projects/your_project/locations/us/taxonomies/123/policyTags/456",
    }
    return mapping.get(info_type_name)
Key Considerations
- Permissions: the Cloud Function's service account needs to read DLP scan results and to update the BigQuery table schema with policy tags (setting policy tags requires the bigquery.tables.setCategory permission, included in roles such as BigQuery Data Owner).
- Taxonomy: the policy tags must already exist in a Data Catalog taxonomy before the function can attach them; a minimal one-time setup sketch is shown after this list.
- Mapping maintenance: keep the infoType-to-policy-tag mapping in configuration (environment variables, a config file, or a small lookup table) so security teams can change it without redeploying the function.
- Idempotency and errors: the function may receive duplicate Pub/Sub messages, so make the tagging logic safe to re-run and handle tables or columns that no longer exist.
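If the taxonomy does not exist yet, you can create it once, up front, with the Data Catalog policy tag manager. A minimal sketch, where the display names and the location are placeholders to adapt to your environment:

from google.cloud import datacatalog_v1

def create_pii_taxonomy(project_id, location="us"):
    # One-time setup: create a taxonomy and a single policy tag for PII columns.
    # Display names and the location are placeholders; adjust to your setup.
    client = datacatalog_v1.PolicyTagManagerClient()
    taxonomy = client.create_taxonomy(
        parent=f"projects/{project_id}/locations/{location}",
        taxonomy=datacatalog_v1.Taxonomy(
            display_name="pii_taxonomy",
            activated_policy_types=[
                datacatalog_v1.Taxonomy.PolicyType.FINE_GRAINED_ACCESS_CONTROL
            ],
        ),
    )
    policy_tag = client.create_policy_tag(
        parent=taxonomy.name,
        policy_tag=datacatalog_v1.PolicyTag(display_name="us_social_security_number"),
    )
    return policy_tag.name  # full resource name to use in map_info_type_to_tag

The returned resource name is exactly what the mapping helper in the Cloud Function expects.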
Need for DLP API
Yes, you will need to use the DLP API to retrieve the results of the data profile scans. The Cloud Function will interact with this API to fetch the detailed findings that guide the application of policy tags.
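If you are using the newer sensitive data discovery service (data profiles) rather than classic inspection jobs, recent versions of the google-cloud-dlp client also expose the profile resources directly. A minimal sketch, assuming your client version provides list_column_data_profiles and that the location, method, and field names match your setup (verify against your library version before relying on it):

from google.cloud import dlp_v2

def get_pii_columns_from_profiles(project_id, dataset_id, table_id, location="us"):
    # Sketch: list the column data profiles in a region and return the columns of
    # one table together with their predicted infoType. Method and field names are
    # assumptions based on the ColumnDataProfile resource; check them against your
    # google-cloud-dlp version.
    dlp_client = dlp_v2.DlpServiceClient()
    parent = f"projects/{project_id}/locations/{location}"
    pii_columns = {}
    for profile in dlp_client.list_column_data_profiles(parent=parent):
        if profile.dataset_id == dataset_id and profile.table_id == table_id:
            info_type = profile.column_info_type.info_type.name
            if info_type:
                pii_columns[profile.column] = info_type  # e.g. {"ssn": "US_SOCIAL_SECURITY_NUMBER"}
    return pii_columns

The returned dictionary could replace the hard-coded column placeholder in the Cloud Function above, feeding map_info_type_to_tag per column.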