Policy tags in Bigquery

There is case where we mask data in one column in a table and we don't mask the same column in other table.

so whenever we are joining the tables, Bigquery is comparing the masked and unmasked data so we are not getting any results.Is there any other way where Bigquery should consider the underlying data for joining  and give results accordingly.

0 3 1,767
3 REPLIES 3

To join two tables in BigQuery where one column has masked data and the other doesn't, you'll face challenges because the masked values won't match the unmasked values. Here are some potential approaches:

  • Mask both columns before joining. This ensures that both columns have consistent masked data for the join. This approach is feasible if the masking function is reversible.
  • Unmask the data in the masked table before joining. This approach is not feasible if the masking function is irreversible.
  • Use a common key that is not masked in both tables for joining. This key should be unique and available in both tables.
  • Create temporary tables where you either mask or unmask the data as needed, then perform the join on these temporary tables. This approach can be used if the masking function is not reversible.

It is important to always consider data privacy implications before making changes to how data is processed. Joining two tables where one column has masked data and the other column does not can potentially violate data privacy if the masking function is not reversible. Therefore, it is important to choose an approach that will protect the privacy of the data.

But we have a below scenario where the process it getting complex....
At the beginning we have not used policy tags feature.But now we want to use this feature to improve data security(where we mask PII data).Now we have a case where table_1 has user_id and phone_number as columns and other table table_2 has phone_number in a json type column.We have applied masking rule to table_1 and we have not applied to table_2 as it is JSON field.Queries are written by joining phone_numbers from table_1 & 2.So we don't get any results as one is masked and the other is unmasked.So according to suggestion you have given we have to modify all queries.So is there any other better solution for this.

Also we are not knowing how to mask PII data in JSON column.Please suggest what industry standards are being followed for data security.

There are a number of different ways to join masked and unmasked data in BigQuery. The best approach for you will depend on your specific needs and requirements.

Data Catalog Policy Tag

Data catalog policy tags in Google Cloud are used for column-level access control and do not have the capability to unmask data. They are used to enforce fine-grained access control on the columns by defining policies on who can access the data in a particular column, but they do not affect the underlying data itself. Therefore, granting the Data Catalog Fine-Grained Reader role on the policy tag would not allow users to access the unmasked data in a masked column.

Custom Join Function

Writing a custom join function could be a viable approach, but it would not work in the case of masked phone numbers if the masking is irreversible (e.g., if the phone numbers are fully masked or if the masking function is not deterministic). However, if the masking is partial and reversible (e.g., if only the last four digits are masked or if the masking function is deterministic), then this approach could work.

For example, you could write a custom join function that compares the country code and area code of the phone numbers. This would work even if the last four digits of the phone numbers are masked.

Temporary Tables

This approach is correctly identified as a potential solution if the masking function is not reversible. It is the most straightforward method but may require changes to existing queries and additional storage and processing costs.

Overall

It is important to understand the nature of the masking function being used and whether it is reversible or not. This will greatly affect the potential solutions available. It is also crucial to be aware of the data privacy implications and to ensure compliance with all relevant regulations and policies when modifying the handling of sensitive data.