There is case where we mask data in one column in a table and we don't mask the same column in other table.
so whenever we are joining the tables, Bigquery is comparing the masked and unmasked data so we are not getting any results.Is there any other way where Bigquery should consider the underlying data for joining and give results accordingly.
To join two tables in BigQuery where one column has masked data and the other doesn't, you'll face challenges because the masked values won't match the unmasked values. Here are some potential approaches:
It is important to always consider data privacy implications before making changes to how data is processed. Joining two tables where one column has masked data and the other column does not can potentially violate data privacy if the masking function is not reversible. Therefore, it is important to choose an approach that will protect the privacy of the data.
But we have a below scenario where the process it getting complex....
At the beginning we have not used policy tags feature.But now we want to use this feature to improve data security(where we mask PII data).Now we have a case where table_1 has user_id and phone_number as columns and other table table_2 has phone_number in a json type column.We have applied masking rule to table_1 and we have not applied to table_2 as it is JSON field.Queries are written by joining phone_numbers from table_1 & 2.So we don't get any results as one is masked and the other is unmasked.So according to suggestion you have given we have to modify all queries.So is there any other better solution for this.
Also we are not knowing how to mask PII data in JSON column.Please suggest what industry standards are being followed for data security.
There are a number of different ways to join masked and unmasked data in BigQuery. The best approach for you will depend on your specific needs and requirements.
Data Catalog Policy Tag
Data catalog policy tags in Google Cloud are used for column-level access control and do not have the capability to unmask data. They are used to enforce fine-grained access control on the columns by defining policies on who can access the data in a particular column, but they do not affect the underlying data itself. Therefore, granting the Data Catalog Fine-Grained Reader role on the policy tag would not allow users to access the unmasked data in a masked column.
Custom Join Function
Writing a custom join function could be a viable approach, but it would not work in the case of masked phone numbers if the masking is irreversible (e.g., if the phone numbers are fully masked or if the masking function is not deterministic). However, if the masking is partial and reversible (e.g., if only the last four digits are masked or if the masking function is deterministic), then this approach could work.
For example, you could write a custom join function that compares the country code and area code of the phone numbers. This would work even if the last four digits of the phone numbers are masked.
Temporary Tables
This approach is correctly identified as a potential solution if the masking function is not reversible. It is the most straightforward method but may require changes to existing queries and additional storage and processing costs.
Overall
It is important to understand the nature of the masking function being used and whether it is reversible or not. This will greatly affect the potential solutions available. It is also crucial to be aware of the data privacy implications and to ensure compliance with all relevant regulations and policies when modifying the handling of sensitive data.