Hi im trying to set up a custom query in Dataplex Data Quality Scans like the following:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 | { "data": { "resource": "my-resource" }, "data_quality_spec": { |
My question is the following, how this partial sql_expression is actually converted into a full query? Is translated as a subquery ?
Im asking because I want to create a sql_expression that checks the number of rows of the row_filter partition against a previous partition, and im not sure how dataplex is going to interpret it.
EDIT: link to Table condition expectation:
In Dataplex Data Quality enables you to define custom SQL expressions for validating data quality against specific conditions within a table. These expressions are evaluated within the context of a specified row filter, ensuring that the validation applies only to the relevant subset of data.
Contextual Evaluation and Expression Design:
The sql_expression
should be contextually relevant to the data subset defined by the row_filter
.
Design the sql_expression
to accurately assess data quality based on the specified conditions.
Use appropriate aggregate functions, joins, or other SQL features as needed for your data quality checks.
Result Expectation and Flexibility:
The sql_expression
should return a boolean result (TRUE or FALSE) indicating whether the data meets the specified quality criteria.
TableConditionExpectation offers flexibility for a wide range of data quality checks, from simple validations to complex comparisons.
Testing, Documentation, and Community Resources:
Thoroughly test your sql_expression
in a separate SQL environment before using it in Dataplex.
Consult community forums, user guides, and other resources for additional insights and practical examples.
Leveraging TableConditionExpectation Effectively:
Carefully plan and test your custom SQL expressions to ensure they produce accurate and reliable results.
Tailor your expressions to your specific data characteristics and quality requirements.
By considering these aspects, you can effectively utilize TableConditionExpectation in Dataplex Data Quality to perform robust and customized data quality validations.
Can be used dataplex quality data scan to test if a given value, for example an uuid, is in the uuid colonne of another table ?
You can leverage Dataplex Data Quality scans to verify the presence of a specific UUID within the UUID column of another table. Here's how to approach this:
Method: Custom SQL Expression within Data Quality Rule
EXISTS (SELECT 1 FROM `other_project.other_dataset.other_table` WHERE uuid_column = 'your_uuid_value')
other_project.other_dataset.other_table
with the actual path to the table you're checking against.'your_uuid_value'
with the specific UUID you're verifying.Considerations for Rule Configuration:
Operational Considerations:
uuid_column
in the target table can improve the efficiency of the query.Additional Considerations: