Column Parameter to Dataform Assertion

Hi, 

Is it possible create a custom assertion passing column and list of accepted values as parameter to check values to column?

DBT has a test called accepted_values that check if column match the only values got.

Is it possible customize a assertion that do the same check?

Solved Solved
0 4 493
1 ACCEPTED SOLUTION

Dataform does not have a directly equivalent "accepted_values" test as found in DBT. However, you can achieve a similar behavior and create custom assertions using these approaches:

1. Using rowConditions in table configurations:

If the table or view you want to check is defined within your Dataform code (dataform.json or .sqlx files), you can use rowConditions assertions. This allows you to write SQL expressions that should evaluate to true for all rows.

 
config {
   type: "table",
   assertions: {
     rowConditions: [
       "`<your_column>` IN ('value1', 'value2', 'value3')" // Replace with actual column name and accepted values
     ]
   }
}

SELECT 
    ... -- your table creation logic
  • If any row in the table has a value for <your_column> that is not in the specified list, the assertion will fail.

2. Creating Manual Assertions:

For more flexibility or to test tables/views not directly defined in Dataform, you can create manual assertions.

a. Using Dataform SQLX files:

 
-- File: custom_assertions.sqlx

config {
   type: "assertion",  
   assertionName: "check_column_accepted_values"  
} 

SELECT * 
FROM `<your_dataset>.<your_table>`
WHERE `<your_column>` NOT IN ('value1', 'value2', 'value3'); 
  • This assertion query is designed to return rows that violate the condition (values not in the accepted list). If the query returns anyrows, the assertion fails.

b. Using JavaScript (Dataform Core):

JavaScript
// In your JavaScript file
assert("check_column_accepted_values")
  .query(`
    SELECT * 
    FROM \`<span class="math-inline">\{dataset\}\.</span>{table}\`
    WHERE \`${column}\` NOT IN ('value1', 'value2', 'value3')
  `);

Important Notes:

  • Replace placeholders:

    • <your_dataset>, <your_table>, and <your_column> need to be replaced with the actual names.
    • Update the list ('value1', 'value2'...) with your specific accepted values.
  • Dataform Behavior:

    • If a custom assertion fails, Dataform will signal an error in your Dataform job execution, and you'll likely receive notifications based on your Dataform project's setup.
  • Dataform Core vs SQLX: The SQLX method is generally preferred for defining Dataform configurations. JavaScript method is more useful if you need dynamic logic or interaction with external data sources.

View solution in original post

4 REPLIES 4

Dataform does not have a directly equivalent "accepted_values" test as found in DBT. However, you can achieve a similar behavior and create custom assertions using these approaches:

1. Using rowConditions in table configurations:

If the table or view you want to check is defined within your Dataform code (dataform.json or .sqlx files), you can use rowConditions assertions. This allows you to write SQL expressions that should evaluate to true for all rows.

 
config {
   type: "table",
   assertions: {
     rowConditions: [
       "`<your_column>` IN ('value1', 'value2', 'value3')" // Replace with actual column name and accepted values
     ]
   }
}

SELECT 
    ... -- your table creation logic
  • If any row in the table has a value for <your_column> that is not in the specified list, the assertion will fail.

2. Creating Manual Assertions:

For more flexibility or to test tables/views not directly defined in Dataform, you can create manual assertions.

a. Using Dataform SQLX files:

 
-- File: custom_assertions.sqlx

config {
   type: "assertion",  
   assertionName: "check_column_accepted_values"  
} 

SELECT * 
FROM `<your_dataset>.<your_table>`
WHERE `<your_column>` NOT IN ('value1', 'value2', 'value3'); 
  • This assertion query is designed to return rows that violate the condition (values not in the accepted list). If the query returns anyrows, the assertion fails.

b. Using JavaScript (Dataform Core):

JavaScript
// In your JavaScript file
assert("check_column_accepted_values")
  .query(`
    SELECT * 
    FROM \`<span class="math-inline">\{dataset\}\.</span>{table}\`
    WHERE \`${column}\` NOT IN ('value1', 'value2', 'value3')
  `);

Important Notes:

  • Replace placeholders:

    • <your_dataset>, <your_table>, and <your_column> need to be replaced with the actual names.
    • Update the list ('value1', 'value2'...) with your specific accepted values.
  • Dataform Behavior:

    • If a custom assertion fails, Dataform will signal an error in your Dataform job execution, and you'll likely receive notifications based on your Dataform project's setup.
  • Dataform Core vs SQLX: The SQLX method is generally preferred for defining Dataform configurations. JavaScript method is more useful if you need dynamic logic or interaction with external data sources.

Thanks @ms4446, it worked.


@ms4446 wrote:

assertionName: "check_column_accepted_values"


 

What is the significance of "assertionName:"? I can't find anything about it the docs.

Edit: I don't believe it is a valid property.

DataEngineer_0-1705952571063.png

 

Hi @DataEngineer ,

Sorry for the confusion. You are correct that assertionName is not a standard property in Dataform assertion configurations.  

To Clarify In Dataform, particularly within SQLX files, assertions are configured using the following elements:

  • Config Block:

    The config { ... } block is essential for defining the configuration of an assertion. This is where you specify the type of SQLX file and other optional properties.

  • Type:

    It's mandatory to specify type: "assertion" in the config block. This clearly indicates that the SQLX file is intended to define an assertion.

  • Database, Schema, Table (Optional):

    These properties are optional and can be used to specify the database, schema, and table where the assertion is relevant. However, their necessity depends on the structure of your Dataform project and the specific requirements of your assertion.

  • SQL Query:

    The heart of an assertion is the SQL query. This query should be crafted to return rows only when the assertion's condition is violated, i.e., when the data fails to meet the specified criteria.

Example of a Dataform assertion in a .sqlx file:

 
config { 
  type: "assertion" 
  // Optional: database = "your_database" 
  // Optional: schema = "your_schema" 
} 

SELECT * 
FROM `<your_dataset>.<your_table>` 
WHERE `<your_column>` NOT IN ('value1', 'value2', 'value3');