Get hands-on experience with 20+ free Google Cloud products and $300 in free credit for new customers.

Use of when(incremental() in pure JS files.

Hello Community,

I am currently working on a model, PURE JS, that iterates through a configuration to publish incremental tables. My objective is to introduce a variable that captures partitions for efficient partition pruning but only when  incremental() . This means that during a full refresh, either nothing should happen or a different set of operations should be executed, but i need this in pure JS.

I have been searching for documentation on this topic, but unfortunately, I haven't found relevant information on dataform.co or in the new Google documentation.

example of code:

    if (kind === 'operacional') {
        publish(silver_name, {
                type: 'incremental',
                schema: schema,
                tags: tags,
                bigquery: {
                    "updatePartitionFilter": `${silverData.partition_by} in UNNEST(partition_pruning)`,
                    partitionBy: silverData['partition_expression'],
                    clusterBy: silverData['clustering']
                },
            })
            .preOps(ctx => {
                const tableIntermediate = `${ctx.ref(ref_name)}`;
                const tableSilver = `${ctx.self()}`;            
                return generatePreOpsMergeQuery(tableIntermediate, tableSilver, tableData);
            })
            .query(ctx => {
                const tableSilver = `${ctx.ref('int_' + tableName + '_lastTrue')}`;
                return `SELECT *
                        FROM ${tableSilver} as s
                        WHERE s.${tableData.TRANSACTION_COL} IN ('${upsertOrders.join("', '")}')
                        `
            }).uniqueKey(pk_columns)
        .assertions(assertions)

Any insights, resources, or examples related to this would be highly appreciated. If anyone has encountered a similar scenario or knows how to approach this problem effectively, your advice would be invaluable.

Thank you in advance for your help.

Best regards,

Ricardo

 

Solved Solved
3 5 1,247
1 ACCEPTED SOLUTION

Hi @novas ,

Please Note: Dataform's documentation and tooling are constantly evolving, which sometimes leads to limited information on specific topics like pure JS models and their interaction with incremental tables. However, based on your requirements, here's a structured approach to achieve your objectives:

1. Variable for Partition Pruning:

  • Introduce a variable named partitionPruning outside the publish function. This variable will store the partitions relevant for efficient pruning.
  • Update partitionPruning within the preOps or query functions, based on the specific logic of your Dataform script and the structure of your data. It's essential to have a clear understanding of your data's partitioning scheme to implement this effectively.
  • Ensure that partitionPruning accurately reflects the partitions pertinent to the current execution context, whether it's an incremental update or a full refresh.

2. Differentiating Operations:

  • Utilize ctx.isIncremental() to determine if the current execution is an incremental update. Understanding the distinction between full refreshes and incremental updates is crucial in Dataform's operational model.
  • Implement distinct logic within if statements to cater to full refresh and incremental scenarios, allowing for tailored operations suited to each execution type.

3. Example Implementation:

  • Here's a JavaScript snippet to illustrate the concept:
     
    const partitionPruning = []; // Initialize the variable
    
    if (kind === 'operacional') {
      publish(silver_name, { ... })
        .preOps(ctx => {
          if (ctx.isIncremental()) {
            // Logic for collecting partitions during incremental updates
            partitionPruning.push(...);
          }
    
          // Incorporate partitionPruning in pre-ops logic
          return generatePreOpsMergeQuery(tableIntermediate, tableSilver, tableData, partitionPruning);
        })
        ...;
    }
    
    This code demonstrates initializing partitionPruning and using it conditionally in the preOps function, depending on whether the run is incremental. The generatePreOpsMergeQuery function should be designed to integrate partition pruning into your SQL query generation.

4. Important Notes:

  • Regularly check the latest Dataform documentation for updates in APIs and best practices. Staying updated can significantly impact the performance and efficiency of your scripts.
  • Thoroughly test your script in both full refresh and incremental scenarios to ensure expected behavior under various conditions.

5. Resources:

  • Refer to Dataform's documentation and participate in the community forum for further guidance and support. Engaging with the community can provide practical solutions, networking, and learning opportunities.

Understanding your specific data structure and the functionalities of Dataform is crucial for successful implementation. The guidelines provided here need to be adapted to your particular context. Experimenting and iterating on your scripts is often key in mastering Dataform's capabilities, ensuring that your script accurately reflects the nuances of your data and the objectives of your project.

View solution in original post

5 REPLIES 5