Get hands-on experience with 20+ free Google Cloud products and $300 in free credit for new customers.

Use of when(incremental() in pure JS files.

Hello Community,

I am currently working on a model, PURE JS, that iterates through a configuration to publish incremental tables. My objective is to introduce a variable that captures partitions for efficient partition pruning but only when  incremental() . This means that during a full refresh, either nothing should happen or a different set of operations should be executed, but i need this in pure JS.

I have been searching for documentation on this topic, but unfortunately, I haven't found relevant information on dataform.co or in the new Google documentation.

example of code:

    if (kind === 'operacional') {
        publish(silver_name, {
                type: 'incremental',
                schema: schema,
                tags: tags,
                bigquery: {
                    "updatePartitionFilter": `${silverData.partition_by} in UNNEST(partition_pruning)`,
                    partitionBy: silverData['partition_expression'],
                    clusterBy: silverData['clustering']
                },
            })
            .preOps(ctx => {
                const tableIntermediate = `${ctx.ref(ref_name)}`;
                const tableSilver = `${ctx.self()}`;            
                return generatePreOpsMergeQuery(tableIntermediate, tableSilver, tableData);
            })
            .query(ctx => {
                const tableSilver = `${ctx.ref('int_' + tableName + '_lastTrue')}`;
                return `SELECT *
                        FROM ${tableSilver} as s
                        WHERE s.${tableData.TRANSACTION_COL} IN ('${upsertOrders.join("', '")}')
                        `
            }).uniqueKey(pk_columns)
        .assertions(assertions)

Any insights, resources, or examples related to this would be highly appreciated. If anyone has encountered a similar scenario or knows how to approach this problem effectively, your advice would be invaluable.

Thank you in advance for your help.

Best regards,

Ricardo

 

Solved Solved
3 5 1,233
1 ACCEPTED SOLUTION

Hi @novas ,

Please Note: Dataform's documentation and tooling are constantly evolving, which sometimes leads to limited information on specific topics like pure JS models and their interaction with incremental tables. However, based on your requirements, here's a structured approach to achieve your objectives:

1. Variable for Partition Pruning:

  • Introduce a variable named partitionPruning outside the publish function. This variable will store the partitions relevant for efficient pruning.
  • Update partitionPruning within the preOps or query functions, based on the specific logic of your Dataform script and the structure of your data. It's essential to have a clear understanding of your data's partitioning scheme to implement this effectively.
  • Ensure that partitionPruning accurately reflects the partitions pertinent to the current execution context, whether it's an incremental update or a full refresh.

2. Differentiating Operations:

  • Utilize ctx.isIncremental() to determine if the current execution is an incremental update. Understanding the distinction between full refreshes and incremental updates is crucial in Dataform's operational model.
  • Implement distinct logic within if statements to cater to full refresh and incremental scenarios, allowing for tailored operations suited to each execution type.

3. Example Implementation:

  • Here's a JavaScript snippet to illustrate the concept:
     
    const partitionPruning = []; // Initialize the variable
    
    if (kind === 'operacional') {
      publish(silver_name, { ... })
        .preOps(ctx => {
          if (ctx.isIncremental()) {
            // Logic for collecting partitions during incremental updates
            partitionPruning.push(...);
          }
    
          // Incorporate partitionPruning in pre-ops logic
          return generatePreOpsMergeQuery(tableIntermediate, tableSilver, tableData, partitionPruning);
        })
        ...;
    }
    
    This code demonstrates initializing partitionPruning and using it conditionally in the preOps function, depending on whether the run is incremental. The generatePreOpsMergeQuery function should be designed to integrate partition pruning into your SQL query generation.

4. Important Notes:

  • Regularly check the latest Dataform documentation for updates in APIs and best practices. Staying updated can significantly impact the performance and efficiency of your scripts.
  • Thoroughly test your script in both full refresh and incremental scenarios to ensure expected behavior under various conditions.

5. Resources:

  • Refer to Dataform's documentation and participate in the community forum for further guidance and support. Engaging with the community can provide practical solutions, networking, and learning opportunities.

Understanding your specific data structure and the functionalities of Dataform is crucial for successful implementation. The guidelines provided here need to be adapted to your particular context. Experimenting and iterating on your scripts is often key in mastering Dataform's capabilities, ensuring that your script accurately reflects the nuances of your data and the objectives of your project.

View solution in original post

5 REPLIES 5

Hi @novas ,

Please Note: Dataform's documentation and tooling are constantly evolving, which sometimes leads to limited information on specific topics like pure JS models and their interaction with incremental tables. However, based on your requirements, here's a structured approach to achieve your objectives:

1. Variable for Partition Pruning:

  • Introduce a variable named partitionPruning outside the publish function. This variable will store the partitions relevant for efficient pruning.
  • Update partitionPruning within the preOps or query functions, based on the specific logic of your Dataform script and the structure of your data. It's essential to have a clear understanding of your data's partitioning scheme to implement this effectively.
  • Ensure that partitionPruning accurately reflects the partitions pertinent to the current execution context, whether it's an incremental update or a full refresh.

2. Differentiating Operations:

  • Utilize ctx.isIncremental() to determine if the current execution is an incremental update. Understanding the distinction between full refreshes and incremental updates is crucial in Dataform's operational model.
  • Implement distinct logic within if statements to cater to full refresh and incremental scenarios, allowing for tailored operations suited to each execution type.

3. Example Implementation:

  • Here's a JavaScript snippet to illustrate the concept:
     
    const partitionPruning = []; // Initialize the variable
    
    if (kind === 'operacional') {
      publish(silver_name, { ... })
        .preOps(ctx => {
          if (ctx.isIncremental()) {
            // Logic for collecting partitions during incremental updates
            partitionPruning.push(...);
          }
    
          // Incorporate partitionPruning in pre-ops logic
          return generatePreOpsMergeQuery(tableIntermediate, tableSilver, tableData, partitionPruning);
        })
        ...;
    }
    
    This code demonstrates initializing partitionPruning and using it conditionally in the preOps function, depending on whether the run is incremental. The generatePreOpsMergeQuery function should be designed to integrate partition pruning into your SQL query generation.

4. Important Notes:

  • Regularly check the latest Dataform documentation for updates in APIs and best practices. Staying updated can significantly impact the performance and efficiency of your scripts.
  • Thoroughly test your script in both full refresh and incremental scenarios to ensure expected behavior under various conditions.

5. Resources:

  • Refer to Dataform's documentation and participate in the community forum for further guidance and support. Engaging with the community can provide practical solutions, networking, and learning opportunities.

Understanding your specific data structure and the functionalities of Dataform is crucial for successful implementation. The guidelines provided here need to be adapted to your particular context. Experimenting and iterating on your scripts is often key in mastering Dataform's capabilities, ensuring that your script accurately reflects the nuances of your data and the objectives of your project.

Thank you @ms4446  😊 for your fast and detailed response regarding the use of Dataform, particularly in the context of JavaScript models and incremental tables. Your step-by-step guide, from partition pruning to differentiating operations, is immensely helpful. The specific JavaScript example you provided clarifies how to implement these concepts effectively. I appreciate the emphasis on staying updated with Dataform's evolving documentation and the reminder about the importance of testing in both full refresh and incremental scenarios.


thank u again 😊

Hello @ms4446 ,

First and foremost, I would like to extend my heartfelt gratitude for your prompt response yesterday; it was incredibly helpful. I have a few clarifications and questions to further understand some concepts.

Regarding DataForm, at least in the version I am currently using, it appears that ctx.isIncremental() does not exist as a function, but rather as a property ctx.isIncremental. This leads me to a point of confusion, this is because my version? if I "update" the version this could change?

novas_2-1700044617535.png

Another example of recent changes I notice are with the declare function:

before it was fine:

 

 

declare({
             source_project,
            dataset_source,
           file_name
        });

 

But now it most be like:

 

        declare({
            database: source_project,
            schema: dataset_source,
            name: file_name
        });

 

There is a complete change log or any place where i could check the API changes by version?

Additionally, your suggestion to store partition pruning list in a JavaScript variable is looks like a great idea. However, it seems that this approach is only viable if the pruning follows a specific logic, rather than being data-driven. This brings me to an important question: is there a way to store in a JavaScript variable the data obtained from a query in DataForm? This functionality would be extremely beneficial, especially in scenarios where the list exceeds the storage capabilities of a SQL (BigQuery) variable or any other data-driven logic.

Another area where I seek guidance is regarding the possibility of viewing custom loggers in my code. Specifically, I am curious about something akin to a "print" function for debugging purposes or any official supported framework for testing JS logic.

Once again, thank you for your invaluable assistance and insights.

Best regards,

Ricardo

Hi @novas ,

Your questions about Dataform are very insightful. So keep them coming 😊 Here are some clarifications and answers to your questions::

  1. Regarding ctx.isIncremental Property:

    • In your version of Dataform, ctx.isIncremental is a property, not a function. This means you should use it as ctx.isIncremental (without parentheses). The behavior and implementation of this property can vary between different versions of Dataform, so it's crucial to refer to the specific documentation for the version you are using for accurate usage.
  2. Regarding Changes to the declare Function:

    • The evolution in the declare function's syntax to a more structured format is indicative of updates in Dataform's API. It's essential to adapt your scripts to these changes to ensure compatibility and to take advantage of any new features introduced in the latest versions.
  3. Regarding a Change Log for Dataform APIs:

    • Dataform may not provide a comprehensive change log for all API changes. However, you can stay informed about the latest updates and modifications by regularly checking Dataform's release notes and announcements. These resources are valuable for keeping your knowledge and scripts up to date.
  4. Regarding Storing Partition Pruning List in a JavaScript Variable:

    • Using a JavaScript variable for storing a static or pattern-based partition pruning list is feasible. However, for dynamic, data-driven partition pruning, this approach might not be practical within Dataform, as the platform's scripts are primarily designed for generating SQL code rather than executing queries or storing query results.
  5. Regarding Storing Data Obtained from a Query in a JavaScript Variable:

    • Directly executing a query and storing its results in a JavaScript variable within Dataform is generally not possible. If you need to use query results within your Dataform scripts, you may have to run the query externally (e.g., in a separate script or application) and then input the results into your Dataform script.
  6. Regarding Viewing Custom Loggers in Dataform Code:

    • Dataform scripts do not typically support JavaScript execution features like logging to the console, as they are designed to generate SQL code. The custom logger example you provided would be applicable in a standard JavaScript environment but is likely not suitable within Dataform scripts. For debugging, you might need to rely on external methods, such as testing the generated SQL outside of Dataform.

 

@ms4446 

Thank you again for all the information is incredibly helpful.

Best,