Hello Community,
I am currently working on a model, PURE JS, that iterates through a configuration to publish incremental tables. My objective is to introduce a variable that captures partitions for efficient partition pruning but only when incremental() . This means that during a full refresh, either nothing should happen or a different set of operations should be executed, but i need this in pure JS.
I have been searching for documentation on this topic, but unfortunately, I haven't found relevant information on dataform.co or in the new Google documentation.
example of code:
if (kind === 'operacional') {
publish(silver_name, {
type: 'incremental',
schema: schema,
tags: tags,
bigquery: {
"updatePartitionFilter": `${silverData.partition_by} in UNNEST(partition_pruning)`,
partitionBy: silverData['partition_expression'],
clusterBy: silverData['clustering']
},
})
.preOps(ctx => {
const tableIntermediate = `${ctx.ref(ref_name)}`;
const tableSilver = `${ctx.self()}`;
return generatePreOpsMergeQuery(tableIntermediate, tableSilver, tableData);
})
.query(ctx => {
const tableSilver = `${ctx.ref('int_' + tableName + '_lastTrue')}`;
return `SELECT *
FROM ${tableSilver} as s
WHERE s.${tableData.TRANSACTION_COL} IN ('${upsertOrders.join("', '")}')
`
}).uniqueKey(pk_columns)
.assertions(assertions)
Any insights, resources, or examples related to this would be highly appreciated. If anyone has encountered a similar scenario or knows how to approach this problem effectively, your advice would be invaluable.
Thank you in advance for your help.
Best regards,
Ricardo
Solved! Go to Solution.
Hi @novas ,
Please Note: Dataform's documentation and tooling are constantly evolving, which sometimes leads to limited information on specific topics like pure JS models and their interaction with incremental tables. However, based on your requirements, here's a structured approach to achieve your objectives:
1. Variable for Partition Pruning:
partitionPruning
outside the publish
function. This variable will store the partitions relevant for efficient pruning.partitionPruning
within the preOps
or query
functions, based on the specific logic of your Dataform script and the structure of your data. It's essential to have a clear understanding of your data's partitioning scheme to implement this effectively.partitionPruning
accurately reflects the partitions pertinent to the current execution context, whether it's an incremental update or a full refresh.2. Differentiating Operations:
ctx.isIncremental()
to determine if the current execution is an incremental update. Understanding the distinction between full refreshes and incremental updates is crucial in Dataform's operational model.if
statements to cater to full refresh and incremental scenarios, allowing for tailored operations suited to each execution type.3. Example Implementation:
const partitionPruning = []; // Initialize the variable
if (kind === 'operacional') {
publish(silver_name, { ... })
.preOps(ctx => {
if (ctx.isIncremental()) {
// Logic for collecting partitions during incremental updates
partitionPruning.push(...);
}
// Incorporate partitionPruning in pre-ops logic
return generatePreOpsMergeQuery(tableIntermediate, tableSilver, tableData, partitionPruning);
})
...;
}
partitionPruning
and using it conditionally in the preOps
function, depending on whether the run is incremental. The generatePreOpsMergeQuery
function should be designed to integrate partition pruning into your SQL query generation.4. Important Notes:
5. Resources:
Understanding your specific data structure and the functionalities of Dataform is crucial for successful implementation. The guidelines provided here need to be adapted to your particular context. Experimenting and iterating on your scripts is often key in mastering Dataform's capabilities, ensuring that your script accurately reflects the nuances of your data and the objectives of your project.
Hi @novas ,
Please Note: Dataform's documentation and tooling are constantly evolving, which sometimes leads to limited information on specific topics like pure JS models and their interaction with incremental tables. However, based on your requirements, here's a structured approach to achieve your objectives:
1. Variable for Partition Pruning:
partitionPruning
outside the publish
function. This variable will store the partitions relevant for efficient pruning.partitionPruning
within the preOps
or query
functions, based on the specific logic of your Dataform script and the structure of your data. It's essential to have a clear understanding of your data's partitioning scheme to implement this effectively.partitionPruning
accurately reflects the partitions pertinent to the current execution context, whether it's an incremental update or a full refresh.2. Differentiating Operations:
ctx.isIncremental()
to determine if the current execution is an incremental update. Understanding the distinction between full refreshes and incremental updates is crucial in Dataform's operational model.if
statements to cater to full refresh and incremental scenarios, allowing for tailored operations suited to each execution type.3. Example Implementation:
const partitionPruning = []; // Initialize the variable
if (kind === 'operacional') {
publish(silver_name, { ... })
.preOps(ctx => {
if (ctx.isIncremental()) {
// Logic for collecting partitions during incremental updates
partitionPruning.push(...);
}
// Incorporate partitionPruning in pre-ops logic
return generatePreOpsMergeQuery(tableIntermediate, tableSilver, tableData, partitionPruning);
})
...;
}
partitionPruning
and using it conditionally in the preOps
function, depending on whether the run is incremental. The generatePreOpsMergeQuery
function should be designed to integrate partition pruning into your SQL query generation.4. Important Notes:
5. Resources:
Understanding your specific data structure and the functionalities of Dataform is crucial for successful implementation. The guidelines provided here need to be adapted to your particular context. Experimenting and iterating on your scripts is often key in mastering Dataform's capabilities, ensuring that your script accurately reflects the nuances of your data and the objectives of your project.
Thank you @ms4446 😊 for your fast and detailed response regarding the use of Dataform, particularly in the context of JavaScript models and incremental tables. Your step-by-step guide, from partition pruning to differentiating operations, is immensely helpful. The specific JavaScript example you provided clarifies how to implement these concepts effectively. I appreciate the emphasis on staying updated with Dataform's evolving documentation and the reminder about the importance of testing in both full refresh and incremental scenarios.
thank u again 😊
Hello @ms4446 ,
First and foremost, I would like to extend my heartfelt gratitude for your prompt response yesterday; it was incredibly helpful. I have a few clarifications and questions to further understand some concepts.
Regarding DataForm, at least in the version I am currently using, it appears that ctx.isIncremental() does not exist as a function, but rather as a property ctx.isIncremental. This leads me to a point of confusion, this is because my version? if I "update" the version this could change?
Another example of recent changes I notice are with the declare function:
before it was fine:
declare({
source_project,
dataset_source,
file_name
});
But now it most be like:
declare({
database: source_project,
schema: dataset_source,
name: file_name
});
There is a complete change log or any place where i could check the API changes by version?
Additionally, your suggestion to store partition pruning list in a JavaScript variable is looks like a great idea. However, it seems that this approach is only viable if the pruning follows a specific logic, rather than being data-driven. This brings me to an important question: is there a way to store in a JavaScript variable the data obtained from a query in DataForm? This functionality would be extremely beneficial, especially in scenarios where the list exceeds the storage capabilities of a SQL (BigQuery) variable or any other data-driven logic.
Another area where I seek guidance is regarding the possibility of viewing custom loggers in my code. Specifically, I am curious about something akin to a "print" function for debugging purposes or any official supported framework for testing JS logic.
Once again, thank you for your invaluable assistance and insights.
Best regards,
Ricardo
Hi @novas ,
Your questions about Dataform are very insightful. So keep them coming 😊 Here are some clarifications and answers to your questions::
Regarding ctx.isIncremental Property:
ctx.isIncremental
is a property, not a function. This means you should use it as ctx.isIncremental
(without parentheses). The behavior and implementation of this property can vary between different versions of Dataform, so it's crucial to refer to the specific documentation for the version you are using for accurate usage.Regarding Changes to the declare Function:
declare
function's syntax to a more structured format is indicative of updates in Dataform's API. It's essential to adapt your scripts to these changes to ensure compatibility and to take advantage of any new features introduced in the latest versions.Regarding a Change Log for Dataform APIs:
Regarding Storing Partition Pruning List in a JavaScript Variable:
Regarding Storing Data Obtained from a Query in a JavaScript Variable:
Regarding Viewing Custom Loggers in Dataform Code: