Hello Community,
I am currently working on a model, PURE JS, that iterates through a configuration to publish incremental tables. My objective is to introduce a variable that captures partitions for efficient partition pruning but only when incremental() . This means that during a full refresh, either nothing should happen or a different set of operations should be executed, but i need this in pure JS.
I have been searching for documentation on this topic, but unfortunately, I haven't found relevant information on dataform.co or in the new Google documentation.
example of code:
if (kind === 'operacional') {
publish(silver_name, {
type: 'incremental',
schema: schema,
tags: tags,
bigquery: {
"updatePartitionFilter": `${silverData.partition_by} in UNNEST(partition_pruning)`,
partitionBy: silverData['partition_expression'],
clusterBy: silverData['clustering']
},
})
.preOps(ctx => {
const tableIntermediate = `${ctx.ref(ref_name)}`;
const tableSilver = `${ctx.self()}`;
return generatePreOpsMergeQuery(tableIntermediate, tableSilver, tableData);
})
.query(ctx => {
const tableSilver = `${ctx.ref('int_' + tableName + '_lastTrue')}`;
return `SELECT *
FROM ${tableSilver} as s
WHERE s.${tableData.TRANSACTION_COL} IN ('${upsertOrders.join("', '")}')
`
}).uniqueKey(pk_columns)
.assertions(assertions)
Any insights, resources, or examples related to this would be highly appreciated. If anyone has encountered a similar scenario or knows how to approach this problem effectively, your advice would be invaluable.
Thank you in advance for your help.
Best regards,
Ricardo
Solved! Go to Solution.
Hi @novas ,
Please Note: Dataform's documentation and tooling are constantly evolving, which sometimes leads to limited information on specific topics like pure JS models and their interaction with incremental tables. However, based on your requirements, here's a structured approach to achieve your objectives:
1. Variable for Partition Pruning:
partitionPruning
outside the publish
function. This variable will store the partitions relevant for efficient pruning.partitionPruning
within the preOps
or query
functions, based on the specific logic of your Dataform script and the structure of your data. It's essential to have a clear understanding of your data's partitioning scheme to implement this effectively.partitionPruning
accurately reflects the partitions pertinent to the current execution context, whether it's an incremental update or a full refresh.2. Differentiating Operations:
ctx.isIncremental()
to determine if the current execution is an incremental update. Understanding the distinction between full refreshes and incremental updates is crucial in Dataform's operational model.if
statements to cater to full refresh and incremental scenarios, allowing for tailored operations suited to each execution type.3. Example Implementation:
const partitionPruning = []; // Initialize the variable
if (kind === 'operacional') {
publish(silver_name, { ... })
.preOps(ctx => {
if (ctx.isIncremental()) {
// Logic for collecting partitions during incremental updates
partitionPruning.push(...);
}
// Incorporate partitionPruning in pre-ops logic
return generatePreOpsMergeQuery(tableIntermediate, tableSilver, tableData, partitionPruning);
})
...;
}
partitionPruning
and using it conditionally in the preOps
function, depending on whether the run is incremental. The generatePreOpsMergeQuery
function should be designed to integrate partition pruning into your SQL query generation.4. Important Notes:
5. Resources:
Understanding your specific data structure and the functionalities of Dataform is crucial for successful implementation. The guidelines provided here need to be adapted to your particular context. Experimenting and iterating on your scripts is often key in mastering Dataform's capabilities, ensuring that your script accurately reflects the nuances of your data and the objectives of your project.