I have this query in Dataform which I am trying to optimise. I cannot find a way to do it. Any help will be appreciated. Scanning records with timestamp is not helping unfortunately
Any help will be gratefully appreciated.
config {
type: "table",
description: "Information from PHY diagnostics",
assertions: {
uniqueKey: ["eqp","force_vector","scalar_date","update_status","due_date","visual_character"],
nonNull: [
"session_id",
"eqp",
"visual_character",
"system"
],
}
}
WITH PHY_diagnostics_table AS (
SELECT
PHY.now_time AS visual_character,
'N/A' AS retailer_name,
PHY.session_id,
'N/A' AS ufn,
'N/A' AS uln,
PHY.eqp AS EQP,
'N/A' AS rc,
'N/A' AS md,
'N/A' AS di,
'N/A' AS h,
'PHY_DIAGNOSTICS' AS System,
V.scalar_date,
V.Due_date,
V.force_vector,
V.force_region,
V.update_status
FROM ${ref("PHY_DIAGNOSTICS_INTERACTIONS")} PHY
JOIN ${ref('result_card')} V ON PHY.eqp = V.EQP
AND DATE(PHY.now_time) >= DATE_ADD( V.scalar_date , INTERVAL -7 DAY )
AND (DATE(PHY.now_time) < V.Due_date OR Due_date IS NULL)
)
SELECT DISTINCT * FROM PHY_diagnostics_table
Optimization Strategies for Your Dataform Query
Partitioning:
PHY_DIAGNOSTICS_INTERACTIONS
and result_card
) by a relevant date column (e.g., now_time
or scalar_date
). This can significantly reduce the amount of data scanned during JOIN operations.Filtering and Pruning:
WHERE
clause to limit the data processed. For example:FROM ${ref("PHY_DIAGNOSTICS_INTERACTIONS")} PHY
WHERE DATE(PHY.now_time) >= DATE_ADD(V.scalar_date, INTERVAL -7 DAY)
AND (DATE(PHY.now_time) < V.Due_date OR V.Due_date IS NULL)
Denormalization:
PHY_DIAGNOSTICS_INTERACTIONS
and result_card
, consider creating a denormalized table that combines data from both.Indexing:
WHERE
clauses (e.g., EQP
, now_time
,scalar_date
).Materialized Views:
PHY_diagnostics_table
),consider using a materialized view. This pre-computes and stores the results, improving query performance.Profiling and Experimentation:
Additional Considerations:
Please Note:
These strategies serve as general guidance. The best approach depends on your specific dataset, table structures, and the features of your data warehouse. It is crucial to test any optimization changes in a non-production environment before deploying them.