EDIT: I've figured out how I could do this using operations within Dataform. Article here.
Hi everyone,
I have a data asset that needs to be computed every day (hencewhy I'm using dataform - it's a great tool), but I've come across a limitation. The computation requires pivoting data. The dataset looks like this:
Institution | Year | Result |
Glasgow | 2022 | 5 |
Glasgow | 2023 | 1 |
Aberdeen | 2023 | 6 |
Aberdeen | 2022 | 6 |
I need to pivot it such that it looks like this
Institution | 2022 Result | 2023 Result |
Glasgow | 5 | 1 |
Aberdeen | 6 | 6 |
It's easy enough to do in the BigQuery editor using dynamic pivot statements as this blog outlines. Super useful.
The problem is that I can't run EXECUTE IMMEDIATE FORMAT() statements in Dataform. I tried to dynamically reference my tables wrapped within a pivot() function I wrote:
let pivot = (
schemaRef,
tableName,
aggregatedColumn,
pivotColumn
) => {
let refString = `
ref({
schema: ${schemaRef},
name: "${tableName}"
})`
return `EXECUTE IMMEDIATE FORMAT("""
SELECT * FROM ${refString}
PIVOT (SUM(${aggregatedColumn}) as ${aggregatedColumn} FOR ${pivotColumn} IN %s);
""", (SELECT CONCAT("(", STRING_AGG(DISTINCT CONCAT("'", ${pivotColumn}, "'"), ','), ")") FROM ${refString}))
`
};
module.exports = {
pivot
};
And with the SQLX looking like this:
config {
type: "table",
description: "Each record represents a response within typeform.",
schema: dataform.projectConfig.vars.typeform_intermediate_schema,
tags: ["typeform_mart"]
}
${
pivot.pivot(schemaRef = schema,
tableName = "response_counts",
aggregatedColumn = "response_count",
pivotColumn = "form_year")
}
But the issue is that the ref() string can only be used within an SQLX context. Big issue, because I need this to run after the tables prior "response_counts" have materialised as a part of the data pipeline.
Any help would be appreciated!