Hi there,
As analytics engineers, we want to be able to tag our dataform queries by the layer and domain so that we can see how much each layer and domain costs us (using the info schema).
We initially thought we could do this via the BigQuery labels config, however, these only populate at the table level, and are very unfriendly to be able to access from the Info Schema and then tag back up to individual queries.
Therefore, we are wanting to apply the query_label reference on each query, which quite simply comes through the labels field in the same info schema table that the query level bytes come through.
The only way we've found that we can do this is by applying a pre_operation, like so:
pre_operations {
SET @@query_label = "label:mart_core";
}
However, when we do this, the query compilation is affected and dataform is not able to tell us if the query will fail, and also cannot estimate the amount of bytes for the query.
So is there a better way of implementing query_labels to our models that will allow us to still see the query compilation errors and estimated bytes?
Thanks,
Benji
Solved! Go to Solution.
i think the easiest is to go to the job explorer and search for labels with one of the keys mentioned above :
i am also using this dataform core version
Can you expand on the phrase "the query compilation is affected and dataform is not able to tell us if the query will fail, and also cannot estimate the amount of bytes for the query." .... what does this look/feel like in terms of working with Dataform? What does it look like if we try and run the same query in the BigQuery Studio environment?
States: "WARNING: Could not compute bytes processed estimate for script."
Which is exactly the same message that happens if I copy it out into BigQuery console.
However, if I comment out the one line that is setting the query_label, then it does give the bytes estimation.
For query error checking, when I leave the SET line in (no comment), and then change one of the fields I'm selecting to have a typo in it, then it shows the exact same thing as my screenshot (does not error); instead of what it should show (when I comment out that line) of:
Hello,
i think dataform now has implemented job labels in the workflow executions, i checked in the job explorer and i saw that the jobs have these labels defined :
dataform_workflow_execution_action_id_schema:tag,
dataform_workflow_execution_id:id, dataform_repository_id:repo_id,
dataform_workflow_execution_action_id_database:database,
dataform_workflow_execution_action_id_name:current_transformation
but it will be cool to be able to define custom job labels easily to have a more granular insight on the cost, performance of the queries.
This sounds interesting, but yeh not very granular and potentially just similar to the detination table data that is already there.
But, sorry, where do you see this? I'm going into the "Job Explorer" section, clicking on one of the Job IDs to pull up the "Job Information" and can't see anything in labels, or anything referencing those fields..?
i think the easiest is to go to the job explorer and search for labels with one of the keys mentioned above :
i am also using this dataform core version
Ahhhh, it will be because of the dataform version. We have 2.0.1, and we definitely don't have any of those labels. We'll look to upgrade.
Still not convinced on if it's valuable, though. Seems to just give the same as what the destination_table gives. Would you be able to give an example or explanation of what the dataform_workflow_execution_id and dataform_repository_id values output?