Get hands-on experience with 20+ free Google Cloud products and $300 in free credit for new customers.

How to Categorize BigQuery Jobs by Dataform Routine

Hi there,

I have a Jobs table (JOBS_BY_PROJECT from INFORMATION_SCHEMA) in my BigQuery. I want to differentiate and categorize these jobs based on the Dataform routine from which they were executed. I tried identifying it using the labels and job execution ID, but it doesn’t indicate the Dataform workflow.

GuilhermeKaneda_0-1730220756041.png

How could I retrieve the workflow or version of a Dataform job?

GuilhermeKaneda_1-1730220836332.png

Thanks,
Guilherme

0 1 201
1 REPLY 1

Hi @GuilhermeKaneda,

Welcome to Google Cloud Community!

You're right, the BigQuery INFORMATION_SCHEMA.JOBS_BY_PROJECT table doesn't directly expose the Dataform workflow or version name. You need to use Dataform's audit logs for that specific information. However, you can use the labels on the BigQuery jobs to connect them back to Dataform runs and actions:

  • In BigQuery: Filter INFORMATION_SCHEMA.JOBS_BY_PROJECT by the dataform_project_id label to isolate jobs created by your Dataform project.
  • Group and Analyze: Use the dataform_workflow_invocation_id and dataform_workflow_execution_action_id labels to group related BigQuery jobs. All jobs from the same Dataform run will share the same dataform_workflow_invocation_id.
  • Consult Dataform Logs/UI: Take the dataform_workflow_invocation_id and, optionally, the dataform_workflow_execution_action_id and look them up in your Dataform logs or UI. This is where you'll find the specific workflow name, version, and other relevant details like the run start time.

I hope the above information is helpful.