
How to get execution tag in Dataform

Hello everyone,

I'm just learning Dataform, and I need to log to a table the tag used to execute the Dataform job (SQLX). As a SQLX job can have many tags assigned, is there a function or method to get the tag that was used to execute the job? Thank you in advance for any help.

Best Regards,

Mauricio García

1 ACCEPTED SOLUTION

Unfortunately, Dataform doesn't have a built-in function or method to directly retrieve the execution tags within a SQLX job. The only way to manage execution tags is through compilation variables or parameters, which requires the caller of the job to pass the correct tags.


6 REPLIES

While Dataform doesn't have a built-in function or method to directly retrieve the execution tags within a SQLX job, you can achieve this through a combination of approaches:

Before your main SQLX job runs, you can declare compilation variables (in dataform.json, under vars) to capture the execution tags. Dataform lets the caller override these variables when executing a job, and you can access them in your SQLX scripts.

Example:

  • Pass Tags as Compilation Variables: When running the Dataform job, pass the tags being executed as a compilation variable. For example, when running the daily flow:
dataform run --tags=daily --vars=EXECUTION_TAGS=daily
  • Access and Log Tags in SQLX: In your SQLX script, use dataform.projectConfig.vars to access these tags and log them:
 
config {
  type: "operations"
}

-- Create the logging table on first run, then record the tags passed in by the caller
create table if not exists execution_log (
  job_id string,
  execution_tags string,
  execution_time timestamp
);

insert into execution_log
select
  current_job_id() as job_id,
  '${dataform.projectConfig.vars.EXECUTION_TAGS}' as execution_tags,
  current_timestamp() as execution_time;

Post-Execution Logging

Post-execution logging directly within SQLX isn't feasible, since Dataform doesn't provide a built-in hook that runs after all jobs complete. Instead, make sure the logging happens during the execution of your main SQLX job, as shown above.

Important Considerations:

  • Authentication: Ensure that your Dataform job has the necessary permissions to read and write to the logging table.
  • Error Handling: Implement error handling where necessary, especially for compilation variables that the caller might not pass; see the sketch after this list.
  • Timing: The above approach captures the tags during the execution of your main SQLX job, which is usually sufficient for logging purposes.
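For the error-handling point, a minimal sketch of that fallback, assuming EXECUTION_TAGS is declared under vars in dataform.json and the execution_log table from above already exists:

config {
  type: "operations"
}

js {
  // Fall back to a marker value when the caller didn't override the variable.
  // EXECUTION_TAGS is our own compilation variable, not a Dataform built-in.
  const executionTags = dataform.projectConfig.vars.EXECUTION_TAGS || "untagged";
}

insert into execution_log (execution_tags, execution_time)
select
  '${executionTags}' as execution_tags,
  current_timestamp() as execution_time;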

Thank you ms4446! 🙂

This is a good approach, but the responsibility of registering the right tag is passed to the caller of the job, and I would have to customize all the callers of the job so they pass the environment variable.

If Dataform can't do this directly, maybe it would be a good improvement for future versions. As a SQLX job can be called by different tags (different flows), we might want the job to behave slightly differently depending on the flow it is running in. Sorry for my English; I hope you understand the idea.

Best Regards!


Hello, I would like to know if the function you mentioned, current_job_id(), is an actual function in Dataform, because I need to use the same ID for the whole workflow execution in order to log every action, and when I tried to use it I got an error saying it isn't defined.

Thank you!

I apologize for the confusion. Dataform currently doesn't have a built-in function like current_job_id() to retrieve the job ID directly in SQLX scripts. To achieve the goal of logging every action with a consistent execution ID, alternative methods are required.

One solution is external job ID management. If you're using orchestration tools like Cloud Composer or Cloud Workflows to trigger Dataform jobs, you can pass a unique job ID as a compilation variable or argument when the job starts. This ID can then be accessed within the SQLX script through dataform.projectConfig.vars. For example, by passing EXECUTION_ID=your_unique_execution_id during the job execution, you can log this ID in your SQLX scripts to track actions within the workflow.
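As a minimal sketch (the EXECUTION_ID variable name and the execution_log table are illustrative, not Dataform built-ins):

-- Caller side, with a hypothetical ID value:
-- dataform run --vars=EXECUTION_ID=run_20240101_abc123

config {
  type: "operations"
}

insert into execution_log (job_id, execution_time)
select
  '${dataform.projectConfig.vars.EXECUTION_ID}' as job_id,
  current_timestamp() as execution_time;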

Another approach is to implement custom logging within Dataform using pre_operations and post_operations. Although Dataform does not support generating UUIDs natively in SQLX, you can still pass a unique identifier from an external orchestration layer. By utilizing post_operations, you can insert log entries into a designated logging table, capturing the execution ID and any relevant actions.
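A sketch of that pattern, again assuming the execution_log table and a caller-supplied EXECUTION_ID from the previous example:

config {
  type: "table"
}

-- Placeholder query body for the table this action builds.
select 1 as id

post_operations {
  -- Append a log entry once the query above has run.
  insert into execution_log (job_id, execution_tags, execution_time)
  select
    '${dataform.projectConfig.vars.EXECUTION_ID}',
    '${self()} finished',
    current_timestamp()
}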

It is important to ensure that any method used for generating execution IDs produces unique values to avoid conflicts in the log. Additionally, while Dataform is primarily focused on data transformation tasks, for more complex logging or orchestration requirements, integrating it with other Google Cloud services, such as Composer, Workflows, or Cloud Logging, may be necessary to fully manage and monitor job executions.

Thank you for the fast reply!

Currently we are using a similar approach to what you mentioned:

  1. An initial action, before the ETL, that generates a row per action in the log table with an initial state. Here I tried to generate a unique ID for the entire run using JavaScript, but both approaches I tried failed: the first was to assign and export the execution ID, but since it runs at compilation, every run with the same compilation gets the same execution ID (one workaround would be to compile before every run, but I don't know if that's efficient); the other was to run a function in the first step that sets the variable, but it remains null, so that doesn't work either.
  2. In every action, in pre_operations, set the state of that action to "started".
  3. Then, in post_operations, set the state of that action to "finished" (roughly as sketched below).
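A minimal sketch of steps 2 and 3, assuming a log table with one row per action (the action_log table and its columns are made up for illustration):

config {
  type: "table"
}

pre_operations {
  -- Step 2: mark this action as started.
  update action_log
  set state = 'started', updated_at = current_timestamp()
  where action_name = '${self()}'
}

select 1 as id

post_operations {
  -- Step 3: mark this action as finished.
  update action_log
  set state = 'finished', updated_at = current_timestamp()
  where action_name = '${self()}'
}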

I've thought of using a compilation variable as you said, but since it must be sent from the orchestration tool, I'd rather first try to generate it in Dataform.

Anyway, thank you very much once again for your reply. It was very helpful to know that the approach we are building seems to be a valid one.

edit: after I sent the reply, I was thinking about generating an ID from the compilation and start-of-execution times, but I have to check whether the global start time is accessible from Dataform while running.

edit2: alternatively, a table in BigQuery could be used to insert an ID and then read it back from the actions, but I think it would increase billing a lot unnecessarily.
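For reference, a rough sketch of the edit2 idea, with a made-up run_ids helper table; whether the extra DML is worth the cost is exactly the open question:

-- First action of the run: mint one ID for the whole execution.
config {
  type: "operations"
}

insert into run_ids (run_id, started_at)
select generate_uuid(), current_timestamp();

-- Later actions can read it back with a scalar subquery, e.g.:
-- (select run_id from run_ids order by started_at desc limit 1)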