Hello everyone,
I'm just learning Dataform, and I need to log, in a table, the tag used to execute a Dataform job (SQLX). Since a SQLX job can have many tags assigned, is there a function or method to get the tag that was used to execute the job? Thank you in advance for any help.
Best Regards,
Mauricio García
While Dataform doesn't have a built-in function or method to directly retrieve the execution tags within a SQLX job, you can achieve this through a combination of approaches:
Before your main SQLX job runs, you can set up compilation variables (vars) to capture the execution tags. Dataform allows you to pass these variables when executing a job, and you can access them in your SQLX scripts.
Example:

dataform run --vars=EXECUTION_TAGS=tag1,tag2

In your SQLX script, use dataform.projectConfig.vars to access these tags and log them:

config {
  type: "operations"
}

-- SQL code to log execution tags
create table if not exists execution_log (
  job_id string,
  execution_tags string,
  execution_time timestamp
);

insert into execution_log
select
  current_job_id() as job_id,
  '${dataform.projectConfig.vars.EXECUTION_TAGS}' as execution_tags,
  current_timestamp() as execution_time;
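For the --vars flag to behave predictably, it helps to declare the variable with a default in dataform.json; a minimal sketch, where the default value "none" is just a placeholder:

{
  "warehouse": "bigquery",
  "defaultSchema": "dataform",
  "vars": {
    "EXECUTION_TAGS": "none"
  }
}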
Post-Execution Logging
Post-execution logging directly within Dataform SQLX scripts isn’t feasible since SQLX scripts don’t have a built-in mechanism to handle actions after all jobs complete. Instead, ensure that logging happens during the execution of your main SQLX job as shown above.
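If something close to post-execution behavior is still needed, one workaround is a dedicated operations action that lists your main actions as dependencies, so it only runs after they finish. A minimal sketch, assuming hypothetical action names orders and customers and the execution_log table from the example above:

config {
  type: "operations",
  dependencies: ["orders", "customers"]
}

insert into execution_log
select
  cast(null as string) as job_id,  -- no built-in job id is available inside SQLX
  '${dataform.projectConfig.vars.EXECUTION_TAGS}' as execution_tags,
  current_timestamp() as execution_time;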
Thank you ms4446! 🙂
This is a good approach, but the responsibility of registering the right tag is passed to the caller of the job, and I would have to customize every caller of the job so that it passes the variable.
If Dataform can't do this directly, maybe it would be a good improvement for future versions. Since a SQLX job can be called by different tags (different flows), we might want the job to behave differently depending on the flow it is running for. Sorry for my English, I hope the idea is clear.
Best Regards!
Hello, I would like to know if the function you mentioned, current_job_id(), is an actual function in Dataform, because I need to use the same ID for the whole workflow execution in order to log every action. I've tried to use it, but I got an error saying it isn't defined.
Thank you!
I apologize for the confusion. Dataform currently doesn't have a built-in function like current_job_id() to retrieve the job ID directly in SQLX scripts. To achieve the goal of logging every action with a consistent execution ID, alternative methods are required.
One solution is external job ID management. If you're using orchestration tools like Cloud Composer or Cloud Workflows to trigger Dataform jobs, you can pass a unique job ID as a compilation variable or argument when the job starts. This ID can then be accessed within the SQLX script through the dataform.projectConfig.vars object. For example, by passing EXECUTION_ID=your_unique_execution_id during the job execution, you can log this ID in your SQLX scripts to track actions within the workflow.
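Sketching that hand-off with the open-source CLI (the variable name EXECUTION_ID and its value are assumptions; a managed orchestrator would set the same variable through its own API):

dataform run --vars=EXECUTION_ID=run_2024_10_01_001

-- then, inside any SQLX action:
select '${dataform.projectConfig.vars.EXECUTION_ID}' as execution_id;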
Another approach is to implement custom logging within Dataform using pre_operations and post_operations. Note that although BigQuery's GENERATE_UUID() can produce IDs in plain SQL, each statement would get its own value, so a single identifier shared across all actions still has to be passed in from an external orchestration layer. By utilizing post_operations, you can insert log entries into a designated logging table, capturing the execution ID and any relevant actions.
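A minimal sketch of that pattern on a table action, assuming a hypothetical my_dataset.action_log logging table and the orchestrator-supplied EXECUTION_ID variable from above:

config {
  type: "table"
}

select 1 as placeholder_value  -- stand-in for the real transformation query

post_operations {
  insert into my_dataset.action_log (execution_id, action_name, logged_at)
  values (
    '${dataform.projectConfig.vars.EXECUTION_ID}',  -- assumed orchestrator-supplied variable
    '${self()}',  -- fully qualified name of this action
    current_timestamp()
  )
}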
It is important to ensure that any method used for generating execution IDs produces unique values to avoid conflicts in the log. Additionally, while Dataform is primarily focused on data transformation tasks, for more complex logging or orchestration requirements, integrating it with other Google Cloud services, such as Composer, Workflows, or Cloud Logging, may be necessary to fully manage and monitor job executions.
Thank you for the fast reply!
Currently we are using an approach similar to what you mention:
I've thought of using a variable passed in, as you said, but since it must be sent from the orchestration tool, I'd rather first try to generate it in Dataform.
Anyway, thank you very much once again for your reply. It was very helpful to know that the approach we are building seems to be a valid one.
edit: after I sent the reply, I thought about generating an ID from the compilation and start execution time, but I have to check whether the global start time is accessible from Dataform while running.
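One way to prototype that idea without touching the orchestrator: a value computed in an includes/ module is evaluated once per compilation, so every action in that compilation sees the same string. A sketch with an assumed file name; the caveat is that compile time is not execution start time, and a cached or reused compilation result would carry a stale ID:

// includes/run_id.js -- evaluated once when the project compiles
module.exports = {
  RUN_ID: `run_${Date.now()}`
};

-- usage inside any SQLX file:
select '${run_id.RUN_ID}' as execution_id;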
edit2: a table in BigQuery could also be used to insert an ID and then read it back from the actions, but I think it would increase billing a lot unnecessarily.
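For completeness, the control-table idea would look roughly like the sketch below (my_dataset.run_control is a hypothetical one-row-per-run table); the billing concern is that every action repeats this lookup, so each action pays for an extra query over the control table:

config { type: "operations" }

insert into execution_log (job_id, execution_tags, execution_time)
select latest.execution_id, cast(null as string), current_timestamp()
from (
  select execution_id
  from my_dataset.run_control  -- hypothetical control table
  order by created_at desc
  limit 1
) as latest;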