Hi everyone,
I'm new to Dataform technology and I need your support to understand if I'm doing something wrong. I've read various topics and documentation, and I've come to the conclusion that for the work I need to do, I should create my SQLX file of type "operations". The purpose of my Dataform is to create table inserts by reading from views and then execute them via Composer with a parameter. To achieve this, I need to create a releaseId and a workflow in Dataform that includes the SQLX files to be executed, which will then be passed to Composer.
Here is my SQLX configuration:
definitions/first_view.sqlx config { type: "operations", description: "Create view to other table.", hasOutput: true } CREATE OR REPLACE VIEW `prova-001-prj.L0_VIEW.First_View` AS SELECT * FROM `prova-001-prj.L0_TABLE.First_Table_OLD` UNION ALL SELECT * FROM `prova-001-prj.L0_TABLE.First_Table`
When I click "run" on the SQLX file, everything works fine and the view is created. However, when I try to "start execution", it generates additional code that I don't need:
BEGIN CREATE SCHEMA IF NOT EXISTS prova-001-prj.L0_VIEW OPTIONS(location="europe-west8"); EXCEPTION WHEN ERROR THEN IF NOT CONTAINS_SUBSTR(@@error.message, "already exists: dataset") AND NOT CONTAINS_SUBSTR(@@error.message, "too many dataset metadata update operations") AND NOT CONTAINS_SUBSTR(@@error.message, "User does not have bigquery.datasets.create permission") THEN RAISE USING MESSAGE = @@error.message; END IF; END; BEGIN CREATE OR REPLACE VIEW `prova-001-prj.L0_VIEW.First_View` .........
I thought adding hasOutput: true would solve the issue, but it didn't. Could you please help me understand why this is happening and how I can prevent the generation of unnecessary code during execution?
Thank you!
Solved! Go to Solution.
Hi @CILONO,
Welcome to Google Cloud Community!
Dataform's CREATE SCHEMA IF NOT EXISTS block ensures that the dataset (schema in BigQuery terms) exists before creating views or tables. Even when working with a view and specifying hasOutput: true, Dataform still performs this dataset existence check as standard behavior.
To address this, you can use Dataform's database configuration. This approach defines the project for Dataform to use, bypassing the automatic creation of the dataset. This also centralizes project information, improving maintainability.
To implement this, you can modify the database property in your dataform.json file, located at the root of your Dataform repository. By specifying the defaultDatabase (your Google Cloud project ID), you inform Dataform that the dataset already exists, or that you will ensure its creation through other means. This causes Dataform to skip the CREATE SCHEMA IF NOT EXISTS check.
Furthermore, compilation overrides offer a more flexible and environment-aware way to manage Dataform's behavior and prevent the unwanted CREATE SCHEMA IF NOT EXISTS statement, particularly when deploying through Composer across different environments. Compilation overrides provide enhanced control and maintainability compared to manual post-compilation script modifications.
Was this helpful? If so, please accept this answer as “Solution”. If you need additional assistance, reply here within 2 business days and I’ll be happy to help.
Hi @CILONO,
Welcome to Google Cloud Community!
Dataform's CREATE SCHEMA IF NOT EXISTS block ensures that the dataset (schema in BigQuery terms) exists before creating views or tables. Even when working with a view and specifying hasOutput: true, Dataform still performs this dataset existence check as standard behavior.
To address this, you can use Dataform's database configuration. This approach defines the project for Dataform to use, bypassing the automatic creation of the dataset. This also centralizes project information, improving maintainability.
To implement this, you can modify the database property in your dataform.json file, located at the root of your Dataform repository. By specifying the defaultDatabase (your Google Cloud project ID), you inform Dataform that the dataset already exists, or that you will ensure its creation through other means. This causes Dataform to skip the CREATE SCHEMA IF NOT EXISTS check.
Furthermore, compilation overrides offer a more flexible and environment-aware way to manage Dataform's behavior and prevent the unwanted CREATE SCHEMA IF NOT EXISTS statement, particularly when deploying through Composer across different environments. Compilation overrides provide enhanced control and maintainability compared to manual post-compilation script modifications.
Was this helpful? If so, please accept this answer as “Solution”. If you need additional assistance, reply here within 2 business days and I’ll be happy to help.