Hi everyone
I am currently new at working with dataform in bigquery
I would like to know how I can make my sqlx files write to a dataset outside the project defined by default.
This is because in my company the resulting table MUST exist in a specific project and dataset outside of the ones I am using for the staging process
Project1.Dataset1(only staging files)
Project2.Dataset2 (only final tables)
To write SQLX files to a dataset outside the default project in Google Cloud Dataform:
1. Override Project and Dataset in SQLX Table Definition:
config { ... database: "Project2",
schema: "Dataset2" }
2. Consider Using the Open-Source Dataform CLI:
npm i -g @dataform/cli@^2.3.2
dataform init
Additional Considerations:
defaultLocation
in dataform.json
for consistency.
This part:
config { ... database: "Project2",
schema: "Dataset2" }
Is not working:
This is my dataform.json file:
--
Best regards
David Regalado
Web | Linkedin | Cloudskillsboost
There are a couple of things to address in your example to make your SQLX file correctly write to the intended project and dataset:
1. Typos and Syntax:
test_david_ILS_categories
) – ensure it's spelled correctly and used consistently throughout your files.config
block and be correctly indented.2. Project and Dataset Targeting:
dataform.json
looks correct, setting the default database to "mvp". Make sure your SQLX file's configuration aligns with this.Key Points:
Important Note on Data Loading:
Dataform's Focus: Dataform is primarily designed for transforming data that is already within BigQuery. It doesn't directly support commands like LOAD DATA OVERWRITE
.
Loading Tools: To get data from CSV files into BigQuery, use tools like:
bq
command-line tool (for automation and scripting)Example using bq
:
bq load --source_format=CSV --autodetect \
mvp.ILS.test_david_ILS_categories \
gs://data_model_generator/categories_2024_02*.csv