Hi all,
I have the configuration below in my dataform.json file
@ms4446 Please help and thanks for always!
Solved! Go to Solution.
Handling datasets stored in multiple locations in GCP within Dataform can be challenging due to the current limitations of the platform. However, here are some potential workarounds and strategies you can consider:
Use the defaultLocation
property: Set the defaultLocation
property in the dataform.json
file to the location where the majority of your datasets reside. This sets the default location for all datasets in the project. However, this approach can be limiting if you need to work with datasets in multiple locations.
Separate Projects for Each Location: Consider creating separate Dataform projects for each GCP location. This way, you can set the defaultLocation
specific to each project, ensuring that each project is tailored to a specific location.
Data Transfer: If feasible, consider transferring datasets from one location to another within GCP, so they are all in the same location. This can be done using tools like gsutil
or BigQuery's transfer service.
Hi @francisatoyebi ,
In Dataform, you can specify the location for a specific SQLX file using the location property in the config block.
For example, if you want a table to be created in the europe-west2 location, you can use the following configuration:
config {
type: "table",
location: "europe-west2"
}
SELECT * FROM ...
This configuration tells Dataform to create or reference the table in the europe-west2 location, overriding the defaultLocation set in the dataform.json file.
Use this approach for each .sqlx file that references a table in a non-default location.
At which version is it possible to specify `location` in a config block per `.sqlx` file, as you suggested?
I have tried that before and when I did, it throws an error that says location is not a parameter.
I am currently thinking this feature is not yet live in dataform.
I apologize for the oversight. Apparently the location
parameter is not recognized by Dataform and the ability to specify location for individual .sqlx
files is not yet supported.
The only way to currently work with datasets in non-default locations is to set the defaultLocation
property in the dataform.json
file to the location of the dataset you are working with. For example, if you have a dataset named my_dataset
in the europe-west2
location, you would set the defaultLocation
property as follows:
{
"defaultLocation": "europe-west2"
}
This would cause all datasets referenced in Dataform to be created or referenced in the europe-west2
location.
So how do I handle this if I want to work with datasets stored in multiple locations in GCP? Cos I noticed that dataform only runs for datasets that are saved in the same location as the default location.
Handling datasets stored in multiple locations in GCP within Dataform can be challenging due to the current limitations of the platform. However, here are some potential workarounds and strategies you can consider:
Use the defaultLocation
property: Set the defaultLocation
property in the dataform.json
file to the location where the majority of your datasets reside. This sets the default location for all datasets in the project. However, this approach can be limiting if you need to work with datasets in multiple locations.
Separate Projects for Each Location: Consider creating separate Dataform projects for each GCP location. This way, you can set the defaultLocation
specific to each project, ensuring that each project is tailored to a specific location.
Data Transfer: If feasible, consider transferring datasets from one location to another within GCP, so they are all in the same location. This can be done using tools like gsutil
or BigQuery's transfer service.
Thanks mate!!!