Dataform Location Error

Hi all,

I have the configuration below in my dataform.json file

{
  "defaultSchema": "datalake_analytics",
  "assertionSchema": "dataform_assertions",
  "warehouse": "bigquery",
  "defaultDatabase": "cfc-2017-site-login",
  "defaultLocation": "EU"
}
 
the "defaultLocation" is set to EU because most of our data is stored in the EU but now, I need to work on a table that is stored in "europe-west2" and this causes an error this error "Not found: Dataset domain:table_name was not found in location EU at [4:19]."
How can I specify a location for specific sqlx files to be run to avoid this error.
 
Thanks guys!

@ms4446 Please help and thanks for always!

Solved Solved
0 6 1,442
1 ACCEPTED SOLUTION

Handling datasets stored in multiple locations in GCP within Dataform can be challenging due to the current limitations of the platform. However, here are some potential workarounds and strategies you can consider:

  1. Use the defaultLocation property: Set the defaultLocation property in the dataform.json file to the location where the majority of your datasets reside. This sets the default location for all datasets in the project. However, this approach can be limiting if you need to work with datasets in multiple locations.

  2. Separate Projects for Each Location: Consider creating separate Dataform projects for each GCP location. This way, you can set the defaultLocation specific to each project, ensuring that each project is tailored to a specific location.

  3. Data Transfer: If feasible, consider transferring datasets from one location to another within GCP, so they are all in the same location. This can be done using tools like gsutil or BigQuery's transfer service.

 

View solution in original post

6 REPLIES 6

Hi @francisatoyebi ,

 

In Dataform, you can specify the location for a specific SQLX file using the location property in the config block.

For example, if you want a table to be created in the europe-west2 location, you can use the following configuration:

 

config {
  type: "table",
  location: "europe-west2"
}

SELECT * FROM ...

This configuration tells Dataform to create or reference the table in the europe-west2 location, overriding the defaultLocation set in the dataform.json file.

Use this approach for each .sqlx file that references a table in a non-default location.

I have tried that before and when I did, it throws an error that says location is not a parameter.

I am currently thinking this feature is not yet live in dataform.

I apologize for the oversight. Apparently the location parameter is not recognized by Dataform and the ability to specify location for individual .sqlx files is not yet supported.

The only way to currently work with datasets in non-default locations is to set the defaultLocation property in the dataform.json file to the location of the dataset you are working with. For example, if you have a dataset named my_dataset in the europe-west2 location, you would set the defaultLocation property as follows:

 

{
  "defaultLocation": "europe-west2"
}

This would cause all datasets referenced in Dataform to be created or referenced in the europe-west2 location.

So how do I handle this if I want to work with datasets stored in multiple locations in GCP? Cos I noticed that dataform only runs for datasets that are saved in the same location as the default location.

Handling datasets stored in multiple locations in GCP within Dataform can be challenging due to the current limitations of the platform. However, here are some potential workarounds and strategies you can consider:

  1. Use the defaultLocation property: Set the defaultLocation property in the dataform.json file to the location where the majority of your datasets reside. This sets the default location for all datasets in the project. However, this approach can be limiting if you need to work with datasets in multiple locations.

  2. Separate Projects for Each Location: Consider creating separate Dataform projects for each GCP location. This way, you can set the defaultLocation specific to each project, ensuring that each project is tailored to a specific location.

  3. Data Transfer: If feasible, consider transferring datasets from one location to another within GCP, so they are all in the same location. This can be done using tools like gsutil or BigQuery's transfer service.

 

Thanks mate!!!