Get hands-on experience with 20+ free Google Cloud products and $300 in free credit for new customers.

Dataform Upstream table dependencies

Hello

I am working on a project where a tables `table1`, `table2` is created in BigQuery via terraform. 

In dataform, I am building a view based on the table and I want to use this build dependency check in .sqlx file to ensure view is not run before the table is deployed in BigQuery via Terraform. How do I achieve this?

See below query and error.

below sample code of my myview.sqlx file

 

config { 
type: "view",
schema: constants.dataset_name,
description: "Description of what this model represents"
}

select count(distinct name)
from ${constants.dataset_name}.${ref("table1")} ha
inner join ${constants.dataset_name}.${ref("table2")} sa on ha.account_hkey = sa.account_hkey

Below is the error I am getting

Error: Missing dependency detected: Action my_project_name.my_dataset.myview" depends on "{"name":"table1","includeDependentAssertions":false}" which does not exist

 

Solved Solved
0 3 1,191
1 ACCEPTED SOLUTION

Hi @bihagkashikar,

You are correct that multiple data sources can be declared within a single JavaScript file. However, querying information_schema to declare sources is not currently supported. Here are some potential approaches you might consider, including the one you've mentioned:

For more detailed guidance, you can refer to best practices for managing data sources.

Alternatively, if needed, you might consider submitting a feature request for enhanced functionality. I can't give a timeline for when this feature will be available, but I suggest checking the issue tracker and release notes for the latest updates.

I hope the above information is helpful.

View solution in original post

3 REPLIES 3

Hi @bihagkashikar,

It looks like you're trying to create a view in Dataform based on tables in BigQuery that were deployed using Terraform. You want the view to execute after the BigQuery tables are deployed.

The "Missing dependency detected" error means that Dataform can't locate the tables in BigQuery needed to build your view in myview.sqlx. 

In the ref function, you specify the name of the table or data source you want to reference, usually corresponding to the filename(SQLX) where that table or data source is defined. 

To resolve this, declare the BigQuery tables as data sources in Dataform and set them as dependencies for your myview.sqlx file. This ensures that Dataform recognizes these tables as inputs for your view. These dependency declarations form a tree that determines the order in which Dataform runs your SQL workflow objects.

Sample:

table1.sqlx

 

config {
 type: "table",
 database: "insert your project_id",
 schema: "insert your existing dataset name",
 name: "insert your existing table_name1",
}

 

table2.sqlx

 

config {
 type: "table",
 database: "insert your project_id",
 schema: "insert your existing dataset name",
 name: "insert your existing table_name2",
}

 

myview.sqlx

 

config { type: "view",
schema: "insert your existing dataset name",
dependencies:["table1","table2"]} //declared data sources

SELECT * FROM ${ref("table1")} //Your SQL statement

 

Note: Be sure that the spellings are correct, no extra whitespaces and confirm you are referencing the correct table.

I hope the above information is helpful.

Thank you @caryna  for the detailed explanation. Yes I did read this in the documentation a bit more closely and now I better understand your answer above

Now my next thought is any way I could declare multiple tables in one declaration file and use this as dependencies and on this I noticed this

https://cloud.google.com/dataform/docs/reference/sample-scripts#declaring_multiple_sources_within_on...

Question being is there any way to better above approach by declaring the sources by querying information_schema, this way we don't have declare the source in the List(strings) like in above case.

 

 

Hi @bihagkashikar,

You are correct that multiple data sources can be declared within a single JavaScript file. However, querying information_schema to declare sources is not currently supported. Here are some potential approaches you might consider, including the one you've mentioned:

For more detailed guidance, you can refer to best practices for managing data sources.

Alternatively, if needed, you might consider submitting a feature request for enhanced functionality. I can't give a timeline for when this feature will be available, but I suggest checking the issue tracker and release notes for the latest updates.

I hope the above information is helpful.