Get hands-on experience with 20+ free Google Cloud products and $300 in free credit for new customers.

Schema Suffix in Dataform: Not found: Table

Hello,

Within my Dataform project, I've configured tables with incremental mode.

My objective is to use schema suffixes to isolate each development into distinct schemas (datasets). However, when I apply a schema suffix, I encounter an error with the incremental table, as there is no table named "schema_[schema_suffix].table_name."

This is expected as the dataset hasn't been generated.

My question is: Should I create the dataset manually every time?

Solved Solved
0 1 666
1 ACCEPTED SOLUTION

When using schema suffixes in Google Cloud Dataform to isolate development environments, you may encounter issues with missing datasets, especially when dealing with incremental tables. Dataform does not automatically create datasets with schema suffixes, so here are two recommended approaches:

1. Manual Dataset Creation:

  • Process: Before running Dataform commands, manually create the target dataset. This ensures datasets exist for all schema_suffix combinations.
  • Consideration: Simple, but time-consuming and error-prone,particularly in large projects or frequently changing environments.

2. Automated Dataset Creation through Javascript:

  • Process: Utilize JavaScript within Dataform's SQLX scripts to check and create the target dataset if it doesn't exist.
  • Example:
const isProduction = dataform.projectConfig.defaultSchema.endsWith('_prod');
let targetSchema = dataform.targetSchema;
if (isProduction) {
  targetSchema += "_prod";
}

// Replace 'datasetExists' and 'createDataset' with actual Dataform API methods
if (!dataform.backend.datasetExists(targetSchema)) {
  console.log(`Creating dataset: ${targetSchema}`);
  dataform.backend.createDataset(targetSchema);
}

Recommendations:

  • Stay Updated: Regularly consult Dataform documentation and community resources for updates related to dataset creation and schema suffixes.
  • Evaluate Your Needs: Choose the approach that best fits your project's size, complexity, and automation requirements.
  • Test and Validate: Thoroughly test and validate any scripts in a safe environment before production use.
  • Modular Approach: For larger projects, consider separating dataset management scripts from data transformation scripts for better maintainability.
  • Backup and Recovery: Always have backup and recovery plans in place when automating operations like dataset creation.

By choosing the appropriate approach, you can effectively manage development environments with schema suffixes and ensure smooth handling of incremental tables without missing datasets.

View solution in original post

1 REPLY 1

When using schema suffixes in Google Cloud Dataform to isolate development environments, you may encounter issues with missing datasets, especially when dealing with incremental tables. Dataform does not automatically create datasets with schema suffixes, so here are two recommended approaches:

1. Manual Dataset Creation:

  • Process: Before running Dataform commands, manually create the target dataset. This ensures datasets exist for all schema_suffix combinations.
  • Consideration: Simple, but time-consuming and error-prone,particularly in large projects or frequently changing environments.

2. Automated Dataset Creation through Javascript:

  • Process: Utilize JavaScript within Dataform's SQLX scripts to check and create the target dataset if it doesn't exist.
  • Example:
const isProduction = dataform.projectConfig.defaultSchema.endsWith('_prod');
let targetSchema = dataform.targetSchema;
if (isProduction) {
  targetSchema += "_prod";
}

// Replace 'datasetExists' and 'createDataset' with actual Dataform API methods
if (!dataform.backend.datasetExists(targetSchema)) {
  console.log(`Creating dataset: ${targetSchema}`);
  dataform.backend.createDataset(targetSchema);
}

Recommendations:

  • Stay Updated: Regularly consult Dataform documentation and community resources for updates related to dataset creation and schema suffixes.
  • Evaluate Your Needs: Choose the approach that best fits your project's size, complexity, and automation requirements.
  • Test and Validate: Thoroughly test and validate any scripts in a safe environment before production use.
  • Modular Approach: For larger projects, consider separating dataset management scripts from data transformation scripts for better maintainability.
  • Backup and Recovery: Always have backup and recovery plans in place when automating operations like dataset creation.

By choosing the appropriate approach, you can effectively manage development environments with schema suffixes and ensure smooth handling of incremental tables without missing datasets.