Get hands-on experience with 20+ free Google Cloud products and $300 in free credit for new customers.

Dataform CI CD

Hello Community,

I have the following question:
In a GCP Dataform workspace lets say you have an .sqlx "operations" file

config {
    type: "operations",
    tags: ["someTag"]
}
 
You have the following:
 
SELECT a,b,c d from table
 
 
This sql code is incorrect, we got the following syntax error in GCP Dataform workspace
 
Syntax error: Expected end of input but got identifier "d"
 
 
In GitHub action I need to create the CI/CD data pipeline that is triggered when the code
is pushed to the remote branch.
 
Both github actions: dataform compile and dataform run --dry-run do not detect the syntax
errors. Please see the following issues:
Is there any task on GitHub actions that does detect the syntax errors?
I am not referring here about dataform run which it does, but it also executes all the
workflows which is something I do not want.

Thank you.
 
Solved Solved
1 4 3,817
2 ACCEPTED SOLUTIONS

The built-in dataform compile and dataform run --dry-run commands in Dataform are excellent for validating your project's overall structure, but they don't inherently catch SQL syntax errors. To proactively find these errors, consider these strategies:

1. External Linting / SQL Validation

  • Integrate a dedicated SQL linter into your CI/CD pipeline. These tools are specifically designed to parse SQL code and report syntax errors.
  • Choose a linter that supports the BigQuery SQL dialect to ensure compatibility with Dataform.

GitHub Actions Integration:

 
- name: SQL Linting
  uses: actions/setup-node@v2
  with:
    node-version: '14'  # Use a supported Node version
  run: | 
    npm install -g sqlfluff 
    sqlfluff lint --dialect bigquery my_dataform_project/definitions/ 

2. Pre-Commit Hooks

  • Prevent the introduction of syntax errors by using a pre-commit hook framework like 'pre-commit'.
  • Configure the hook to execute your SQL linter on modified .sqlx files before a developer can commit their changes.

3. Custom Validation Script

  • Develop a script that uses the BigQuery API to execute dry-runs of your SQL queries.
  • This offers accurate validation, but it does require some development effort.

Important Considerations:

  • Query Extraction: Carefully parse and extract SQL from .sqlx files.
  • API Quotas: Manage rate limits for BigQuery API calls.
  • GitHub Actions Integration: Clearly output errors in your CI/CD logs.

View solution in original post

While SQLFluff excels as a SQL linter, its compatibility with Dataform's unique syntax—such as config blocks, JavaScript (js) sections, and templating—presents certain challenges. Below, we explore these challenges and propose potential workarounds and strategies for developers.

Limitations of Custom Rules in SQLFluff

  • Custom rules in SQLFluff are primarily designed for enforcing SQL-specific styles. Extending the tool to fully understand Dataform's syntax would require more significant modifications.

Partial Linting: A Limited Solution

  • Disabling specific rules can provide temporary relief but may not fully address SQLFluff's limitations in parsing Dataform's extended syntax.

Templating: A Cautious Approach

  • Pre-processing or replacing template variables with placeholders can make the code easier to lint, but proceed carefully to avoid accidentally introducing new syntax errors.

Key Considerations

  • Current Tool Limitations: Be aware of the present limitations in linting Dataform-specific syntax within SQLFluff.
  • Community Collaboration: Engage with the SQLFluff community; your feedback could influence future updates and features.
  • Manual Review Importance: Thorough code reviews and adherence to Dataform best practices remain crucial in the absence of comprehensive, automated linting.

Additional Strategies for Effective Linting

  • Focused SQL Linting: Isolate pure SQL segments within Dataform files and lint them with SQLFluff.
  • IDE Support: Investigate IDE plugins or extensions that might offer better support for linting SQL mixed with templated code.
  • Comprehensive Testing: Implement robust unit, integration, and end-to-end tests for your data pipelines to ensure both syntactic correctness and functional reliability.

The integration between Dataform and SQLFluff may improve over time. Stay updated on the latest developments and best practices.

View solution in original post

4 REPLIES 4

The built-in dataform compile and dataform run --dry-run commands in Dataform are excellent for validating your project's overall structure, but they don't inherently catch SQL syntax errors. To proactively find these errors, consider these strategies:

1. External Linting / SQL Validation

  • Integrate a dedicated SQL linter into your CI/CD pipeline. These tools are specifically designed to parse SQL code and report syntax errors.
  • Choose a linter that supports the BigQuery SQL dialect to ensure compatibility with Dataform.

GitHub Actions Integration:

 
- name: SQL Linting
  uses: actions/setup-node@v2
  with:
    node-version: '14'  # Use a supported Node version
  run: | 
    npm install -g sqlfluff 
    sqlfluff lint --dialect bigquery my_dataform_project/definitions/ 

2. Pre-Commit Hooks

  • Prevent the introduction of syntax errors by using a pre-commit hook framework like 'pre-commit'.
  • Configure the hook to execute your SQL linter on modified .sqlx files before a developer can commit their changes.

3. Custom Validation Script

  • Develop a script that uses the BigQuery API to execute dry-runs of your SQL queries.
  • This offers accurate validation, but it does require some development effort.

Important Considerations:

  • Query Extraction: Carefully parse and extract SQL from .sqlx files.
  • API Quotas: Manage rate limits for BigQuery API calls.
  • GitHub Actions Integration: Clearly output errors in your CI/CD logs.

Hello again Community,

I have implemented the proposed solution. Please see the section of the .yml file that does the linting:

- name: Python Setup
uses: actions/setup-python@v2
with:
python-version: 3.8

- name: Install SQLFluff
run: pip install sqlfluff==0.3.6

- name: SQL Linting
run: sqlfluff lint ././definitions/intermediate/ecbd_bu_address_ml_t.sqlx --dialect bigquery

 However it seems that although I give as input parameter the "bigquery" dialect it is not enough.
Please see the errors:

Run sqlfluff lint ././definitions/intermediate/ecbd_bu_address_ml_t.sqlx --dialect bigquery
 
== [././definitions/intermediate/ecbd_bu_address_ml_t.sqlx] FAIL
 
L: 1 | P: 1 | PRS | Found unparsable section: 'config {\n type: "table",\n
 
| schema: "raw...'
 
L: 1 | P: 8 | LXR | Unable to lex characters: ''{\n type: '...'
 
L: 4 | P: 13 | LXR | Unable to lex characters: ''{\n part'...'
 
L: 7 | P: 3 | LXR | Unable to lex characters: ''},\n name:'...'
 
L: 11 | P: 15 | LXR | Unable to lex characters: ''{\n uniq'...'
 
L: 16 | P: 3 | LXR | Unable to lex characters: ''},\n tags:'...'
 
L: 18 | P: 1 | LXR | Unable to lex characters: ''}\n\njs {\n '...'
 
L: 20 | P: 4 | LXR | Unable to lex characters: ''{\n const '...'
 
L: 22 | P: 1 | LXR | Unable to lex characters: ''}\n\nSELECT\n'...'
 
L: 22 | P: 1 | PRS | Found unparsable section: '}\n\nSELECT\n CAST(buah_tk AS
 
| NUMERIC) AS ...'
 
L: 48 | P: 20 | LXR | Unable to lex characters: ''${ecbdInse'...'
 
L: 49 | P: 20 | LXR | Unable to lex characters: ''${ecbdInse'...'
 
L: 51 | P: 3 | LXR | Unable to lex characters: ''${ref("raw'...'
 

L: 51 | P: 42 | LXR | Unable to lex characters: ''}\n'...'

 

It looks like "sqlfluff" does not know anything about dataform.

These are the official dialects: https://docs.sqlfluff.com/en/stable/dialects.html
If you have any ideas, please share.

Thank you.

While SQLFluff excels as a SQL linter, its compatibility with Dataform's unique syntax—such as config blocks, JavaScript (js) sections, and templating—presents certain challenges. Below, we explore these challenges and propose potential workarounds and strategies for developers.

Limitations of Custom Rules in SQLFluff

  • Custom rules in SQLFluff are primarily designed for enforcing SQL-specific styles. Extending the tool to fully understand Dataform's syntax would require more significant modifications.

Partial Linting: A Limited Solution

  • Disabling specific rules can provide temporary relief but may not fully address SQLFluff's limitations in parsing Dataform's extended syntax.

Templating: A Cautious Approach

  • Pre-processing or replacing template variables with placeholders can make the code easier to lint, but proceed carefully to avoid accidentally introducing new syntax errors.

Key Considerations

  • Current Tool Limitations: Be aware of the present limitations in linting Dataform-specific syntax within SQLFluff.
  • Community Collaboration: Engage with the SQLFluff community; your feedback could influence future updates and features.
  • Manual Review Importance: Thorough code reviews and adherence to Dataform best practices remain crucial in the absence of comprehensive, automated linting.

Additional Strategies for Effective Linting

  • Focused SQL Linting: Isolate pure SQL segments within Dataform files and lint them with SQLFluff.
  • IDE Support: Investigate IDE plugins or extensions that might offer better support for linting SQL mixed with templated code.
  • Comprehensive Testing: Implement robust unit, integration, and end-to-end tests for your data pipelines to ensure both syntactic correctness and functional reliability.

The integration between Dataform and SQLFluff may improve over time. Stay updated on the latest developments and best practices.

Thank you for your answer.