How can I share common code, such as functions and variables, across different Dataform repositories? As my team prefers not to use public npm packages, I'm aware that creating a private npm package and installing it (https://cloud.google.com/dataform/docs/private-packages) is one option. Are there any alternative approaches to achieve this?
Yes, there are alternative approaches to sharing common code in Google Cloud Dataform without using public or private npm packages. Here are two common alternatives:
Git submodules allow embedding a Git repository as a subdirectory within another repository, facilitating the integration of shared code into multiple Dataform projects.
Steps:
Add the shared repository as a submodule (git submodule add <repository-url> <path/to/submodule>).
Advantages:
Disadvantages:
Dataform's support for importing JavaScript modules directly into SQL files allows for the integration of common functions and variables into SQL workflows.
Steps:
Bring the shared module into the SQL file with an import statement.
Advantages:
Disadvantages:
The choice between Git submodules and JavaScript modules in Dataform depends on team preferences, project structure, and existing workflows. Weigh the pros and cons of each method against your specific requirements and your team's capabilities. Effective code sharing is key to keeping a consistent, reusable, and maintainable codebase across Dataform projects.
Hi @ms4446 ,
I'm having this problem right now, and I tried the Git submodules approach that you recommended above. It works on my local machine via VS Code, but when I try to edit via the Google Cloud Dataform portal, the files inside the submodule are not visible. Am I missing some configuration here? Thanks in advance!
The issue you're encountering with Git submodules not being visible in the Google Cloud Dataform Portal is a known limitation. The portal does not currently support viewing or editing files from Git submodules directly within its interface. This is because the portal only displays files that are directly within the repository, and submodules are treated as separate repositories.
Here are a few workarounds you can consider:
Continue developing locally using VS Code, where the submodules are correctly loaded. Once you've made your changes, push them to your repository and deploy via the Google Cloud Dataform Portal. This keeps the submodule approach intact but limits portal-based edits.
Flatten the structure by copying the shared code directly into each repository. While this removes the benefits of shared version control, it allows you to edit the code directly within the portal.
Create a script that pulls the latest version of the shared code into your main repository as part of your CI/CD pipeline or as a pre-deployment step. This way, you still have the benefits of centralized code management without relying on submodules directly.
Periodically sync the submodule content to the main repository without using the submodule feature. This approach involves manually updating the shared code in each repository when changes are made.
Hello @ms4446 ,
Regarding the alternative "JavaScript Modules: A Direct Import Approach with Dynamic SQL Generation": how can I import the module into another Dataform repository without creating a package?
@Yanias, unfortunately there's no direct way to import JavaScript modules across separate Dataform repositories without using either packages or a shared storage mechanism. Here's the reasoning:
Dataform's Project-Based Structure: Each Dataform repository operates as a self-contained entity. JavaScript modules you create within one repository are isolated to that specific project.
Limited File Access: Dataform doesn't offer a built-in way to reference files located in a different repository directly.