
Migrating from Dataform Web to BQ Dataform

I am migrating from Dataform Web to BQ Dataform on GCP. I have created the Dataform repository, connected it to my GitLab repository, and created a development workspace, and all of that works. I have also changed what is necessary in package.json and dataform.json. I have yet to create release configurations and workflow configurations. I have a few basic questions on the next steps.

  • I have schedules on Dataform Web that run every day. Once I create release and workflow configurations, when should I turn off the scheduling on Dataform Web? I do not want both running at the same time.
  • If someone has already done this migration, how long does it take to migrate all the schedules to release and workflow configurations and get them running? I have to make sure all the configurations run successfully on BQ Dataform before I turn off the ones on Dataform Web.

Appreciate any advice on this migration.


You should turn off the scheduling on Dataform Web only after successfully migrating and verifying all schedules in BQ Dataform. It's crucial to ensure that the new setup in BQ Dataform is functioning as expected to avoid any disruptions in your data pipeline.

Estimated Migration Time

The migration time is subjective and depends on various factors including the complexity and quantity of your schedules and data. Allocate ample time for the migration, including additional time for testing, verification, and addressing any unforeseen issues.

Migration Advice

  1. Preparation:
    • Begin by enumerating all schedules and configurations for a systematic migration and to ensure no schedules are overlooked.
  2. Configuration and Testing:
    • Recreate and test each schedule in BQ Dataform, ensuring its functionality matches the original schedule in Dataform Web.
  3. Verification:
    • Confirm the successful operation of all migrated schedules in BQ Dataform before discontinuing the use of Dataform Web.
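The three steps above can be tracked with a simple checklist. The following is a minimal Python sketch, not part of any Dataform tooling: the `LegacySchedule` class, its fields, and the schedule names and cron strings are hypothetical and only illustrate the idea of listing every Dataform Web schedule and disabling the old scheduling once each one is both recreated and verified in BQ Dataform.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class LegacySchedule:
    """One schedule copied by hand from Dataform Web (hypothetical shape)."""
    name: str
    cron: str
    tags: List[str] = field(default_factory=list)
    migrated: bool = False   # recreated as a workflow configuration
    verified: bool = False   # ran successfully in BQ Dataform


def pending(schedules: List[LegacySchedule]) -> List[str]:
    """Names of schedules that are not yet both migrated and verified."""
    return [s.name for s in schedules if not (s.migrated and s.verified)]


def safe_to_disable_legacy(schedules: List[LegacySchedule]) -> bool:
    """True only when the inventory is non-empty and nothing is pending."""
    return bool(schedules) and not pending(schedules)


# Example inventory; the names and cron expressions are made up.
inventory = [
    LegacySchedule("daily_core", "0 6 * * *", ["daily"]),
    LegacySchedule("hourly_events", "0 * * * *", ["hourly"]),
]
```

An empty inventory is treated as not safe, on the assumption that an empty list more likely means the enumeration step was skipped than that there is nothing to migrate.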

Additional Tips

  • Backup:
    • Prioritize creating a comprehensive backup of your Dataform Web project to safeguard against potential migration issues.
  • Utilize Migration Tools:
    • Explore available migration tools to streamline the process, especially if dealing with a large number of schedules.
  • Error Management:
    • Establish a robust error-handling mechanism for addressing migration errors, including a strategy for manually restarting failed schedules.
  • Post-Migration Assessment:
    • After migration, conduct a thorough assessment to ensure all schedules are operational and monitor them continuously for optimal performance.

Conclusion

Your migration endeavor is a significant task, and meticulous planning and execution are paramount. Your outlined approach is solid, and incorporating these additional suggestions can further enhance the migration process. Wishing you a smooth and successful migration!

Thank you for the tips and advice. I will follow them when migrating mine.

I have started the migration by connecting to the GitLab repository and creating a token for GitLab access in GCP Dataform. However, since then, I have been getting this error on Dataform Web:

Error: 2 UNKNOWN: fatal: unable to access 'https://dataform:[token]@gitlab.com/xxxxx/xxxx/': The requested URL returned error: 403

I have not changed anything on Dataform Web. The error seems related to the GitLab token, but I do not remember changing anything on Dataform Web or deleting any token by accident. My schedules still run on Dataform Web, but any new code or changes are not compiled by Dataform Web.

 

The error message you are seeing indicates that Dataform Web is unable to access your GitLab repository using the provided token. This could be due to several reasons:

  1. The token might be incorrect or expired.
  2. The token does not possess the required permissions to access your repository.
  3. If your repository is private, the token might not be linked to a GitLab user with access rights.
  4. A network issue could be preventing Dataform Web from connecting to GitLab.

Troubleshooting Steps:

  • Token Verification: Ensure the token is valid and hasn't expired by logging into GitLab and navigating to "Personal Access Tokens" under your profile. Here, you can view all tokens and their expiration dates.

  • Token Permissions: Verify that the token has the necessary permissions. This can be checked by editing the token's permissions in GitLab.

  • Private Repository Access: If your repository is private, ensure the token is linked to a GitLab user with appropriate access rights.

  • Restart Dataform Web: Sometimes, simply restarting the application can resolve connectivity issues.

  • Contact Support: If the issue persists, consider reaching out to Dataform's support team for further assistance.

Additional Tips:

  • New Token: Consider generating a new token and integrating it with Dataform Web.

  • Clone Test: Try cloning your repository using the token. Successful cloning indicates the token's validity and permissions.

  • Network Test: Attempt accessing GitLab from an alternate network to rule out any network-related issues.
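The clone test can be approximated without downloading the repository. The sketch below uses `git ls-remote` as a lighter stand-in for a full clone; it only checks read access, so it says nothing about push rights. It assumes `git` is installed locally, that the repository uses an HTTPS URL, and that the username `dataform` from the error message above is acceptable to GitLab. The function names are hypothetical.

```python
import os
import subprocess
from typing import List
from urllib.parse import quote, urlsplit, urlunsplit


def authenticated_url(repo_url: str, token: str, username: str = "dataform") -> str:
    """Embed username and token in an https:// Git URL, percent-encoding both."""
    parts = urlsplit(repo_url)
    if parts.scheme != "https":
        raise ValueError("expected an https:// repository URL")
    netloc = f"{quote(username, safe='')}:{quote(token, safe='')}@{parts.hostname}"
    if parts.port:
        netloc += f":{parts.port}"
    return urlunsplit((parts.scheme, netloc, parts.path, "", ""))


def ls_remote_command(repo_url: str, token: str) -> List[str]:
    """Build the git command that lists remote branches using the token."""
    return ["git", "ls-remote", "--heads", authenticated_url(repo_url, token)]


def token_can_read(repo_url: str, token: str, timeout: int = 30) -> bool:
    """Return True if git can list the remote's branches with this token."""
    # GIT_TERMINAL_PROMPT=0 makes git fail instead of prompting for credentials.
    env = dict(os.environ, GIT_TERMINAL_PROMPT="0")
    result = subprocess.run(
        ls_remote_command(repo_url, token),
        capture_output=True,
        text=True,
        env=env,
        timeout=timeout,
    )
    return result.returncode == 0
```

Calling `token_can_read("https://gitlab.com/<group>/<project>.git", token)` with the real repository path and token would return `False` if GitLab still rejects the token, for example with the same 403. Avoid printing the built command in shared logs, since it contains the token.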

Impact on Schedules:

While your schedules will continue to run on Dataform Web, any new code changes or modifications to existing schedules won't be possible until the connectivity issue is resolved.

Thanks again for the quick response. I checked and tried the first three troubleshooting steps. Although I am no longer getting a compilation error, I still receive the same error message. I have replaced the token with a new one that has these permissions:

api, read_api, read_user, create_runner, read_repository, write_repository, read_registry, write_registry

I am the owner of the repository, so my personal token should not be an issue. As for restarting Dataform Web, how do you do that when it is running in the cloud? Is there a way to restart it? My apologies for not being aware of such a function.