Hi there,
We are currently evaluating how to migrate our analytics setup from Scheduled Queries to a Dataform repository. At the moment, we have an SQL workflow that runs as a scheduled query to produce daily snapshots, which are then stored in a historical snapshots table containing data for a specified date range. Additionally, we have a separate query to handle backfilling.
Is it possible to implement a similar design in Dataform? Specifically, how would we handle backfilling the historical snapshot table? I came across the example in the documentation for backfilling an incremental table with JavaScript, but the example provided only covers a simple SQL statement rather than executing an entire SQL workflow over a given time frame.
Would you be able to provide guidance on the best approach for this in Dataform?
Thank you for your help!
In your original SQL workflow, what was your query for handling backfills?
Hi @mrni11,
Welcome to Google Cloud Community!
Dataform can support this setup by using SQL for transformations and JavaScript for managing the backfilling process. The core idea is to create dynamic SQL queries for each date in the backfill range and use Dataform’s operations to execute them sequentially. You can define SQL workflows in SQLX files, with tables, views, or incremental tables executing SQL queries. JavaScript in operations allows for handling more complex tasks like scheduling and backfilling.
You can implement your design in Dataform as follows:
Backfilling in Dataform is typically managed by following these steps:
Additionally, I came across a thread where someone asked about backfilling tables via Dataform, and it was answered by one of our expert staff. Take a look at it, as I believe it will be very helpful for your migration process.
Was this helpful? If so, please accept this answer as “Solution”. If you need additional assistance, reply here within 2 business days and I’ll be happy to help.
Hi,
Thank you for your reply, it was helpful. So far, I understand the process, I also came across the thread you linked to.
But I can't find a solution for the main question, is it possible to trigger a whole SQL workflow from the JavaScript backfill script? That is, in the example script, while looping through the dates, it is only possible to execute SQL operations defined in that very script.
which would not help with the outlined design.
Thank you!