I have encountered a strange issue with Dataform: a full refresh of the pipeline produces slightly different totals in the final output tables for the same date range, depending on when the data is pulled.
At present, I have a daily incremental execution that adds new data up to 3 days before the run date, so each run adds roughly one full day of data to the final tables.
Then, every Sunday, there is a full refresh of all tables in the pipeline to pick up changes to a table that is joined to the main tables and provides some contextual dimensions.
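To make the schedule concrete, here is a rough Python model of which days each run writes. It is only a sketch under assumptions not stated above: "up to 3 days previous" is read as a cutoff of run_date minus 3 days, the incremental run appends everything after the last loaded day up to that cutoff, Sunday is the only full-refresh day, and PIPELINE_START is a hypothetical start date for the rebuild.

```python
from datetime import date, timedelta

# Assumptions (hypothetical, not from the actual Dataform config):
# - incremental runs load data up to run_date - LAG_DAYS
# - Sunday runs rebuild everything from PIPELINE_START
LAG_DAYS = 3
PIPELINE_START = date(2024, 1, 1)  # hypothetical start of history


def planned_run(run_date, last_loaded):
    """Return (mode, first_day, last_day) for the data a run would write."""
    cutoff = run_date - timedelta(days=LAG_DAYS)
    if run_date.weekday() == 6:  # Monday is 0, Sunday is 6
        return "full_refresh", PIPELINE_START, cutoff
    # Incremental: append only the days after the last one already loaded.
    return "incremental", last_loaded + timedelta(days=1), cutoff


def simulate(start, days, last_loaded):
    """Walk forward day by day and record what each run would write."""
    runs = []
    for offset in range(days):
        run_date = start + timedelta(days=offset)
        mode, first, last = planned_run(run_date, last_loaded)
        runs.append((run_date, mode, first, last))
        last_loaded = last
    return runs


if __name__ == "__main__":
    for run_date, mode, first, last in simulate(date(2024, 3, 4), 7, date(2024, 2, 29)):
        print(run_date, run_date.strftime("%a"), mode, first, "->", last)
```

Under these assumptions, each weekday run writes exactly one day, while the Sunday run rewrites the whole history, which is the only point at which older dates (such as those 2-3 weeks back) get recomputed.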
What we have noticed is that the data stays the same within a week (Monday to Friday), while the incremental executions run. But when we check the data on Monday, we get different totals from the previous week. This is for a date range that is now 2-3 weeks before the current date.
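One way to narrow down which dates move is to save per-day totals from the final table before and after the Sunday refresh (for example, from a hypothetical query grouping the output by date) and compare them. The Python helper below is a minimal sketch of that comparison; the snapshot format, a dict of {day: total}, and the tolerance parameter are assumptions for illustration.

```python
def changed_days(before, after, tolerance=0.0):
    """Compare two {day: total} snapshots and list the days whose totals differ.

    A day missing from either snapshot is also reported, with None for the
    missing side. Differences no larger than `tolerance` are ignored.
    """
    changed = []
    for day in sorted(set(before) | set(after)):
        b = before.get(day)
        a = after.get(day)
        if b is None or a is None or abs(a - b) > tolerance:
            changed.append((day, b, a))
    return changed


if __name__ == "__main__":
    friday = {"2024-03-01": 100.0, "2024-03-02": 80.0}
    monday = {"2024-03-01": 100.0, "2024-03-02": 79.0}
    print(changed_days(friday, monday))
```

If only a few specific days change, that points towards late-arriving or rewritten source rows for those dates; if every day shifts slightly, the full-refresh logic itself (for example, the join to the dimension table) is a more likely suspect.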
The source data should not be changing this far after its initial creation, so I'm struggling to see what the issue is.
Are there any known features or bugs in Dataform / BigQuery that would cause you to get a "sampled" dataset for any reason?