Dataform - differing values after full refresh

Hi,

My issue is that the weekly rebuild of the Dataform pipeline changes the overall total volumes for historical data.

Every Sunday the pipeline is fully rebuilt to integrate any new business dimensions added that week. The underlying metrics for earlier periods (say, December) should therefore be unchanged, because the source data for those periods hasn't changed.

What I see instead is a slight variation in volumes for certain groupings after each rebuild, so the total volumes for the same date range can differ by 2-3% from one week to the next.
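
To put a number on the drift, one option is to run the same aggregate query for a fixed date range after each rebuild and diff the per-grouping totals. Below is a minimal Python sketch, assuming the query results have been exported into dicts that map a grouping to its total volume. compare_totals and total_change_pct are hypothetical helpers, not part of Dataform or BigQuery, and the numbers are illustrative only.

```python
# Hypothetical helpers for diffing per-grouping totals between two rebuilds.
# Assumes each rebuild's results were exported as {grouping: total_volume}
# for the same date range.


def compare_totals(previous, current, tolerance_pct=0.0):
    """Return {grouping: (prev, curr, pct_change)} for groupings whose totals
    differ by more than tolerance_pct percent, including groupings that
    appear in only one of the two runs."""
    drift = {}
    for group in sorted(set(previous) | set(current)):
        prev = previous.get(group, 0)
        curr = current.get(group, 0)
        if prev == 0:
            # A grouping that is new in this run has no baseline to compare.
            pct = float("inf") if curr else 0.0
        else:
            pct = (curr - prev) / prev * 100.0
        if abs(pct) > tolerance_pct:
            drift[group] = (prev, curr, pct)
    return drift


def total_change_pct(previous, current):
    """Percent change of the overall total between the two runs."""
    prev_total = sum(previous.values())
    curr_total = sum(current.values())
    return (curr_total - prev_total) / prev_total * 100.0 if prev_total else 0.0


if __name__ == "__main__":
    # Illustrative numbers only, not real pipeline output.
    last_week = {"region_a": 10_000, "region_b": 5_000, "region_c": 2_000}
    this_week = {"region_a": 10_000, "region_b": 5_150, "region_c": 2_000}
    for group, (prev, curr, pct) in compare_totals(last_week, this_week).items():
        print(f"{group}: {prev} -> {curr} ({pct:+.2f}%)")
    print(f"overall: {total_change_pct(last_week, this_week):+.2f}%")
```

Listing which groupings move, rather than only the overall total, narrows down which joins or dimension tables to inspect.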

I am wondering whether there is a reason for this in BigQuery. For example, is any sampling applied that means not all rows are always included when querying a large dataset?

Thanks