BigQuery returns different results for exactly same query when using LIMIT

On runniing the exactly same query i get different results. 

Details:

Total results are 64155

since i have a limit retrieval limit  of 50000 max rows, I am trying to retrieve them in two steps. On fetching first batch of 50k rows i get different result on every execution

IF i don't use Limit Results are Ok. 

Findings: 

 

Cache is turned off

Query 1: 

 

select count(*), sum(fact__values_sum) from (
SELECT
      `dim_product`.PH_1 `dim_product__ph_1`, `dim_product`.PH_3 `dim_product__ph_3`, `dim_geography`.H1_8 `dim_geography__h1_8`, `dim_measure`.label `dim_measure__label`, `dim_time_period`.label `dim_time_period__label`, `dim_geography`.H1_5 `dim_geography__h1_5`, `dim_geography`.H1_6 `dim_geography__h1_6`, DATETIME_TRUNC(DATETIME(`fact`.`reported_date`, 'UTC'), MONTH) `fact__reporteddate_month`, sum(`fact`.values) `fact__values_sum`, sum(`fact`.valuesly) `fact__values_ly_sum`, sum(`fact`.valuespp) `fact__values_p_p_sum`
    FROM
      plexus-336107.plexusdata.fact AS `fact`
LEFT JOIN plexus-336107.plexusdata.dim_product AS `dim_product` ON `fact`.product_id = `dim_product`.id
LEFT JOIN plexus-336107.plexusdata.dim_geography AS `dim_geography` ON `fact`.geography_id = `dim_geography`.id
LEFT JOIN plexus-336107.plexusdata.dim_measure AS `dim_measure` ON `fact`.measure_id = `dim_measure`.id
LEFT JOIN plexus-336107.plexusdata.dim_time_period AS `dim_time_period` ON `fact`.time_period_id = `dim_time_period`.id  WHERE (`fact`.`reported_date` >= TIMESTAMP('2023-01-01T00:00:00.000Z') AND `fact`.`reported_date` <= TIMESTAMP('2023-01-01T23:59:59.999Z')) AND (`fact`.data_type_id = CAST('100' AS FLOAT64)) GROUP BY 1, 2, 3, 4, 5, 6, 7, 8 ORDER BY 8 ASC LIMIT 50000  ) where dim_product__ph_1
="Gucci" and dim_time_period__label='MTD' and dim_measure__label='Quantity Sold'

 

 

 


Query 1 Results:

 

results--> 2283  ,  24085.0
job_id -->bquxjob_476e8876_18ea056aa87

results--> 2263  , 23977.0
job_id--> bquxjob_78a92fd0_18ea0570760

 

Query 2:

select count(*), sum(fact__values_sum) from (

 SELECT
      `dim_product`.PH_1 `dim_product__ph_1`, `dim_product`.PH_3 `dim_product__ph_3`, `dim_geography`.H1_8 `dim_geography__h1_8`, `dim_measure`.label `dim_measure__label`, `dim_time_period`.label `dim_time_period__label`, `dim_geography`.H1_5 `dim_geography__h1_5`, `dim_geography`.H1_6 `dim_geography__h1_6`, DATETIME_TRUNC(DATETIME(`fact`.`reported_date`, 'UTC'), MONTH) `fact__reporteddate_month`, sum(`fact`.values) `fact__values_sum`, sum(`fact`.valuesly) `fact__values_ly_sum`, sum(`fact`.valuespp) `fact__values_p_p_sum`
    FROM
      plexus-336107.plexusdata.fact AS `fact`
LEFT JOIN plexus-336107.plexusdata.dim_product AS `dim_product` ON `fact`.product_id = `dim_product`.id
LEFT JOIN plexus-336107.plexusdata.dim_geography AS `dim_geography` ON `fact`.geography_id = `dim_geography`.id
LEFT JOIN plexus-336107.plexusdata.dim_measure AS `dim_measure` ON `fact`.measure_id = `dim_measure`.id
LEFT JOIN plexus-336107.plexusdata.dim_time_period AS `dim_time_period` ON `fact`.time_period_id = `dim_time_period`.id  WHERE `fact`.`reported_date` ="2023-01-01" AND (`fact`.data_type_id = CAST('100' AS FLOAT64)) GROUP BY 1, 2, 3, 4, 5, 6, 7, 8 ORDER BY 8 ASC LIMIT 50000  ) where dim_product__ph_1
="Gucci" and dim_time_period__label='MTD' and dim_measure__label='Quantity Sold'

Query 2 Results :

 

Job_id--> bquxjob_3e391a4c_18ea0511613
result--> 2297  25048.0
Job_id--> bquxjob_58310e22_18ea051eb25
result--> 2268  25329.0
Job_id--> bquxjob_3c8eb262_18ea052ba77
result--> 2269  24752.0

 

 

 

 

given  below are relevant schemas

Hassan12323_0-1712093757649.png

 

Hassan12323_1-1712093757927.png

 

Hassan12323_2-1712093757907.png

 

Hassan12323_3-1712093757925.png

 

Hassan12323_4-1712093757689.png

 

Any help is appreciated.
Thanks In Advance

0 1 80
1 REPLY 1

Are any of the tables changing between queries?  Maybe save the results of the subqueries between each of your job runs and see how they change?  Try simplify the query one part at a time (remove columns and joins) and see how minimal you can get that still results in non-deterministic results from the queries.