Get hands-on experience with 20+ free Google Cloud products and $300 in free credit for new customers.

TABLESAMPLE is not allowed with array scans at [1:613]

We are trying TABLESAMPLE function with the query below.. When trying to execute the below query we see the error TABLESAMPLE is not allowed with array scans at [1:613].   

Do you have other approach to accomplish the same tasks as this query? or another way with TABLESAMPLE that avoids this error?

select
Min(als_0.light_id) as cvdraw_payloadmetal_id_min_value,
Max (als_0.light_id) as cvdraw_payloadmetal_id_max_value,
count (DISTINCT (als_0.light_id)) as cvdraw_payloadmetadatal_id_distinct_cnt,
STRING_AGG(DISTINCT(als_0.light_id),';' limit 10 ) AS cvdraw_payloadmetadatal_id_samp
from `project.dataset.table` sampled_data
left join unnest (cvdc62_raw_payload_metadata_light_id_x) as als_0
TABLESAMPLE SYSTEM (0.01 PERCENT)
WHERE
DATE(df_partition_date) <= CURRENT_DATE

Solved Solved
0 2 160
2 ACCEPTED SOLUTIONS

BigQuery does not support using TABLESAMPLE in conjunction with array scans, such as when using the UNNEST function. Specifically, TABLESAMPLE cannot be used with queries that involve UNNEST operations.

To achieve the same result without this error, you can use an alternative sampling method. One common approach is to use the RAND() function to randomly sample rows before the UNNEST operation. Here's how you can modify your query:

WITH sampled_data AS (
SELECT *
FROM `project.dataset.table`
WHERE DATE(df_partition_date) <= CURRENT_DATE
AND RAND() < 0.01 -- Adjust the sampling rate as needed
)
SELECT
MIN(als_0.light_id) AS cvdraw_payloadmetal_id_min_value,
MAX(als_0.light_id) AS cvdraw_payloadmetal_id_max_value,
COUNT(DISTINCT als_0.light_id) AS cvdraw_payloadmetadatal_id_distinct_cnt,
STRING_AGG(DISTINCT als_0.light_id, ';' LIMIT 10) AS cvdraw_payloadmetadatal_id_samp
FROM sampled_data
LEFT JOIN UNNEST(cvdc62_raw_payload_metadata_light_id_x) AS als_0;

 

 

View solution in original post

Hi @mdideles,

In addition to @ms4446 suggestion, you can use TABLESAMPLE within the CTE clause. By sampling the data at this stage, the UNNEST function will only be applied to the sampled subset, preventing any conflicts that could arise between TABLESAMPLE and UNNEST.

You may also refer to this documentation for more details on using the TABLESAMPLE and RAND functions.

I hope the above information is helpful.

View solution in original post

2 REPLIES 2

BigQuery does not support using TABLESAMPLE in conjunction with array scans, such as when using the UNNEST function. Specifically, TABLESAMPLE cannot be used with queries that involve UNNEST operations.

To achieve the same result without this error, you can use an alternative sampling method. One common approach is to use the RAND() function to randomly sample rows before the UNNEST operation. Here's how you can modify your query:

WITH sampled_data AS (
SELECT *
FROM `project.dataset.table`
WHERE DATE(df_partition_date) <= CURRENT_DATE
AND RAND() < 0.01 -- Adjust the sampling rate as needed
)
SELECT
MIN(als_0.light_id) AS cvdraw_payloadmetal_id_min_value,
MAX(als_0.light_id) AS cvdraw_payloadmetal_id_max_value,
COUNT(DISTINCT als_0.light_id) AS cvdraw_payloadmetadatal_id_distinct_cnt,
STRING_AGG(DISTINCT als_0.light_id, ';' LIMIT 10) AS cvdraw_payloadmetadatal_id_samp
FROM sampled_data
LEFT JOIN UNNEST(cvdc62_raw_payload_metadata_light_id_x) AS als_0;

 

 

Hi @mdideles,

In addition to @ms4446 suggestion, you can use TABLESAMPLE within the CTE clause. By sampling the data at this stage, the UNNEST function will only be applied to the sampled subset, preventing any conflicts that could arise between TABLESAMPLE and UNNEST.

You may also refer to this documentation for more details on using the TABLESAMPLE and RAND functions.

I hope the above information is helpful.