Get hands-on experience with 20+ free Google Cloud products and $300 in free credit for new customers.

Is Offset Deterministic For BigQuery Unnesting Of Arrays?

I'm looking to create a set of BigQuery views that will unnest nested data so that we can import it into an unnested environment. I'm looking to do this with multiple different views.

I'm wondering if I do a statement similar to the following, is it always guaranteed that we'll get back the same offset value for each row?

 

SELECT UNNESTED_ARRAY_COLUMN, UNNESTED_ARRAY_COLUMN_OFFSET
FROM SCHEMA.TABLE AS TABLE
LEFT JOIN UNNEST(TABLE.ARRAY_COLUMN) AS UNNESTED_ARRAY_COLUMN WITH OFFSET AS UNNESTED_ARRAY_COLUMN_OFFSET

 

Solved Solved
2 1 2,371
1 ACCEPTED SOLUTION

Yes. If you run the query multiple times on the same data, you should get the same offset value for each row, because the offset is determined by the position of the elements in the original array, which doesn't change.

However, note that if the original data changes (i.e., the order or number of elements in the array column changes), then the offset values will also change accordingly.

Also, keep in mind that if the array is not ordered in a specific way, the order of the elements (and thus the offset values) might seem arbitrary. If you need a consistent order, you might need to apply an ORDER BY clause after the UNNEST operation.

Also note that the OFFSET value doesn't persist if you unnest another array. That is, if you're unnesting multiple arrays from the same row, the offset from the first UNNEST operation doesn't carry over to the subsequent ones.

If you need to maintain the correlation between multiple unnested arrays, you could use the UNNEST function with multiple arguments. This way, multiple arrays are unnested simultaneously, and the resulting rows have a one-to-one correspondence. For example:

 

SELECT UNNESTED_ARRAY_COLUMN_1, UNNESTED_ARRAY_COLUMN_1_OFFSET, UNNESTED_ARRAY_COLUMN_2, UNNESTED_ARRAY_COLUMN_2_OFFSET
FROM SCHEMA.TABLE AS TABLE
LEFT JOIN UNNEST(TABLE.ARRAY_COLUMN_1, TABLE.ARRAY_COLUMN_2)
AS (UNNESTED_ARRAY_COLUMN_1 WITH OFFSET AS UNNESTED_ARRAY_COLUMN_1_OFFSET,
UNNESTED_ARRAY_COLUMN_2 WITH OFFSET AS UNNESTED_ARRAY_COLUMN_2_OFFSET)

View solution in original post

1 REPLY 1

Yes. If you run the query multiple times on the same data, you should get the same offset value for each row, because the offset is determined by the position of the elements in the original array, which doesn't change.

However, note that if the original data changes (i.e., the order or number of elements in the array column changes), then the offset values will also change accordingly.

Also, keep in mind that if the array is not ordered in a specific way, the order of the elements (and thus the offset values) might seem arbitrary. If you need a consistent order, you might need to apply an ORDER BY clause after the UNNEST operation.

Also note that the OFFSET value doesn't persist if you unnest another array. That is, if you're unnesting multiple arrays from the same row, the offset from the first UNNEST operation doesn't carry over to the subsequent ones.

If you need to maintain the correlation between multiple unnested arrays, you could use the UNNEST function with multiple arguments. This way, multiple arrays are unnested simultaneously, and the resulting rows have a one-to-one correspondence. For example:

 

SELECT UNNESTED_ARRAY_COLUMN_1, UNNESTED_ARRAY_COLUMN_1_OFFSET, UNNESTED_ARRAY_COLUMN_2, UNNESTED_ARRAY_COLUMN_2_OFFSET
FROM SCHEMA.TABLE AS TABLE
LEFT JOIN UNNEST(TABLE.ARRAY_COLUMN_1, TABLE.ARRAY_COLUMN_2)
AS (UNNESTED_ARRAY_COLUMN_1 WITH OFFSET AS UNNESTED_ARRAY_COLUMN_1_OFFSET,
UNNESTED_ARRAY_COLUMN_2 WITH OFFSET AS UNNESTED_ARRAY_COLUMN_2_OFFSET)