Hello,
I am experiencing a performance problem when running a PIVOT query in a specific Google BigQuery project, while in another project the same query works fine. I've tried different things and have finally reduced the case to this simplified example.
I have two Google BigQuery projects, let's say Project A and Project B. In both, I have a dataset with the very same table, and both tables have the same data.
Then I have to run a query like this one:
SELECT
id,
MAX(`key_1`) `key_1`,
MAX(`key_2`) `key_2`,
MAX(`key_3`) `key_3`
…,
MAX(`key_N`) `key_N`
FROM dataset.table PIVOT (MAX(pivot_column) FOR key_column IN ("key_1", "key_2", "key_3", ... "key_N"))
GROUP BY id
And the number of keys (N) is quite high, let's say 1000 keys.
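Since the full query with 1000 keys is too long to paste, here is a minimal sketch of how a query of this shape can be generated for any N (the `build_pivot_query` helper, the `dataset.table` name, and the `key_i` naming are placeholders for illustration, not my real schema):

```python
# Hypothetical helper that builds the simplified PIVOT query shown above
# for a given number of keys; names are placeholders.
def build_pivot_query(n_keys, table="dataset.table"):
    keys = [f"key_{i}" for i in range(1, n_keys + 1)]
    # One MAX(...) projection per pivoted key in the outer SELECT.
    select_cols = ",\n".join(f"MAX(`{k}`) `{k}`" for k in keys)
    # The same keys as quoted values in the PIVOT ... IN (...) list.
    in_list = ", ".join(f'"{k}"' for k in keys)
    return (
        "SELECT\nid,\n"
        f"{select_cols}\n"
        f"FROM {table} PIVOT (MAX(pivot_column) FOR key_column IN ({in_list}))\n"
        "GROUP BY id"
    )

print(build_pivot_query(3))
```

With `n_keys=1000` this produces exactly the kind of query that triggers the behavior described below.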
When I run this:
bq query --project_id=project-a --use_legacy_sql=false --format=prettyjson --dry_run=true QUERY
It takes several seconds to generate the result.
However, when I run the same bq query command, replacing project-a with project-b, I get this error message:
BigQuery error in query operation: Dry run query timed out. Some common causes for this are accessing too many tables, or too many partitions for partitioned tables, or too many files matched for external tables.
If I reduce the number of keys from 1000 to a lower value, say 100, then it works for project-b, but it takes much longer than it does for project-a.
I've reviewed project quotas, project settings, and dataset settings, and I haven't found anything relevant. Any insight into what might be happening here?
Thanks!