Hi, I am currently extracting my GA4 data from BigQuery using the BigQuery API.
Since the GA4 daily hits on my account exceed 2 million, I had to slice the extraction down to 10k rows per API request to process the data.
The problem occurs when I use the ORDER BY clause, which tries to sort over 2 million records.
The error statement looks like:
error occurred: 400 Resources exceeded during query execution: The query could not be executed in the allotted memory. Peak usage: 100% of limit.
Top memory consumer(s):
ORDER BY operations: 99%
other/unattributed: 1%
I am using the ORDER BY clause because without it the BigQuery API returns a different 10k rows on each request.
So my questions are:
1. Would this resolve the memory issue? (https://cloud.google.com/bigquery/docs/reservations-commitments)
2. Is there any better way to get 10k records in order? I am currently doing
Hi @hyunchul,
Regarding the 400 Resources exceeded error: this is returned when your job uses too many resources.
Here are some workarounds you can try:
I hope the above information is helpful.
Hi @hyunchul, here are a few things you could try:
BigQuery Slots and Reservations: Upgrading to a dedicated slot commitment (as the link you shared suggests) might boost performance by giving your queries more resources. Keep in mind, though, that it may still not fully solve the memory limitation if the dataset is too large to sort all at once.
Partitioned Queries: Instead of sorting the entire dataset, try slicing it by a key column such as DATE and pulling records from one partition at a time. Each query then carries a lighter load, and you avoid sorting the full dataset in a single pass (see the first sketch after this list).
Paginate Instead of Using OFFSET: OFFSET gets slow on big datasets because BigQuery has to scan past all previous rows on every request. A better approach is keyset pagination: store the last EVENT_TIMESTAMP and USER_PSEUDO_ID of each batch, then use them as the starting point for the next one, e.g. WHERE EVENT_TIMESTAMP > last_timestamp (see the second sketch after this list).
Consider an ETL Tool: If memory issues keep popping up, you might want to look into using a third-party ETL tool that’s optimized for large datasets, like Windsor.ai. Some of these tools have BigQuery connectors and can handle big data extractions with less hassle.
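To make the partitioned-queries idea concrete, here is a minimal sketch using the google-cloud-bigquery Python client. It assumes the standard GA4 export layout (date-sharded events_YYYYMMDD tables); the project/dataset path and the extract_one_day helper are placeholders, not your actual setup:

```python
from google.cloud import bigquery

client = bigquery.Client()

# Hypothetical path -- substitute your project and GA4 export dataset.
GA4_TABLES = "my-project.analytics_123456.events_*"

def extract_one_day(event_date: str):
    """Query a single date shard so no query ever touches the full history.

    Note there is no ORDER BY: a query job's results are materialized in a
    temporary table, so paging through ONE job with page_size is stable,
    unlike re-running separate OFFSET queries against the live table."""
    query = f"""
        SELECT event_timestamp, user_pseudo_id, event_name
        FROM `{GA4_TABLES}`
        WHERE _TABLE_SUFFIX = @suffix
    """
    job = client.query(
        query,
        job_config=bigquery.QueryJobConfig(
            query_parameters=[
                bigquery.ScalarQueryParameter("suffix", "STRING", event_date)
            ]
        ),
    )
    # Streams the materialized result set 10k rows per API response.
    return job.result(page_size=10_000)

for row in extract_one_day("20240101"):
    pass  # process each event row here
```

That temporary-table detail is worth knowing on its own: the "different 10k rows on each request" problem only appears when each request runs a fresh query, not when you page through one job's result.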
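And for the pagination point, a keyset sketch under the same placeholder names. One caveat: (event_timestamp, user_pseudo_id) can tie in GA4 exports (one user can log several events in the same microsecond), so add a third tiebreaker column if you need strict correctness; the sketch assumes the pair is unique:

```python
from google.cloud import bigquery

client = bigquery.Client()

# Hypothetical path -- substitute your project and GA4 export dataset.
GA4_TABLES = "my-project.analytics_123456.events_*"

def fetch_next_batch(last_ts: int, last_id: str, batch_size: int = 10_000):
    """Fetch the next batch strictly after the (last_ts, last_id) cursor.

    ORDER BY plus LIMIT lets BigQuery run a distributed top-N instead of
    a full sort of every row, and the WHERE clause means earlier rows are
    never re-scanned the way OFFSET re-scans them."""
    query = f"""
        SELECT event_timestamp, user_pseudo_id, event_name
        FROM `{GA4_TABLES}`
        WHERE event_timestamp > @last_ts
           OR (event_timestamp = @last_ts AND user_pseudo_id > @last_id)
        ORDER BY event_timestamp, user_pseudo_id
        LIMIT {batch_size}
    """
    job = client.query(
        query,
        job_config=bigquery.QueryJobConfig(
            query_parameters=[
                bigquery.ScalarQueryParameter("last_ts", "INT64", last_ts),
                bigquery.ScalarQueryParameter("last_id", "STRING", last_id),
            ]
        ),
    )
    return list(job.result())

# Walk the table batch by batch, carrying the cursor forward.
last_ts, last_id = 0, ""
while True:
    rows = fetch_next_batch(last_ts, last_id)
    if not rows:
        break
    # ... process the 10k rows here ...
    last_ts, last_id = rows[-1].event_timestamp, rows[-1].user_pseudo_id
```

The ORDER BY ... LIMIT combination is the key part: BigQuery can execute it as a top-N rather than a full sort, which is usually what makes the "Resources exceeded" error go away.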
Hope this can be helpful!