I have a large table (~150TB) and a 100 GB reservation for BI Engine. The table was constantly using the BI Engine but suddenly stop using it.
I've observed this behaviour a couple of time in the last few days. Whenever this happens, all the queries on the table scan complete table instead of reading data from the BI Engine, which increase bytes_scanned as well as slotms.
My current understanding is BI Engine keeps table data in memory for 24 hours if it's not queried. I am not seeing any changes in the query patterns so I don't understand why there would be such a large drop in BI Engine usage. I've also tried to add tables as `preferred tables` for the BI reservation.
Any help on how BI Engine decides to remove table blocks from memory would be useful.
The BI Engine in Google Cloud BigQuery determines which table blocks to remove from memory based on several factors, including:
Recency of Access: Table blocks that haven't been accessed recently are more likely to be evicted from memory due to BI Engine's LRU (Least Recently Used) eviction policy.
Memory Usage: If the BI Engine is nearing its memory capacity, it may evict table blocks to free up resources, regardless of the size of the blocks.
Preferred Tables: While you can set entire tables as "preferred" to give them priority in the BI Engine reservation, this doesn't guarantee that all blocks from these tables will always be kept in memory. It merely gives the table preference over other tables in the reservation.
Given the size of your table, it's possible that not all blocks can be kept in memory, especially if the table is significantly larger than the BI Engine reservation. This might lead to some blocks being evicted when new blocks are loaded, especially if those new blocks are accessed more frequently.
To enhance the performance and memory management of BI Engine, consider the following:
Increase BI Engine Reservation: Allocating more memory to BI Engine can reduce the likelihood of it evicting table blocks.
Use Preferred Tables: By setting tables as "preferred," you can give them priority in memory, but remember that this doesn't guarantee that all their blocks will stay in memory.
Partition Your Table: While partitioning doesn't directly influence how BI Engine manages memory for blocks, it can make querying large datasets more efficient. If specific partitions are queried frequently, their blocks might be more likely to remain in memory.
Leverage Materialized Views: Materialized views can speed up queries as they are pre-computed copies of the data. However, they are separate from the original table and have their own storage.
For monitoring and optimization:
BigQuery Monitoring Tools: Use the monitoring tools available within the Google Cloud Console to check BI Engine's memory usage and other metrics.
BigQuery Job History: Review this to identify any long-running queries that might be consuming significant resources. Optimizing or canceling such queries can help.
BigQuery Query Advisor: This tool provides recommendations on improving query performance, focusing on optimizing the SQL queries themselves.
Table and Job Information: Detailed information about tables and jobs can be found in the BigQuery Console, which can provide insights into data distribution and query patterns.
Hey @ms4446 , thanks for a quick and detailed response. I understand the reasons BI Engine might decide to remove blocks from the engine but I still don't understand why there is such a sudden drop in the overall usage of BI Engine, especially when capacity is available (as shown in picture above).
Some more context about the table that is showing drop in BI Engine usage.
- It's partitioned on an integer column
- It's also clustered by 2 columns that are most commonly used for filtering
- There is a materialised views that's being used in the queries.
Given the above information, what could be a good reason for BI Engine to evict blocks from the table?
Based on the additional information you provided, here are some potential reasons why the BI Engine might be evicting blocks from the table:
Materialized View Usage: If the materialized view associated with the table is accessed frequently, it might reduce the need for BI Engine to access certain blocks of the base table. However, this doesn't necessarily lead to eviction unless there's memory pressure.
Query Patterns: If the queries are primarily targeting specific partitions or subsets of the table, the BI Engine might prioritize those blocks in memory, leading to potential eviction of less frequently accessed blocks.
Memory Pressure: If the BI Engine is experiencing memory pressure due to the requirements of multiple queries, it might evict certain blocks to accommodate new ones.
It's also possible that a combination of these factors is influencing the BI Engine's behavior.
To further investigate the issue:
Monitor BI Engine Memory: Use the BigQuery Monitoring tool to keep an eye on BI Engine's memory usage. Look for any spikes or patterns that might correlate with the eviction.
Review Query Patterns: Check the BigQuery Job History to see the nature of the queries being executed on the table. This can provide insights into which blocks are being accessed frequently.
Optimize Queries: The BigQuery Query Advisor can offer recommendations to enhance query performance, which might indirectly influence BI Engine's memory management.