Solved: Re: Cost in BigQuery

wdrdg

I want to check if the table is empty and use

select 1 from <table_name> limit 1;

This will give me Records Read with the total num of rows in the table, 10000 lets say. The Bytes Processed and Bytes Billed is 0.

When I use

select 1 from <table_name> WHERE DT='<partition_date>' limit 1;

This will give me Records Read with the num of rows in the partition, 100 lets say. The Bytes Processed and Bytes Billed is not 0.

I do not know which one is better and cost less. Could you help to compare these cases?

MarvinLlamas

Hi @wdrdg,

Welcome to the Google Cloud Community!

To determine whether a table is empty, the query SELECT 1 FROM <table_name> LIMIT 1; can be efficient and potentially cost-free, as BigQuery often optimizes it to perform a metadata read (avoiding a full data scan). The INFORMATION_SCHEMA.TABLES approach provides an even more explicit method for checking the existence of a table.

To determine whether a specific partition exists and is likely not empty, the query SELECT 1 FROM <table_name> WHERE DT = '<partition_date>' LIMIT 1; can be used. However, this query incurs a cost because it necessitates reading data from the partition. A potentially more efficient approach is to query INFORMATION_SCHEMA.PARTITIONS to check for the existence of the partition. This minimizes data scanned and can keep your BigQuery bills down.

The SELECT 1 FROM <table_name> LIMIT 1; query is preferable when determining whether the entire table is empty, especially if Bytes Processed is zero. However, if the objective is to verify the presence of data within a specific partition, checking INFORMATION_SCHEMA.PARTITIONS is likely more efficient than querying the partition directly (unless you need absolute certainty of non-emptiness and are willing to pay the cost).

Was this helpful? If so, please accept this answer as “Solution”. If you need additional assistance, reply here within 2 business days and I’ll be happy to help.

View solution in original post

MarvinLlamas