
How to programmatically get the sizes of BigQuery tables

Hello everybody,

I need to programmatically obtain the size in bytes of the tables contained in a particular dataset. To get that information, I've queried the INFORMATION_SCHEMA.TABLE_STORAGE view, which contains the field "TOTAL_LOGICAL_BYTES".
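For reference, the query I'm running looks roughly like this (the project, region, and dataset names here are placeholders):

```sql
-- Logical (uncompressed) bytes per table in one dataset.
-- TABLE_STORAGE is a region-qualified view; `my_project`,
-- `region-us`, and `my_dataset` are placeholders.
SELECT
  table_name,
  total_logical_bytes
FROM
  `my_project.region-us`.INFORMATION_SCHEMA.TABLE_STORAGE
WHERE
  table_schema = 'my_dataset'
ORDER BY
  total_logical_bytes DESC;
```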

However, I've read in the BigQuery documentation that the "TOTAL_LOGICAL_BYTES" field in INFORMATION_SCHEMA.TABLE_STORAGE is meant to inform users about billable storage.

Therefore, it looks like the sizes returned by the query aren't the actual (uncompressed) sizes of the queried tables (e.g. what the Linux command "ls -l" would report for a file), because they might also account for space used by the system to manage the tables (and other things).

By the way, the number obtained by querying INFORMATION_SCHEMA.TABLE_STORAGE is different from (greater than) the one I can find in BigQuery Studio (GCP Console) when I click on the name of a table in the dataset and open its "Details" tab ("Storage info" - "Total logical bytes").

So, I'm wondering whether the number of GBs shown in BigQuery Studio represents the actual size of the tables ("ls -l" style), and if so, how I can get that number programmatically.

Can someone help me?

Thank you in advance.

Marco

Solved
1 ACCEPTED SOLUTION

Hi @marcousescloud,

Welcome to Google Cloud Community!

TOTAL_LOGICAL_BYTES in INFORMATION_SCHEMA.TABLE_STORAGE is the total number of uncompressed bytes in the table or materialized view. Note that this value is usually not kept in real time and may be delayed by a few seconds to a few minutes.

Yes, you are correct: TOTAL_LOGICAL_BYTES represents billable storage, which includes overhead and system structures such as metadata, not just the raw data. Typically, the "Total logical bytes" figure under "Storage info" in the "Details" tab of BigQuery Studio should match the TOTAL_LOGICAL_BYTES value returned by the SQL query if you are looking at the same table. The difference might be due to other factors, such as when each figure was last refreshed.

You can use TOTAL_LOGICAL_BYTES as a rough estimate of the data size, particularly when estimating costs. However, if you need the actual raw file size, a workaround is to export the data and check the file sizes manually. Just be aware that the file size may vary depending on the export format you choose.
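As a sketch of that export workaround, you could use an EXPORT DATA statement to write the table to Cloud Storage and then inspect the resulting file sizes there (the bucket, project, dataset, and table names below are placeholders):

```sql
-- Export the table to GCS, then check the file sizes in the bucket
-- (e.g. with `gsutil du`). All names here are placeholders.
EXPORT DATA OPTIONS (
  uri = 'gs://my-bucket/export/my_table-*.csv',
  format = 'CSV',
  overwrite = true,
  header = true
) AS
SELECT * FROM `my_project.my_dataset.my_table`;
```

Keep in mind the exported size reflects the chosen format (CSV here), so it will generally differ from both the logical bytes figure and the compressed physical bytes.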

I hope the above information is helpful.


2 REPLIES


Hi @marckevin,

thank you so much for your reply. It is very helpful and has clarified the matter for me.