Hi,
We have recently moved our billing model from logical to physical.
We want to track storage cost in our BQ projects on a daily basis:
by dataset
by table
I tried using the TABLE_STORAGE view and the TABLE_STORAGE_USAGE_TIMELINE view.
The TABLE_STORAGE view shows me overall storage and billing on a monthly basis,
whereas the usage timeline view is not real time and has a lag of up to 72 hours.
Can someone help me achieve this?
To track storage costs in your BigQuery projects on a daily basis by dataset and by table, consider the following approaches:
Utilize the Cloud Billing API:
Employ the BigQuery Billing Export feature:
Leverage Cloud Monitoring for Operational Metrics:
Consider BigQuery Storage Views: The TABLE_STORAGE and TABLE_STORAGE_USAGE_TIMELINE views in BigQuery are specifically designed for analyzing storage usage. The TABLE_STORAGE view provides a current snapshot of storage per table, and TABLE_STORAGE_USAGE_TIMELINE offers daily data with a lag of up to 72 hours. These views are crucial for understanding storage patterns and costs, despite the delay in data availability.
Data Freshness and Real-Time Tracking:
Custom Queries and Automation:
By integrating these methods, you can achieve a comprehensive and detailed understanding of your BigQuery storage costs, facilitating informed decision-making and effective cost management, albeit with some limitations in data immediacy.
Thank you @ms4446 for the response.
I was thinking of leveraging the storage view (TABLE_STORAGE_USAGE_TIMELINE), as our need is to track on a daily basis; we want to see if there is a spike on any given day.
A follow-up question on this: since the output is returned in MB-seconds, how do we calculate the cost of storage in that case?
Hi @kd2709 ,
While the TABLE_STORAGE_USAGE_TIMELINE view in BigQuery provides historical storage usage data, it's important to note that it may not be suitable for real-time monitoring due to its latency of up to 72 hours. This means the data might not reflect the most recent storage consumption, but it can still be useful for identifying trends and spikes in storage usage over a slightly delayed timeframe.
To calculate the daily storage cost from the MB_SECOND metric, follow these steps:
Convert MB_SECOND to GB_DAY:
Calculate Daily Storage Cost:
By following these steps, you can calculate the daily storage cost from the MB_SECOND metric provided by the TABLE_STORAGE_USAGE_TIMELINE view, keeping in mind the limitations regarding data latency. Remember to always check the latest BigQuery pricing for the most accurate cost calculations.
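The two conversion steps can be sketched in code. This is a minimal illustration; the $0.02 per GB per month rate is an example assumption, so substitute the actual BigQuery storage price for your region and storage tier.

```python
SECONDS_PER_DAY = 86_400
MB_PER_GB = 1_024

def mb_seconds_to_gb_days(mb_seconds: float) -> float:
    """MB-seconds -> GB-seconds (divide by 1,024) -> GB-days (divide by 86,400)."""
    return mb_seconds / MB_PER_GB / SECONDS_PER_DAY

def daily_storage_cost(mb_seconds: float, monthly_cost_per_gb: float = 0.02) -> float:
    """Estimate one day's cost at an assumed monthly price per GB."""
    cost_per_gb_per_day = monthly_cost_per_gb / 30
    return mb_seconds_to_gb_days(mb_seconds) * cost_per_gb_per_day

# One GB held for a full day corresponds to 1,024 MB * 86,400 seconds:
print(mb_seconds_to_gb_days(1_024 * 86_400))  # 1.0
```

The same arithmetic can be embedded directly in SQL once the conversion is verified against a known table size.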
I ran the below query to get the physical usage for one of my tables:
SELECT
table_schema,
table_name,
billable_total_logical_usage,
billable_total_physical_usage,
billable_active_physical_usage,
usage_date
FROM
`region-EU`.INFORMATION_SCHEMA.TABLE_STORAGE_USAGE_TIMELINE
WHERE
project_id = '<project_name>'
AND table_schema = '<dataset_name>'
AND table_name = '<table_name>'
ORDER BY
usage_date DESC;
This table has active_physical_bytes: 47.87 TB.
I got 4,325,568,725,105 MB-seconds for 27/11/2023, but when I go by the calculation you gave, I got a huge number.
Can you guide me if I am doing something wrong here?
Hi @kd2709 ,
To accurately calculate the daily storage cost, it's advisable to average the MB_SECOND values over a longer period, such as a week or a month. This approach helps to smooth out any unusual spikes and provides a more representative average daily storage cost.
Assuming the mentioned value is an outlier, you can calculate the daily storage cost for 27/11/2023 using an average MB_SECOND value from a more typical period. For example, if the average MB_SECOND value for the week surrounding 27/11/2023 is 10,000,000 MB-seconds, the daily storage cost calculation would be as follows:
Convert MB_SECOND to GB_DAY:
Calculate Daily Storage Cost:
This calculation will provide an estimated daily storage cost based on the average MB_SECOND value. It's important to adjust the cost per GB per day based on the actual pricing for your BigQuery service.
Additionally, if you're consistently observing high MB_SECOND values for your table, it would be beneficial to investigate the reasons behind the increased storage usage. Factors such as frequent data updates, large data loads, or inefficiencies in data management could contribute to higher storage costs. Identifying and addressing these factors can lead to more efficient BigQuery usage and potential cost savings.
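The averaging approach can be sketched in a few lines of code. The seven daily MB-second values below are hypothetical, and the $0.02 per GB per month rate is an assumption to be replaced with your actual price.

```python
# Hypothetical MB-second values for seven consecutive days
daily_mb_seconds = [9.8e6, 10.1e6, 10.0e6, 10.2e6, 9.9e6, 10.0e6, 10.0e6]

# Average across the week to smooth out single-day spikes
avg_mb_seconds = sum(daily_mb_seconds) / len(daily_mb_seconds)

# MB-seconds -> GB-seconds -> GB-days
avg_gb_days = avg_mb_seconds / 1_024 / 86_400

# Estimated average daily cost at an assumed $0.02 per GB per month
daily_cost = avg_gb_days * 0.02 / 30

print(avg_mb_seconds)  # 10000000.0
```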
I am also unable to consolidate this calculation to the output we get from BQ cost export. I would assume that going from MB seconds to GB seconds, one would only need to multiply by 1024 (since 1024 MB in a GB) for instance.
PS: are your formulas intended to be formatted with \text etc? "\text{GB_DAY} = \frac{(10,000,000 \text{ MB/Sec}) \times 86,400 \text{ seconds/day}}{1024^3 \text{ MB/GB}}"? It is hard to read.
To calculate daily storage costs using the TABLE_STORAGE_USAGE_TIMELINE view in BigQuery, you need to convert the storage usage metric from MB-seconds to GB-days and then apply the appropriate price.
Step-by-Step Conversion and Calculation
Convert MB-seconds to GB-seconds: Since there are 1,024 MB in a GB, divide the MB-seconds value by 1,024.
Convert GB-seconds to GB-days: A GB-day is one GB held for one full day, i.e. 86,400 GB-seconds, so divide the GB-seconds value by 86,400. (Multiplying by 86,400 here instead of dividing is what produces the inflated numbers seen earlier in this thread.)
Calculate the Cost per GB per Day: Storage prices are quoted per GB per month. To approximate the daily rate, divide the monthly price per GB by 30. For instance, at $0.02 per GB per month, the daily price is approximately $0.00066667 per GB.
Compute the Daily Storage Cost: Multiply the GB-day value by the daily price per GB.
Practical Example
Consider the reported value of 4,325,568,725,105 MB-seconds for 27/11/2023.
Convert MB-seconds to GB-seconds: 4,325,568,725,105 / 1,024 ≈ 4,224,188,208.1 GB-seconds
Convert GB-seconds to GB-days: 4,224,188,208.1 / 86,400 ≈ 48,891.07 GB-days
Calculate the Daily Storage Cost: cost per GB per day = 0.02 / 30 ≈ 0.00066667, so daily storage cost ≈ 48,891.07 × 0.00066667 ≈ $32.59
Note that 48,891 GB is roughly 47.75 TB, which is consistent with the table's reported 47.87 TB of active physical bytes, so the MB-seconds value itself looks correct; the huge number came from multiplying by 86,400 instead of dividing. Even so, averaging over a longer period, such as a week, can still help smooth out daily fluctuations.
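As a quick sanity check, the conversion can be run in code. The division by 86,400 (rather than multiplication) is the step to watch; the $0.02 per GB per month rate remains an illustrative assumption.

```python
mb_seconds = 4_325_568_725_105  # reported usage for 27/11/2023

gb_seconds = mb_seconds / 1_024    # MB-seconds -> GB-seconds
gb_days = gb_seconds / 86_400      # GB-seconds -> GB-days (divide, not multiply)
daily_cost = gb_days * 0.02 / 30   # at an assumed $0.02 per GB per month

print(round(gb_days, 2))          # 48891.07 GB held for the day
print(round(gb_days / 1_024, 2))  # 47.75 TB, in line with the reported 47.87 TB
print(round(daily_cost, 2))       # 32.59 USD
```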
Averaging for Accuracy
To smooth out anomalies, calculate the average MB-seconds over a week:
-- Define the cost per GB per month (illustrative; substitute your actual rate)
DECLARE cost_per_gb_per_month FLOAT64 DEFAULT 0.02;
-- Derive the cost per GB per day
DECLARE cost_per_gb_per_day FLOAT64 DEFAULT cost_per_gb_per_month / 30;
WITH storage_usage AS (
SELECT
table_schema,
table_name,
AVG(billable_active_physical_usage) AS avg_storage_in_mb_seconds
FROM
`region-EU`.INFORMATION_SCHEMA.TABLE_STORAGE_USAGE_TIMELINE
WHERE
project_id = '<project_name>'
AND table_schema = '<dataset_name>'
AND table_name = '<table_name>'
AND usage_date BETWEEN DATE_SUB(DATE '2023-11-27', INTERVAL 7 DAY) AND DATE '2023-11-27'
GROUP BY
table_schema, table_name
),
storage_cost AS (
SELECT
table_schema,
table_name,
avg_storage_in_mb_seconds,
-- Convert MB-seconds to GB-seconds
avg_storage_in_mb_seconds / 1024 AS storage_in_gb_seconds,
-- Convert GB-seconds to GB-days (divide by 86,400, the seconds in a day)
avg_storage_in_mb_seconds / 1024 / 86400 AS storage_in_gb_day,
-- Average daily storage cost over the window
avg_storage_in_mb_seconds / 1024 / 86400 * cost_per_gb_per_day AS storage_cost_usd
FROM
storage_usage
)
SELECT
table_schema,
table_name,
ROUND(storage_in_gb_day, 6) AS storage_in_gb_day,
ROUND(storage_cost_usd, 6) AS storage_cost_usd
FROM
storage_cost;
Hi @ms4446 ,
Thanks for the above answers. They helped a lot in understanding how to calculate the storage cost.
I cannot get access to TABLE_STORAGE and TABLE_STORAGE_USAGE_TIMELINE, as both views are available at the region level, not at the dataset level. Can you please suggest if there is any other way to get the cost with the help of INFORMATION_SCHEMA tables?