
Physical storage cost on daily basis in BigQuery

Hi,

We have recently moved our billing model from logical to physical.

We want to track storage costs in our BQ projects on a daily basis:

by dataset
by table

I tried using the TABLE_STORAGE view and the TABLE_STORAGE_USAGE_TIMELINE view.

The TABLE_STORAGE view shows overall storage and billing on a monthly basis, whereas the usage timeline view is not real time and has a lag of 72 hours.

Can someone help me achieve this?


To effectively track storage costs in your BigQuery projects on a daily basis by datasets and tables, consider the following refined approaches:

  1. Utilize the Cloud Billing API:

    • The Cloud Billing API offers access to detailed billing data, including storage costs for BigQuery. While it provides comprehensive cost analysis, be aware that the data might not be real-time and could have a delay. This tool is ideal for analyzing trends and obtaining detailed insights into your costs over time.
  2. Employ the BigQuery Billing Export feature:

    • BigQuery Billing Export allows you to export detailed billing data to a BigQuery dataset. This includes granular information on storage costs. However, similar to the Cloud Billing API, the exported data may not be available in real-time and could be subject to delays. It's excellent for scheduled daily analysis and historical cost tracking.
  3. Leverage Cloud Monitoring for Operational Metrics:

    • While Cloud Monitoring is adept at tracking operational metrics, such as query performance and job execution times, it's less straightforward for billing data. You can create custom metrics based on billing data, but this requires additional setup, possibly integrating with the Cloud Billing API or BigQuery Billing Export data.
  4. Consider BigQuery Storage Views:

    • The TABLE_STORAGE and TABLE_STORAGE_USAGE_TIMELINE views in BigQuery are specifically designed for analyzing storage usage. The TABLE_STORAGE view provides a current snapshot of logical and physical storage per table, and TABLE_STORAGE_USAGE_TIMELINE offers daily usage data with a lag of up to 72 hours. These views are crucial for understanding storage patterns and costs, despite the delay in data availability.
  5. Data Freshness and Real-Time Tracking:

    • It's important to note that real-time tracking of storage costs might not be feasible due to the inherent delay in billing and usage data across these methods. Adjust expectations accordingly and focus on near-real-time analysis with the most recent available data.
  6. Custom Queries and Automation:

    • To maximize the utility of these tools, consider writing custom queries and setting up automation, such as scheduled scripts, to regularly fetch, store, and analyze this data for ongoing cost monitoring and optimization.

By integrating these methods, you can achieve a comprehensive and detailed understanding of your BigQuery storage costs, facilitating informed decision-making and effective cost management, albeit with some limitations in data immediacy.

Thank you @ms4446 for the response.

I was thinking of leveraging the storage view (TABLE_STORAGE_USAGE_TIMELINE), since our need is to track on a daily basis; we want to see if there is a spike on any given day.

A follow-up question on this: since the output is returned in MB-seconds, how do we calculate the cost of storage in that case?

Hi @kd2709 ,

While the TABLE_STORAGE_USAGE_TIMELINE view in BigQuery provides historical storage usage data, it's important to note that it may not be suitable for real-time monitoring due to its latency of up to 72 hours. This means the data might not reflect the most recent storage consumption, but it can still be useful for identifying trends and spikes in storage usage over a slightly delayed timeframe.

To calculate the daily storage cost from the MB_SECOND metric, follow these steps:

  1. Convert MB_SECOND to GB_DAY:

    • Since BigQuery's storage pricing is based on GB per month, you need to convert the MB_SECOND values to GB_DAY. This involves two steps:
      • Convert MB_SECOND to GB_SECOND by dividing the MB_SECOND value by 1,024 (since there are 1,024 MB in a GB).
      • Then, convert GB_SECOND to GB_DAY by dividing the GB_SECOND value by 86,400 (the number of seconds in a day). The daily MB_SECOND figure is accumulated over the whole day, so dividing by the number of seconds gives the average GB held during that day.
  2. Calculate Daily Storage Cost:

    • Once you have the GB_DAY value, calculate the daily storage cost using the current BigQuery storage pricing for your region and storage type. The formula is:
      Daily Storage Cost = GB_DAY × Cost per GB per day
    • Note that the cost per GB per day should be derived from the monthly cost. For example, if the monthly cost is $0.02 per GB (for active storage), divide this by 30 to get the daily rate. Adjust the rate accordingly for long-term storage, which typically has a lower cost.

By following these steps, you can accurately calculate the daily storage cost from the MB_SECOND metric provided by the TABLE_STORAGE_USAGE_TIMELINE view, keeping in mind the limitations regarding data latency. Remember to always check the latest BigQuery pricing for the most accurate cost calculations.
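The steps above can be sketched in a few lines of Python (a minimal illustration, not an official formula; the $0.02 per GB monthly rate and the 30-day month are assumptions — substitute the actual rates for your region and storage type):

```python
# Convert a daily MB-second usage figure from TABLE_STORAGE_USAGE_TIMELINE
# into an estimated daily cost. The $0.02/GB monthly rate and 30-day month
# are assumptions -- substitute your region's actual storage pricing.

MB_PER_GB = 1024          # binary MB per GB
SECONDS_PER_DAY = 86_400  # seconds in one day

def daily_storage_cost(mb_seconds: float, monthly_rate_per_gb: float = 0.02) -> float:
    """Estimated cost for one day, given that day's accumulated MB-second total."""
    gb_seconds = mb_seconds / MB_PER_GB    # MB-seconds -> GB-seconds
    gb_day = gb_seconds / SECONDS_PER_DAY  # average GB held over the day
    daily_rate = monthly_rate_per_gb / 30  # approximate per-day rate
    return gb_day * daily_rate

# Sanity check: 1 GB held for a full day accumulates 1024 * 86,400 MB-seconds,
# which should cost exactly one thirtieth of the monthly rate.
print(round(daily_storage_cost(1024 * 86_400), 6))  # 0.000667
```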

@ms4446 

I ran below query to get the physical usage for one of my table

SELECT
  table_schema,
  table_name,
  billable_total_logical_usage,
  billable_total_physical_usage,
  billable_active_physical_usage,
  usage_date
FROM
  `region-EU`.INFORMATION_SCHEMA.TABLE_STORAGE_USAGE_TIMELINE
WHERE
  project_id = '<project_name>'
  AND table_schema = '<dataset_name>'
  AND table_name = '<table_name>'
ORDER BY
  usage_date DESC;

This table has Active_physical_bytes: 47.87 TB.
I got 4,325,568,725,105 MB-seconds for 27/11/2023, but when I go by the calculation you gave, I get a huge number.
Can you guide me if I am doing something wrong here?

Hi @kd2709 ,

To track daily storage costs robustly, it can also help to average the MB_SECOND values over a longer period, such as a week or a month. This approach smooths out short-lived spikes and provides a more representative average daily storage cost.

For example, if the average MB_SECOND value for the week surrounding 27/11/2023 were 10,000,000 MB-seconds per day, the daily storage cost calculation would be as follows:

  1. Convert MB_SECOND to GB_DAY:

    • First, convert MB_SECOND to GB_DAY by dividing by 1,024 (to convert MB to GB) and then dividing by 86,400 (the number of seconds in a day, since the value is accumulated over the whole day):
    GB_DAY = 10,000,000 / 1,024 / 86,400 ≈ 0.113 GB
  2. Calculate Daily Storage Cost:

    • Multiply GB_DAY by the cost per GB per day. Assuming a monthly cost of $0.02 per GB, the daily rate is $0.02 / 30, and the formula would be:
    Daily Storage Cost = GB_DAY × ($0.02 / 30) ≈ $0.000075

This calculation provides an estimated daily storage cost based on the average MB_SECOND value. It's important to adjust the cost per GB per day based on the actual pricing for your BigQuery service.

Additionally, if you're consistently observing high MB_SECOND values for your table, it would be beneficial to investigate the reasons behind the increased storage usage. Factors such as frequent data updates, large data loads, or inefficiencies in data management could contribute to higher storage costs. Identifying and addressing these factors can lead to more efficient BigQuery usage and potential cost savings.
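The hypothetical 10,000,000 MB-second daily average works out as follows in Python (the $0.02 per GB monthly rate is again an assumed figure, not your actual pricing):

```python
# Hypothetical weekly-average usage for one day, in MB-seconds.
avg_mb_seconds = 10_000_000

gb_day = avg_mb_seconds / 1024 / 86_400  # average GB held over the day
daily_rate = 0.02 / 30                   # assumed $0.02/GB monthly rate
cost = gb_day * daily_rate

print(round(gb_day, 3))  # 0.113 (GB)
print(f"{cost:.8f}")     # 0.00007535 (USD) -- a fraction of a cent
```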

I am also unable to reconcile this calculation with the output we get from the BQ cost export. I would assume that going from MB-seconds to GB-seconds, one would only need to divide by 1,024 (since there are 1,024 MB in a GB), for instance.

PS: are your formulas intended to be formatted with \text etc? "\text{GB_DAY} = \frac{(10,000,000 \text{ MB/Sec}) \times 86,400 \text{ seconds/day}}{1024^3 \text{ MB/GB}}"? It is hard to read.

To calculate daily storage costs using the TABLE_STORAGE_USAGE_TIMELINE view in BigQuery, it's essential to accurately convert storage usage metrics from MB-seconds to GB-day and then apply the appropriate cost calculations.

Step-by-Step Conversion and Calculation

  1. Convert MB-seconds to GB-seconds: The first step is converting the storage usage from MB-seconds to GB-seconds. Since there are 1,024 MB in a GB, divide the MB-seconds value by 1,024.

  2. Convert GB-seconds to GB-day: Next, divide the GB-seconds value by 86,400, the number of seconds in a day. The usage figure is accumulated across the whole day, so dividing by the number of seconds in it yields the average number of GB held during that day.

  3. Calculate the Cost per GB per Day: Storage costs are typically quoted on a monthly basis. To find the daily rate, divide the monthly cost per GB by 30. For instance, if the monthly cost is $0.02 per GB, the daily cost per GB is approximately $0.00066667.

  4. Compute the Daily Storage Cost: Finally, multiply the GB-day value by the daily cost per GB to determine the daily storage cost.

Practical Example

Consider the MB-seconds value for a specific date: 4,325,568,725,105 MB-seconds for 27/11/2023.

  1. Convert MB-seconds to GB-seconds:
    GB-seconds = 4,325,568,725,105 / 1,024 ≈ 4,224,188,208

  2. Convert GB-seconds to GB-day:
    GB-day = 4,224,188,208 / 86,400 ≈ 48,891 GB

  3. Calculate the Daily Storage Cost: Given a monthly storage cost of $0.02 per GB:
    Cost per GB per day = 0.02 / 30 ≈ 0.00066667
    Daily Storage Cost = 48,891 × 0.00066667 ≈ $32.59

Note that 48,891 GB is roughly 47.75 TB, consistent with the table's reported 47.87 TB of active physical bytes, so the MB-seconds figure itself is plausible. If a calculation produces an astronomically large cost, check that the conversion divides, rather than multiplies, by 86,400.
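As a cross-check in Python: dividing the reported MB-second total by 1,024 and then by 86,400 recovers an average storage size that lines up with the table's ~47.87 TB of active physical bytes, which suggests the raw value itself is sound:

```python
# MB-second total reported earlier in the thread for 2023-11-27.
mb_seconds = 4_325_568_725_105

gb_day = mb_seconds / 1024 / 86_400  # average GB held over the day
tb = gb_day / 1024                   # GB -> TB (binary)

print(round(gb_day))  # 48891 (GB)
print(round(tb, 2))   # 47.75 -- close to the reported 47.87 TB
```

The small gap between 47.75 and 47.87 is expected, since the timeline value is an average over the day while the byte count is a point-in-time snapshot.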

Averaging for Accuracy

To smooth out anomalies and put any single day in context, compute the per-day usage and estimated cost across a week:

-- Define the cost per GB per month
DECLARE cost_per_gb_per_month FLOAT64 DEFAULT 0.02;

-- Calculate the cost per GB per day
DECLARE cost_per_gb_per_day FLOAT64 DEFAULT cost_per_gb_per_month / 30;

WITH storage_usage AS (
  SELECT
    usage_date AS date,
    table_schema,
    table_name,
    AVG(billable_active_physical_usage) AS avg_storage_in_mb_seconds
  FROM
    `region-EU.INFORMATION_SCHEMA.TABLE_STORAGE_USAGE_TIMELINE`
  WHERE
    project_id = '<project_name>'
    AND table_schema = '<dataset_name>'
    AND table_name = '<table_name>'
    AND usage_date BETWEEN DATE_SUB(DATE '2023-11-27', INTERVAL 7 DAY) AND DATE '2023-11-27'
  GROUP BY
    date, table_schema, table_name
),
storage_cost AS (
  SELECT
    date,
    table_schema,
    table_name,
    avg_storage_in_mb_seconds,
    -- Convert MB-seconds to GB-seconds
    avg_storage_in_mb_seconds / 1024 AS storage_in_gb_seconds,
    -- Divide by the seconds in a day to get the average GB held that day
    (avg_storage_in_mb_seconds / 1024) / 86400 AS storage_in_gb_day,
    -- Calculate daily storage cost
    ((avg_storage_in_mb_seconds / 1024) / 86400) * cost_per_gb_per_day AS storage_cost_usd
  FROM
    storage_usage
)
SELECT
  date,
  table_schema,
  table_name,
  ROUND(storage_in_gb_day, 6) AS storage_in_gb_day,
  ROUND(storage_cost_usd, 6) AS storage_cost_usd
FROM
  storage_cost
ORDER BY
  date DESC;

 

Hi @ms4446 ,

Thanks for the above answers. They helped a lot in understanding how to calculate the storage cost.

I cannot get access to TABLE_STORAGE and TABLE_STORAGE_USAGE_TIMELINE, as both views are available at the region level, not at the dataset level. Can you please suggest whether there is any other way to get the cost with the help of INFORMATION_SCHEMA tables?