Hi,
We're using BigQuery Editions and have a few reservations set up. They each have a limit they can autoscale to, and no commitments.
What I want to do is track how much we are spending, per reservation. Any ideas on how to do that?
I had tried the Billing reports, but the best I can get is seeing the total cost by SKU (BigQuery Enterprise Edition for Europe (multi-region)), which is the total. I don't see any way to group by reservation, e.g. by using a label. That would have been the ideal solution.
Anyone found a workaround using audit logs, information schema tables, etc?
Thanks,
Andrew
There are several methods to monitor your expenses per reservation on Google Cloud.
Utilizing Billing reports
Although Billing reports currently lack the feature to group by reservation, you can still derive necessary details by employing a mix of filters and custom dimensions. For instance, the subsequent filters might be used:
Service: BigQuery SKU: BigQuery Enterprise Edition for Europe (multi-region) Dimension: Reservation ID The output will be a report presenting the cost associated with each reservation.
Leveraging audit logs
Audit logs offer another route to keep track of your spending per reservation. These logs hold a history of all BigQuery API calls made, encompassing the used reservation ID. This data can be manipulated to generate a custom report demonstrating the cost per reservation.
Applying the INFORMATION_SCHEMA views
INFORMATION_SCHEMA views provide details on all executed BigQuery jobs, which also include the reservation ID for each job. This data can be utilized to frame a custom query reflecting the cost per reservation.
Here's a sample query you might consider:
SELECT
reservation_id,
SUM(total_bytes_processed) AS total_bytes_processed,
SUM(total_slot_ms) AS total_slot_ms
FROM
INFORMATION_SCHEMA.JOBS
WHERE
service = 'BigQuery'
AND sdk_name = 'google-cloud-bigquery'
GROUP BY
reservation_id
This query sums up the total bytes processed and the total slot milliseconds consumed for each reservation. The resultant data can further be used to determine the cost associated with each reservation.
@ms4446 wrote:Utilizing Billing reports
Although Billing reports currently lack the feature to group by reservation, you can still derive necessary details by employing a mix of filters and custom dimensions. For instance, the subsequent filters might be used:
Service: BigQuery SKU: BigQuery Enterprise Edition for Europe (multi-region) Dimension: Reservation ID The output will be a report presenting the cost associated with each reservation.
This looks ideal, but what do you mean by custom dimension here? How do I add that to the report? Or alternatively, how do I use that as part of a query in the BigQuery export?
Sorry for the confusion. You can monitor your expenditure for each reservation by leveraging the INFORMATION_SCHEMA views. These views offer insights into all BigQuery jobs that have been executed, inclusive of the reservation ID for each job. This data can be harnessed to formulate a custom query that mirrors the cost per reservation.
Here's a sample query you might consider:
SELECT
reservation_id,
SUM(total_bytes_processed) AS total_bytes_processed,
SUM(total_slot_ms) AS total_slot_ms
FROM
INFORMATION_SCHEMA.JOBS
WHERE
service = 'BigQuery'
AND sdk_name = 'google-cloud-bigquery'
GROUP BY
reservation_id
This query aggregates the total bytes processed and the total slot milliseconds used for each reservation. The resulting data can subsequently be used to calculate the cost associated with each reservation.
To incorporate this query into your report, follow these steps:
Your report will be generated and you'll be able to view the results.
Please bear in mind the following:
Regarding the above advice, this query against the billing export does not return the SKU you mention:
Secondly, executing this query from the data access logs also returns nothing:
On your billing export query, change `service.description = 'BigQuery'` to `service.description = 'BigQuery Reservation API'` and you'll see the SKU.
One way to get the SKU for the reservation is to use the INFORMATION_SCHEMA.JOBS view. This view contains information about all of the jobs that have been run in BigQuery, including the reservation ID.
However, as you mentioned, the reservation ID is only available on the project version of the view. This means that if you want to track your spending per reservation across multiple projects, you will need to create a snapshot of the INFORMATION_SCHEMA.JOBS view for each project.
This is a bit of a hassle, but it is the only way to get the information that you need.
Here is a query that you can use to extract the SKU for the reservation from the INFORMATION_SCHEMA.JOBS view:
SELECT
project_id,
job_id,
reservation_id,
REGEXP_EXTRACT(
TO_JSON_STRING(job_statistics.reservation_usage),
r'"key":"name","value":"([a-zA-Z0-9_-]*)"'
) AS reservation_name,
SUM(slot_ms) AS slot_ms
FROM
INFORMATION_SCHEMA.JOBS
WHERE
reservation_id IS NOT NULL
GROUP BY
project_id,
job_id,
reservation_id
ORDER BY
project_id,
job_id
This query will return a table with the following columns:
You can then use this table to calculate the cost per reservation.
This helps, but I don;t believe its accurate:
To measure cost per reservation you need:
- Costs
- Reservations
In the above query I can get the relative slot usage within a reservation. Simple example:
Reservation 1 , Job A is 20%, Job B is 80% of slot usage
Reservation 2 , Job C is 40%, Job D is 60% of slot usage
The cost is not sliced by reservation. My total reservations may cost EUR 200, but I do not know the cost of Reservations 1 or 2
Determining the precise cost for each reservation isn't solely based on the number of slots utilized. It also takes into account the time of day and the geographical location where the reservation is active.
To ascertain the exact cost per reservation, you'll need to employ the BigQuery API to acquire the usage details and costs associated with each reservation. For instance, the API call below retrieves the current usage and costs for the reservation identified by my-reservation-id:
projects().reservations().usage().get(reservationId="my-reservation-id") The returned object will contain these properties:
slot_millis_used: This represents the total slot milliseconds consumed by the reservation. bytes_processed: This indicates the total bytes processed by the reservation. cost: This shows the total cost of the reservation. With this data, you can compute the cost per reservation. The subsequent query, for example, calculates the cost per reservation for all reservations in your project:
SELECT reservation_id, SUM(slot_millis_used) AS total_slot_millis_used, SUM(bytes_processed) AS total_bytes_processed, SUM(cost) AS total_cost, (SUM(cost) / SUM(slot_millis_used)) AS cost_per_slot_millisecond FROM my-project-id.INFORMATION_SCHEMA.RESERVATIONS
GROUP BY reservation_id The result of this query will be a table with the following columns:
reservation_id: The reservation's ID. total_slot_millis_used: The total slot milliseconds consumed by the reservation. total_bytes_processed: The total bytes processed by the reservation. total_cost: The overall cost of the reservation. cost_per_slot_millisecond: The cost per slot millisecond for the reservation. This data can be used to monitor your BigQuery expenses and pinpoint potential areas for cost reduction.
Thanks @ms4446 for the replies! Can you link to the API documentation? I can't find it.
You can find the documentation for Google Cloud's BigQuery Reservations on the following link: Introduction to reservations | BigQuery | Google Cloud
This link provides an overview of reservations, benefits, assignments, and commitments. It also explains the reservation model in BigQuery. Please note that while it provides information on managing reservations, it does not directly provide information on tracking costs per reservation.
The link to the Google Cloud's BigQuery Reservation API Client Libraries can be found here: BigQuery Reservation API Client Libraries | Google Cloud
This page provides information on how to use the BigQuery Reservation API with various client libraries including Python, Ruby, Go, etc. It also provides information on how to set up authentication for the BigQuery Reservation API.
Hello,
I think the only correct way is the ability to put label to every reservation to be able to see the split of costs in billing data.
I dont see the option now, dont you know whether this is on roadmap?
Thanks, Martin
Yes, please. There is currently no easy way to know how much each reservation costs, or better yet, the project in each reservation, without dancing with multiple scripts. Does anyone have a piece of working end-to-end code to do this kind of calculation?
This is what I'm currently using. The total costs come out ~18% too low though, when compared to the billing report 😞 But it's simple and gives an indication.
SELECT reservation_id, (SUM(total_slot_ms)/1000/60/60) * 0.066 AS estimated_costs FROM `region-eu`.INFORMATION_SCHEMA.JOBS_BY_ORGANIZATION WHERE reservation_id IS NOT NULL AND reservation_id != 'default-pipeline' GROUP BY 1