Re: Getting cost value mismatch while using UNNEST...

techGeek · 05-29-2024 10:57 PM

I am using BigQuery to get GCP usage data using queries. I have a single query to get all details from GCP exported Bill data to BigQuery.

select sum(cost) as cost, sum(credits.amount) as credit FROM `TABLENAME`

LEFT JOIN UNNEST(credits) AS credits

WHERE invoice.month="202404"

AND

select sum(cost) as cost FROM `TABLENAME`

WHERE invoice.month="202404"

in both queries, I am getting different cost values. Why I am getting like this?

Above is a sample query.

Below one is my actual query. Here I am aggregating hourly data as daily data by filtering zero cost and credit amounts.

in this scenario, I am getting multiple duplicate rows in result.

It's Bigquery cost export data.

SELECT

billing_account_id,

service.id AS service_id,

service.description AS service_des,

sku.id AS sku_id,

sku.description AS sku_des,

FORMAT_DATETIME('%Y-%m-%d', usage_start_time) AS usage_date,

location.location AS location,

location.region AS region,

location.country AS country,

location.zone AS zone,

invoice.month AS invoice_mon,

currency,

AVG(currency_conversion_rate) as currency_conversion_rate,

cost_type,

resource.name AS resource_name,

resource.global_name AS res_global_name,

SUM(usage.amount_in_pricing_units) AS usage_amount_pricing_units,

usage.pricing_unit AS pricing_unit,

SUM(cost) AS cost,

SUM(cost_at_list) AS cost_at_list,

transaction_type,

seller_name,

adjustment_info.id AS adjustment_info_id,

adjustment_info.description AS adjustment_des,

adjustment_info.type AS adjustment_type,

adjustment_info.mode AS adjustment_mode,

SUM(price.effective_price) AS effective_price,

SUM(price.pricing_unit_quantity) AS pricing_unit_quantity,

project.id AS project_id,

project.number AS project_number,

project.name AS project_name,

SUM(IFNULL(credits.amount, 0)) AS credit_amount,

credits.type AS credit_type,

credits.id AS credit_id,

credits.name as credit_name,

credits.full_name AS credit_full_name,

TO_JSON_STRING(labels) AS labels,

TO_JSON_STRING(system_labels) AS system_labels

FROM `TABLENAME`

LEFT JOIN UNNEST(credits) AS credits

WHERE (cost!=0 OR credits.amount!=0) and invoice.month="202404"

GROUP BY

billing_account_id,

service_id,

service_des,

sku_id,

sku_des,

usage_date,

location,

region,

country,

zone,

invoice_mon,

currency,

cost_type,

resource_name,

res_global_name,

usage.pricing_unit,

transaction_type,

seller_name,

adjustment_info_id,

adjustment_des,

adjustment_type,

adjustment_mode,

project_id,

project_number,

project_name,

credit_type,

credit_full_name,

credit_id,

credit_name,

labels,

system_labels

HAVING

(cost!=0 OR credit_amount!=0)

ORDER BY

usage_date

This is my actual query, What changes I have to do to get the results without duplicates.

ms4446

The reason you're seeing different cost values and duplicate rows is due to how UNNEST(credits) is used in conjunction with your aggregation and joins.

When you use LEFT JOIN UNNEST(credits), each row in your original table can be duplicated multiple times if there are multiple credits associated with it. Since you're summing the cost column after the join, this leads to inflated cost values.

The GROUP BY clause in your query is likely grouping rows that have different credit details (e.g., different credit_id, credit_type), but the same values for the other columns you're grouping on. This results in multiple rows per group, where only the credit details vary.

Here’s an updated version of your query that aggregate the credits before joining:

Aggregate the credits data into a single amount per row in a subquery or Common Table Expression (CTE).
Then, join this aggregated data with your main table.

WITH aggregated_credits AS (
  SELECT
    billing_account_id,
    service.id AS service_id,
    sku.id AS sku_id,
    FORMAT_DATETIME('%Y-%m-%d', usage_start_time) AS usage_date,
    SUM(IFNULL(credits.amount, 0)) AS credit_amount
  FROM `TABLENAME`
  LEFT JOIN UNNEST(credits) AS credits
  WHERE (cost != 0 OR credits.amount != 0) AND invoice.month = "202404"
  GROUP BY
    billing_account_id,
    service_id,
    sku_id,
    usage_date
)

SELECT
  main.billing_account_id,
  main.service.id AS service_id,
  main.service.description AS service_des,
  main.sku.id AS sku_id,
  main.sku.description AS sku_des,
  FORMAT_DATETIME('%Y-%m-%d', main.usage_start_time) AS usage_date,
  main.location.location AS location,
  main.location.region AS region,
  main.location.country AS country,
  main.location.zone AS zone,
  main.invoice.month AS invoice_mon,
  main.currency,
  AVG(main.currency_conversion_rate) as currency_conversion_rate,
  main.cost_type,
  main.resource.name AS resource_name,
  main.resource.global_name AS res_global_name,
  SUM(main.usage.amount_in_pricing_units) AS usage_amount_pricing_units,
  main.usage.pricing_unit AS pricing_unit,
  SUM(main.cost) AS cost,
  SUM(main.cost_at_list) AS cost_at_list,
  main.transaction_type,
  main.seller_name,
  main.adjustment_info.id AS adjustment_info_id,
  main.adjustment_info.description AS adjustment_des,
  main.adjustment_info.type AS adjustment_type,
  main.adjustment_info.mode AS adjustment_mode,
  SUM(main.price.effective_price) AS effective_price,
  SUM(main.price.pricing_unit_quantity) AS pricing_unit_quantity,
  main.project.id AS project_id,
  main.project.number AS project_number,
  main.project.name AS project_name,
  IFNULL(credits.credit_amount, 0) AS credit_amount,
  TO_JSON_STRING(main.labels) AS labels,
  TO_JSON_STRING(main.system_labels) AS system_labels
FROM `TABLENAME` AS main
LEFT JOIN aggregated_credits AS credits
  ON main.billing_account_id = credits.billing_account_id 
  AND main.service.id = credits.service_id
  AND main.sku.id = credits.sku_id
  AND FORMAT_DATETIME('%Y-%m-%d', main.usage_start_time) = credits.usage_date
WHERE (main.cost != 0 OR IFNULL(credits.credit_amount, 0) != 0) 
  AND main.invoice.month = "202404"
GROUP BY
  main.billing_account_id,
  main.service_id,
  main.service_des,
  main.sku_id,
  main.sku_des,
  usage_date,
  location,
  region,
  country,
  zone,
  invoice_mon,
  currency,
  cost_type,
  resource_name,
  res_global_name,
  pricing_unit,
  transaction_type,
  seller_name,
  adjustment_info_id,
  adjustment_des,
  adjustment_type,
  adjustment_mode,
  project_id,
  project_number,
  project_name,
  labels,
  system_labels,
  credit_amount
HAVING (SUM(main.cost) != 0 OR IFNULL(credits.credit_amount, 0) != 0)
ORDER BY usage_date;

Accurate Costs: You'll sum the cost without duplicates, as there's only one cost value per row in the aggregated_credits subquery.
No Duplicate Rows: Since the credit_amount is already aggregated, you won't have multiple rows with different credit details for the same group.

Additional Considerations:

Performance: If you have a very large table, aggregating the credits before joining can be significantly more efficient than unnesting and joining in one step.
Credit Details: If you need to preserve individual credit details (e.g., credit_id, credit_type) for reporting purposes, you can include them in the aggregation and GROUP BY clause in the aggregated_credits subquery.

Key Improvements:

techGeek

Thanks for the response.
I tried the above query. with this I am getting correct cost value but credit amount is higher. Actual credit amount (-9973) but I got (-24660).
Also In the above query credits.id, credits.type, credits.name, credits.full_name are missing. I need those details to map the resource with applied credit details. so in this query what's the change required to achieve that?

ms4446

To address the issue of inflated credit amounts and include the necessary credit details (such as credits.id, credits.type, credits.name, and credits.full_name), you can modify the aggregation and join strategy to ensure correct handling of the credits data.

Here's an updated approach to aggregate the credits correctly and include the necessary credit details:

Aggregate the credits data into a single amount per row in a subquery or CTE, while also collecting the relevant credit details.
Join this aggregated data with your main table.

Here's the updated SQL query:

WITH aggregated_credits AS (
  SELECT
    billing_account_id,
    service.id AS service_id,
    sku.id AS sku_id,
    FORMAT_DATETIME('%Y-%m-%d', usage_start_time) AS usage_date,
    ARRAY_AGG(STRUCT(
      credits.amount AS credit_amount,
      credits.id AS credit_id,
      credits.type AS credit_type,
      credits.name AS credit_name,
      credits.full_name AS credit_full_name
    )) AS credits_array
  FROM `TABLENAME`
  LEFT JOIN UNNEST(credits) AS credits
  WHERE (cost != 0 OR credits.amount != 0) AND invoice.month = "202404"
  GROUP BY
    billing_account_id,
    service_id,
    sku_id,
    usage_date
)

SELECT
  main.billing_account_id,
  main.service.id AS service_id,
  main.service.description AS service_des,
  main.sku.id AS sku_id,
  main.sku.description AS sku_des,
  FORMAT_DATETIME('%Y-%m-%d', main.usage_start_time) AS usage_date,
  main.location.location AS location,
  main.location.region AS region,
  main.location.country AS country,
  main.location.zone AS zone,
  main.invoice.month AS invoice_mon,
  main.currency,
  AVG(main.currency_conversion_rate) as currency_conversion_rate,
  main.cost_type,
  main.resource.name AS resource_name,
  main.resource.global_name AS res_global_name,
  SUM(main.usage.amount_in_pricing_units) AS usage_amount_pricing_units,
  main.usage.pricing_unit AS pricing_unit,
  SUM(main.cost) AS cost,
  SUM(main.cost_at_list) AS cost_at_list,
  main.transaction_type,
  main.seller_name,
  main.adjustment_info.id AS adjustment_info_id,
  main.adjustment_info.description AS adjustment_des,
  main.adjustment_info.type AS adjustment_type,
  main.adjustment_info.mode AS adjustment_mode,
  SUM(main.price.effective_price) AS effective_price,
  SUM(main.price.pricing_unit_quantity) AS pricing_unit_quantity,
  main.project.id AS project_id,
  main.project.number AS project_number,
  main.project.name AS project_name,
  IFNULL(SUM(credits.credit_amount), 0) AS credit_amount,
  credits.credit_type AS credit_type,
  credits.credit_id AS credit_id,
  credits.credit_name AS credit_name,
  credits.credit_full_name AS credit_full_name,
  TO_JSON_STRING(main.labels) AS labels,
  TO_JSON_STRING(main.system_labels) AS system_labels
FROM `TABLENAME` AS main
LEFT JOIN aggregated_credits AS credits
  ON main.billing_account_id = credits.billing_account_id 
  AND main.service.id = credits.service_id
  AND main.sku.id = credits.sku_id
  AND FORMAT_DATETIME('%Y-%m-%d', main.usage_start_time) = credits.usage_date
WHERE (main.cost != 0 OR IFNULL(SUM(credits.credit_amount), 0) != 0) 
  AND main.invoice.month = "202404"
GROUP BY
  main.billing_account_id,
  main.service_id,
  main.service_des,
  main.sku_id,
  main.sku_des,
  usage_date,
  location,
  region,
  country,
  zone,
  invoice_mon,
  currency,
  cost_type,
  resource_name,
  res_global_name,
  pricing_unit,
  transaction_type,
  seller_name,
  adjustment_info_id,
  adjustment_des,
  adjustment_type,
  adjustment_mode,
  project_id,
  project_number,
  project_name,
  credit_type,
  credit_id,
  credit_name,
  credit_full_name,
  labels,
  system_labels
HAVING (SUM(main.cost) != 0 OR IFNULL(SUM(credits.credit_amount), 0) != 0)
ORDER BY usage_date;

Explanation:

Aggregated Credits: The aggregated_credits CTE now includes an ARRAY_AGG to aggregate the credit details into an array of structs. This ensures we collect all relevant credit information in a single step.
Join and Aggregate: In the main query, join with the aggregated_credits CTE and then use SUM to aggregate the credit_amount. Additionally, access individual credit details using the array elements.
Group By: Ensure the GROUP BY clause includes all necessary columns to prevent duplicate rows.
HAVING Clause: Use the HAVING clause to filter out rows with zero costs or credits.

techGeek

Getting error for this query :

Name credit_amount not found inside credits at [69:45]

I have one more doubt about the WITH aggregated_credits part, in this group by is done at billing_account_id, service_id, sku_id, and usage_date level. whether the resource level is needed or credits are only applied at these levels.

techGeek

Are these AI-generated queries? FYI: i tried all of those before posting it in the community

Getting cost value mismatch while using UNNEST(credits) in BigQuery

Explanation: