Get hands-on experience with 20+ free Google Cloud products and $300 in free credit for new customers.

Group by aggregate in Cloud data fusion failing

Hi all,

I am trying to build a ETL pipeline in Cloud Fusion.

When i do an sum aggregate on  a field from a bigquery view, i get the following error:

 

rvaneikeren_0-1718794465324.png

Do you guys have any idea?

Regards,

 

Rolf

1 1 126
1 REPLY 1

The error “Cannot compute Sum on field BilledCost because its type DECIMAL is not numeric” in Data Fusion indicates a type mismatch issue when performing an aggregation. In BigQuery, DECIMAL is a numeric type, but Data Fusion might be interpreting it differently, causing the issue.

To resolve this, follow these steps:

Step-by-Step Solution

  1. Check Field Type in BigQuery: Ensure that the field BilledCost in your BigQuery view is indeed of type DECIMAL (or NUMERIC).
  2. Use Data Wrangling to Convert Type: Add a Wrangler transform before the aggregate step to explicitly cast the DECIMAL type to a numeric type that Data Fusion can work with.
    • Use a transformation formula like: CAST(BilledCost AS DOUBLE) or CAST(BilledCost AS FLOAT), depending on your precision needs.
  3. Update Schema in Data Fusion: Update the schema in Data Fusion to reflect the type change. Go to the Schema tab and set the field BilledCost to double or float type.
  4. Perform Aggregation: Now, you should be able to perform the sum aggregation on the field BilledCost.

Detailed Steps in Data Fusion

  1. Add Wrangler Transform: In your pipeline, add a Wrangler transform before your aggregation step.
  2. Open the Wrangler interface: Write the transformation formula to cast BilledCost to a numeric type that Data Fusion can handle, for example, float.
  3. Apply Transformation: Apply and validate the transformation.
  4. Update Schema: After the Wrangler transform, update the schema for the downstream stages to ensure they recognize BilledCost as a numeric type (double or float).
  5. Aggregate: Proceed with the aggregate stage where you can now compute the sum on BilledCost.

Example Pipeline

  • Source (BigQuery Table): Input: BigQuery table/view with BilledCost.
  • Wrangler (Data Transformation): Transformation: CAST(BilledCost AS DOUBLE)
  • Schema Update: Ensure BilledCost is now recognized as a double.
  • Aggregator: Aggregation: Sum on BilledCost.
  • Sink: Output the results to the desired destination.

Example Wrangler Transformation Code 

 
SELECT CAST(BilledCost AS DOUBLE) AS BilledCost
FROM my_table

Alternative Solutions

In some cases, you might explore:

  • Custom Transformations: Writing custom code (e.g., in JavaScript) within Data Fusion to handle the conversion logic.
  • Aggregation Function for Decimals: In rare cases when direct type conversion is not feasible, certain aggregation functions in BigQuery (like SAFE_CAST along with ARRAY_AGG) can be used to work around the issue.

Additional Considerations

  • Precision and Scale: If your DECIMAL type has a specific precision and scale that needs to be maintained, ensure the conversion does not lose critical data. Choose between FLOAT and DOUBLE (or another numeric type) based on your precision needs.
  • Schema Changes: If your schema changes frequently, consider automating the type casting process or updating your ETL pipeline dynamically to handle these changes without manual intervention.

Testing After implementing the type conversion, test your pipeline with sample data to ensure that the aggregation now works as expected and the results are accurate.