
Error while configuring Data Fusion Wrangler properties

Hi Community,

I am trying to create a pipeline in Data Fusion where my source is a CSV file stored in a GCS bucket, and the destination is a BigQuery table. However, I am encountering an issue when deploying the pipeline. It keeps failing, and when I try to validate the Wrangler properties, it shows an error indicating that the UDD field is empty. Please refer to the screenshot below for the exact error message.

Issue Details:

  • Source: CSV file in GCS bucket
  • Destination: BigQuery table
  • Error Message: UDD field is empty (as shown in the screenshot)

Steps Taken:

  1. Created a pipeline in Data Fusion with the appropriate source and destination.
  2. Configured the Wrangler properties for data transformation.
  3. Attempted to validate the Wrangler properties.
  4. Encountered the error regarding the empty UDD field.


I would appreciate any guidance on how to resolve this issue. Has anyone faced a similar problem, or does anyone have insights into what might be causing this validation error?

Thank you,
Murali

ACCEPTED SOLUTION

Just delete the first line in the Wrangler directives, the one that starts with #pragma.
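
For instance, if the top of the recipe looks like the hypothetical snippet below (the directive name is illustrative, not from the original post), deleting that first line should let validation pass.

Before (fails validation with the empty UDD error):

  #pragma load-directives my-custom-directive;
  parse-as-csv :body ',' true

After (the #pragma line removed):

  parse-as-csv :body ',' true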


4 REPLIES

UDDs (User Defined Directives) let you customize how Wrangler interprets and transforms your data. They are a key part of configuring the pipeline for your specific CSV structure and desired transformations.
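
As a rough illustration (the directive and column names here are hypothetical), a recipe that relies on a UDD usually loads it with a #pragma line at the top of the directive list, then invokes it like any built-in directive. If that line references a UDD that isn't actually deployed, validation can fail:

  #pragma load-directives text-sanitizer;
  parse-as-csv :body ',' true
  text-sanitizer :customer_name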

Possible Causes & Solutions:

1. Missing or Incorrect UDD Configuration:

The Fix: Double-check your Wrangler configuration to ensure your directives are properly defined. This involves specifying the data types, column names, and any necessary transformations for each field in your CSV file.

Key Points:

  • Use the Wrangler interface to define the schema (field names and data types) accurately.
  • If you have complex transformations, leverage Wrangler's expressions to manipulate the data as needed (a minimal recipe sketch follows this list).
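
As a minimal sketch under assumed column names and types (adjust to your file), a recipe for a headered CSV might look like:

  parse-as-csv :body ',' true
  drop :body
  set-type :quantity int
  set-type :price double
  fill-null-or-empty :region 'unknown'

Here parse-as-csv splits the raw :body column on commas and treats the first row as the header, drop removes the now-redundant raw column, and the remaining directives assign types and defaults per field.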

2. Invalid CSV Structure:

The Fix: Examine your CSV file to verify that it's well-formed. Ensure that it has a header row, the delimiter (e.g., comma) is consistent, and there are no malformed or corrupt rows.

Key Points:

  • Wrangler expects a specific CSV format. Any inconsistencies can lead to parsing issues.
  • If needed, use a text editor or CSV tool to clean up your data before importing (a sample well-formed file follows this list).
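
For reference, a well-formed file that Wrangler parses cleanly would look something like this (hypothetical data):

  id,name,order_date,price
  1,Alice,2024-05-01,10.50
  2,Bob,2024-05-02,8.75

Every row should have the same number of fields as the header row, with a consistent delimiter and consistent quoting.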

3. Permissions Issues:

The Fix: Confirm that your Data Fusion service account has the necessary permissions to read from the GCS bucket and write to the BigQuery table.

Key Points:

  • The service account needs at least the following roles:
    • Storage Object Viewer (roles/storage.objectViewer) for GCS access
    • BigQuery Data Editor (roles/bigquery.dataEditor) for table creation and modification

Debugging Tips:

Validate Wrangler Properties: Always validate your Wrangler properties before deploying the pipeline. This can often pinpoint the specific configuration issue.

Check Pipeline Logs: Examine the pipeline logs for any error messages related to the UDD field. This can provide more context on the problem.

Consult Documentation: Refer to the Cloud Data Fusion documentation for detailed guidance on configuring Wrangler and UDDs.


Additional Notes:

If you're still encountering issues, provide the following details for more specific assistance:

  • A sample of your CSV data (if possible)
  • Your Wrangler configuration
  • The full error message from the pipeline logs

Preview Feature: Consider using Cloud Data Fusion's "Preview" feature to see how the data is interpreted and transformed based on your Wrangler configuration. This can help with troubleshooting.


As a follow-up, I'm working on a pipeline to load a CSV file from a GCS bucket into BigQuery using Cloud Data Fusion. I'm encountering an issue when I try to convert a string column to a simple date format: any empty cells in that column cause the entire row to be skipped when loading into the BigQuery table.

I need help ensuring that the data loads into BigQuery without skipping the rows that have empty cells in the date column. Any advice or solutions would be greatly appreciated.

You can use the fill-null-or-empty functionality to replace the empty cells (for example, with whitespace) and then try converting the column to a simple date format.
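
A minimal sketch of that approach (the column name :order_date and the date pattern are assumptions; adjust them to your data):

  fill-null-or-empty :order_date '1970-01-01'
  parse-as-simple-date :order_date yyyy-MM-dd

Note that if you fill with plain whitespace, the simple-date parse may still reject those cells, so filling with a parseable sentinel date (and filtering it out downstream if needed) tends to be safer.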