Get hands-on experience with 20+ free Google Cloud products and $300 in free credit for new customers.

How to convert Time data type in Data Fusion?

Hi everyone,

I'm trying to load a CSV file from a GCS bucket to BigQuery while performing some transformations using Cloud Data Fusion. I have a column named "BILL_TIME" that contains time details in HH:mm format. I want to load this column into a BigQuery table with the TIME data type. Unfortunately, I haven't found a way to convert the string data type of the BILL_TIME column to the TIME data type.

Here are the details:

  • Current Data: The BILL_TIME column in my raw data is in the format "HH:mm" (e.g., 14:30).
  • Expected Data Type in BigQuery: TIME

I would appreciate any guidance on how to achieve this conversion. Below are screenshots of my raw data and the expected data type in BigQuery for reference.

Thanks in advance!

Best regards,
Murali

wrangler..jpegBigquery.jpegBigquery2.jpeg

1 3 470
3 REPLIES 3

Hi @murali2 ,

To convert the BILL_TIME column from a string in "HH:mm" format to the TIME data type in Google Cloud Data Fusion, follow these steps:

Step 1: Source

  1. Add GCS Plugin: Begin by adding a Google Cloud Storage (GCS) plugin to your pipeline to read your CSV file containing the BILL_TIME data.

Step 2: Wrangler Transformation

  1. Insert Wrangler Plugin: Insert a Wrangler plugin immediately after the GCS plugin.
  2. Configure Wrangler Transformation:
    • In the Wrangler transformation, add the following directive to convert the BILL_TIME column:
      parse-as BILL_TIME 'TIME' 'HH:mm'
    • This directive instructs Wrangler to:
      • parse-as: Convert the data type of the specified column.
      • BILL_TIME: The name of the column containing the time string.
      • TIME: The target data type to convert to (in this case, the TIME data type).
      • HH:mm: The format of the time string in the original data.

Step 3: BigQuery Sink

  1. Add BigQuery Sink Plugin: Add a BigQuery sink plugin after the Wrangler plugin.
  2. Ensure Schema Consistency:
    • Ensure that the schema of your target BigQuery table already includes a BILL_TIME column defined as the TIME data type.
    • Data Fusion will automatically map the converted time values to this column when writing the data.

Key Points:

  • Plugin Order: The Wrangler plugin should be positioned after the GCS plugin (where the CSV data is read) and before the BigQuery sink plugin (where the data is written).
  • Automatic Schema Handling: You do not need to explicitly update the schema within the Wrangler plugin. The parse-as directive automatically handles the data type conversion, and Data Fusion will correctly interpret the converted values when writing to BigQuery.
  • Predefined BigQuery Schema: The BigQuery table schema must be defined beforehand with a BILL_TIME column of type TIME.

it gives me the following error

Screenshot 2024-05-25 184647.png

You might need to try an alternative approach. Try using an "Expression" plugin to manually transform the BILL_TIME column. Here's how you can achieve this:

Converting BILL_TIME to TIME Data Type in Google Cloud Data Fusion

Step 1: Source

  1. Add GCS Plugin: Begin by adding a Google Cloud Storage (GCS) plugin to your pipeline to read your CSV file containing the BILL_TIME data.

Step 2: Expression Transformation

  1. Add Expression Plugin: Insert an "Expression" plugin immediately after the GCS plugin.

  2. Configure Expression Transformation:

    • In the Expression plugin, create a new field that converts the BILL_TIME string to the TIME data type.

    • Use the following expression to achieve this:

      TIME(SUBSTRING(BILL_TIME, 0, 2), SUBSTRING(BILL_TIME, 3, 2), 0)
    • This expression extracts the hour and minute components from the BILL_TIME string and constructs a TIME value.

Step 3: BigQuery Sink

  1. Add BigQuery Sink Plugin: Add a BigQuery sink plugin after the Expression plugin.

  2. Ensure Schema Consistency:

    • Ensure that the schema of your target BigQuery table already includes a BILL_TIME column defined as the TIME data type.

    • Data Fusion will automatically map the converted time values to this column when writing the data.

Detailed Steps:

  1. Source Configuration:

    • Add a GCS plugin to read the CSV file.

  2. Expression Transformation:

    • Insert the Expression plugin.

    • Add a new field in the Expression plugin configuration, for example, BILL_TIME_PARSED, with the following expression:

      TIME(SUBSTRING(BILL_TIME, 0, 2), SUBSTRING(BILL_TIME, 3, 2), 0)
    • This will create a new field BILL_TIME_PARSED with the TIME data type.

  3. BigQuery Sink Configuration:

    • Add a BigQuery sink plugin.

    • Map the BILL_TIME_PARSED field to the BILL_TIME column in the BigQuery table.

    • Ensure the BILL_TIME column in BigQuery is set to the TIME data type.

Key Points:

  • Plugin Order: Ensure the Expression plugin is placed after the GCS plugin and before the BigQuery sink plugin.

  • Schema Mapping: The new field created in the Expression plugin (BILL_TIME_PARSED) should be mapped to the BILL_TIME column in BigQuery.