Hi everyone,
I'm trying to load a CSV file from a GCS bucket to BigQuery while performing some transformations using Cloud Data Fusion. I have a column named "BILL_TIME" that contains time details in HH:mm format. I want to load this column into a BigQuery table with the TIME data type. Unfortunately, I haven't found a way to convert the string data type of the BILL_TIME column to the TIME data type.
Here are the details:
I would appreciate any guidance on how to achieve this conversion. Below are screenshots of my raw data and the expected data type in BigQuery for reference.
Thanks in advance!
Best regards,
Murali
Hi @murali2 ,
To convert the BILL_TIME
column from a string in "HH:mm" format to the TIME data type in Google Cloud Data Fusion, follow these steps:
BILL_TIME
data.BILL_TIME
column:
parse-as BILL_TIME 'TIME' 'HH:mm'
parse-as
: Convert the data type of the specified column.BILL_TIME
: The name of the column containing the time string.TIME
: The target data type to convert to (in this case, the TIME data type).HH:mm
: The format of the time string in the original data.BILL_TIME
column defined as the TIME data type.parse-as
directive automatically handles the data type conversion, and Data Fusion will correctly interpret the converted values when writing to BigQuery.BILL_TIME
column of type TIME.it gives me the following error
You might need to try an alternative approach. Try using an "Expression" plugin to manually transform the BILL_TIME
column. Here's how you can achieve this:
Converting BILL_TIME
to TIME Data Type in Google Cloud Data Fusion
Step 1: Source
Add GCS Plugin: Begin by adding a Google Cloud Storage (GCS) plugin to your pipeline to read your CSV file containing the BILL_TIME
data.
Step 2: Expression Transformation
Add Expression Plugin: Insert an "Expression" plugin immediately after the GCS plugin.
Configure Expression Transformation:
In the Expression plugin, create a new field that converts the BILL_TIME
string to the TIME data type.
Use the following expression to achieve this:
TIME(SUBSTRING(BILL_TIME, 0, 2), SUBSTRING(BILL_TIME, 3, 2), 0)
This expression extracts the hour and minute components from the BILL_TIME
string and constructs a TIME value.
Step 3: BigQuery Sink
Add BigQuery Sink Plugin: Add a BigQuery sink plugin after the Expression plugin.
Ensure Schema Consistency:
Ensure that the schema of your target BigQuery table already includes a BILL_TIME
column defined as the TIME data type.
Data Fusion will automatically map the converted time values to this column when writing the data.
Detailed Steps:
Source Configuration:
Add a GCS plugin to read the CSV file.
Expression Transformation:
Insert the Expression plugin.
Add a new field in the Expression plugin configuration, for example, BILL_TIME_PARSED
, with the following expression:
TIME(SUBSTRING(BILL_TIME, 0, 2), SUBSTRING(BILL_TIME, 3, 2), 0)
This will create a new field BILL_TIME_PARSED
with the TIME data type.
BigQuery Sink Configuration:
Add a BigQuery sink plugin.
Map the BILL_TIME_PARSED
field to the BILL_TIME
column in the BigQuery table.
Ensure the BILL_TIME
column in BigQuery is set to the TIME data type.
Key Points:
Plugin Order: Ensure the Expression plugin is placed after the GCS plugin and before the BigQuery sink plugin.
Schema Mapping: The new field created in the Expression plugin (BILL_TIME_PARSED
) should be mapped to the BILL_TIME
column in BigQuery.