Hi GCP Community,
I'm currently working with Data Fusion and have set up a pipeline that transfers data from SQL Server to BigQuery using Wrangler. I'm looking to add two columns, created_timestamp and updated_timestamp, populated with the current timestamp in each row. I attempted to use JEXL in Wrangler with the following transformation: "set-column: created_timestamp now()".
However, this doesn't seem to work. Could anyone provide guidance or examples on how to achieve this?
Thanks in advance for your help!
Best regards,
Naveen.
Wrangler's primary purpose is data cleaning and transformation. While it offers scripting capabilities with JEXL, it isn't designed to generate dynamic timestamps within the data flow, and the now() function isn't natively recognized in Wrangler's JEXL context.
Solution 1: Pre-Processing with SQL Server
The most efficient approach is to handle timestamp generation at the source (SQL Server) before the data enters Data Fusion:
Modify Your SQL Query: Adjust your SQL query to include created_timestamp and updated_timestamp columns, using SQL Server's built-in functions to populate them with the current timestamp. For example:
SELECT *, GETDATE() AS created_timestamp, GETDATE() AS updated_timestamp
FROM your_table;
Data Fusion Pipeline: The SQL Server source will now include these columns automatically. You won’t need any Wrangler transformations for this task.
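In the Data Fusion SQL Server source plugin, this query typically goes into the Import Query field. A minimal sketch, assuming the CDAP SQL Server batch source (the dbo.your_table name is a placeholder, and the $CONDITIONS token, which the plugin replaces with split predicates for parallel reads, is an assumption to verify against your plugin version):

-- Sketch: Import Query for the SQL Server batch source.
-- $CONDITIONS lets the plugin inject split predicates for parallel reads.
SELECT t.*,
       GETDATE() AS created_timestamp,
       GETDATE() AS updated_timestamp
FROM dbo.your_table t
WHERE $CONDITIONS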
Solution 2: Wrangler + JavaScript UDF (Less Ideal)
If generating timestamps within Data Fusion is necessary, use Wrangler’s JavaScript User-Defined Function (UDF) capability:
Wrangler Transformation:
Add a Wrangler transform to your pipeline.
In the Wrangler UI, click the "Functions" tab.
Create a new JavaScript UDF with the following code:
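// Return the current time as an ISO-8601 UTC string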
function addTimestamp() {
  return new Date().toISOString();
}
Apply the UDF:
Switch to the "Transform" tab.
Add two directives:
derive created_timestamp using addTimestamp()
derive updated_timestamp using addTimestamp()
Important Considerations:
GETDATE() returns the SQL Server instance's local time, while JavaScript's toISOString() returns UTC. Pick one convention (UTC is generally safer for audit columns) so both solutions stay consistent.
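If you standardize on UTC at the source, a sketch of the UTC variant (your_table is a placeholder):

-- Sketch: UTC timestamps at the source, matching toISOString()'s UTC output.
SELECT *, GETUTCDATE() AS created_timestamp, GETUTCDATE() AS updated_timestamp
FROM your_table;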
Example Pipeline (Solution 1)
SQL Server Source -> BigQuery Sink
SQL Server Query: Include the timestamp generation logic mentioned earlier.
Example Pipeline (Solution 2)
SQL Server Source -> Wrangler (JavaScript UDF) -> BigQuery Sink
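After either pipeline runs, you can confirm the audit columns landed with a quick check in BigQuery; a sketch using Standard SQL and placeholder project/dataset/table names:

-- Sketch: verify the audit columns in BigQuery (Standard SQL).
SELECT created_timestamp, updated_timestamp
FROM `my_project.my_dataset.my_table`
LIMIT 10;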
Hi @ms4446 ,
Thank you for the suggestion.
I tried Solution 1: Pre-Processing with SQL Server, using the following SQL query:
SELECT *, GETDATE() AS created_timestamp, GETDATE() AS updated_timestamp
FROM dataset.schema.My_table;
This works for transferring the table data from SQL Server to BigQuery, but the audit columns (`created_timestamp` and `updated_timestamp`) are not being added.
Any further assistance on how to properly add these timestamp columns would be greatly appreciated.
Thanks again!
Best regards,
Naveen.
Hi @Naveen_kumar ,
This issue might be related to how Data Fusion handles schema updates and transformations. Here are some potential solutions:
Schema Mismatch in Data Fusion
Issue: The schema defined in Data Fusion might not automatically reflect the new columns from the SQL query.
Solution: Refresh the schema in Data Fusion.
Modify your SQL query as shown:
SELECT *, GETDATE() AS created_timestamp, GETDATE() AS updated_timestamp
FROM "dataset"."schema"."My_table";
Click the GET SCHEMA button in the Data Fusion UI to update the schema.
Verify Output Schema
Issue: The output schema may need explicit updates to reflect changes.
Solution: Manually add the new columns to the output schema in Data Fusion.
Add created_timestamp and updated_timestamp manually as fields of type TIMESTAMP.
SQL Query Adjustment
Issue: The SQL query might need adjustments depending on the SQL dialect or Data Fusion's handling.
Solution: Ensure correct SQL syntax and proper referencing of your dataset and table.
Use fully qualified names and check SQL Server configuration:
SELECT t.*, GETDATE() AS created_timestamp, GETDATE() AS updated_timestamp
FROM "dataset"."schema"."My_table" t;
Data Fusion Version or Configuration
Issue: Your version of Data Fusion might not support dynamic schema updates from SQL queries fully.
Solution: Check for updates or known issues with your Data Fusion version.
Pipeline Configuration
Issue: The Data Fusion pipeline might not propagate schema changes properly through the stages.
Solution: Review the pipeline stages to ensure schema changes are applied consistently across them.