Get hands-on experience with 20+ free Google Cloud products and $300 in free credit for new customers.

Data Fusion Current Timestamp issue

Hi Google,

I'm creating a data fusion pipeline where my source is GCS bucket and I'm processing a CSV file from GCS to Bigquery. I'm have 10 fields in source file and Im creating 1 extra field called IngestionTimestamp in bigquery output schema. so whenever files are inserted to bigquery. this ingestion timestamp default collects current timestamp at the time of ingestion. but the pipeline is getting failed.

can anyone suggest me how i achieve this scenairo 

Refer Below images for refernce

df.pngdf1.png

 

Solved Solved
4 5 856
2 ACCEPTED SOLUTIONS

The core issue seems to be related to how Data Fusion is interpreting and passing the timestamp value you're generating. The pipeline might be failing due to:

  1. Type Mismatch: BigQuery expects timestamps in a specific format (ISO 8601 format: YYYY-MM-DDTHH:MM:SS[.SSSSSS]Z). If your IngestionTimestamp isn’t formatted correctly, it can cause errors during insertion.

  2. Timezone Conflicts: There might be a discrepancy between the timezone settings in the Data Fusion environment and the timezone of the generated timestamp.

Here's how you might address this issue:

  1. Review the Wrangler Recipe:

    • Double-check the Wrangler transformation in your pipeline. The recipe uses current-datetime to generate the timestamp. Ensure this aligns with the format BigQuery expects.
  2. Timezone Handling:

    • Ensure you’re setting the timezone correctly in the transformation. You can use the explicit ISO 8601 format required by BigQuery. Here’s an example transformation:
set-column IngestionTimestamp datetime_format(current_datetime(), 'yyyy-MM-dd\'T\'HH:mm:ss.SSSSSS\'Z\'')
    • Replace 'UTC-08:00' with your desired timezone, but ensure it follows the ISO 8601 format for compatibility.

If the issue persists, use Wrangler’s transformation capabilities to explicitly convert the generated timestamp into the correct format. Here’s an example:

 

set-column IngestionTimestamp datetime_format(current_datetime(), 'yyyy-MM-dd\'T\'HH:mm:ss.SSSSSS\'Z\'')
  • his ensures the timestamp is in the correct format ('yyyy-MM-dd\'T\'HH:mm:ss.SSSSSS\'Z\'), which is what BigQuery expects for TIMESTAMP fields.

Schema Alignment

  1. Check the Output Schema:
    • In the BigQuery sink, confirm that the IngestionTimestamp field’s data type is set to TIMESTAMP in the output schema.

Additional Tips

  1. Check Logs:

    • Examine the pipeline logs carefully for any error messages related to the timestamp. This can provide valuable clues about the specific issue.
  2. Test Incrementally:

    • Build your pipeline incrementally. Start by generating the timestamp and verifying its format before connecting to the BigQuery sink.
  3. Community Resources:

    • The Google Cloud Community forum can be a great resource. If you’re still stuck, consider posting your detailed error message and configuration there for more targeted help.

Wrangler Recipe

Here’s a revised Wrangler recipe example that incorporates explicit timestamp formatting:

 

set-column IngestionTimestamp datetime_format(current_datetime(), 'yyyy-MM-dd\'T\'HH:mm:ss.SSSSSS\'Z\'')

Note: Replace 'UTC-08:00' with your actual timezone if different, but ensure the format aligns with ISO 8601 standards.

 

View solution in original post

Hello,

Thank you for contacting the Google Cloud Community.

To ensure expert assistance and dedicated support for your issue, I kindly request you to file a support ticket by clicking here[1]. Our support team will prioritize your request and provide you with the assistance you need.

For individual support issues, it is best to utilize the support ticketing system. We appreciate your cooperation!

[1]: https://cloud.google.com/support/docs/manage-cases#creating_cases

View solution in original post

5 REPLIES 5

Hi Team,

can anyone previously tried this requirement?

The core issue seems to be related to how Data Fusion is interpreting and passing the timestamp value you're generating. The pipeline might be failing due to:

  1. Type Mismatch: BigQuery expects timestamps in a specific format (ISO 8601 format: YYYY-MM-DDTHH:MM:SS[.SSSSSS]Z). If your IngestionTimestamp isn’t formatted correctly, it can cause errors during insertion.

  2. Timezone Conflicts: There might be a discrepancy between the timezone settings in the Data Fusion environment and the timezone of the generated timestamp.

Here's how you might address this issue:

  1. Review the Wrangler Recipe:

    • Double-check the Wrangler transformation in your pipeline. The recipe uses current-datetime to generate the timestamp. Ensure this aligns with the format BigQuery expects.
  2. Timezone Handling:

    • Ensure you’re setting the timezone correctly in the transformation. You can use the explicit ISO 8601 format required by BigQuery. Here’s an example transformation:
set-column IngestionTimestamp datetime_format(current_datetime(), 'yyyy-MM-dd\'T\'HH:mm:ss.SSSSSS\'Z\'')
    • Replace 'UTC-08:00' with your desired timezone, but ensure it follows the ISO 8601 format for compatibility.

If the issue persists, use Wrangler’s transformation capabilities to explicitly convert the generated timestamp into the correct format. Here’s an example:

 

set-column IngestionTimestamp datetime_format(current_datetime(), 'yyyy-MM-dd\'T\'HH:mm:ss.SSSSSS\'Z\'')
  • his ensures the timestamp is in the correct format ('yyyy-MM-dd\'T\'HH:mm:ss.SSSSSS\'Z\'), which is what BigQuery expects for TIMESTAMP fields.

Schema Alignment

  1. Check the Output Schema:
    • In the BigQuery sink, confirm that the IngestionTimestamp field’s data type is set to TIMESTAMP in the output schema.

Additional Tips

  1. Check Logs:

    • Examine the pipeline logs carefully for any error messages related to the timestamp. This can provide valuable clues about the specific issue.
  2. Test Incrementally:

    • Build your pipeline incrementally. Start by generating the timestamp and verifying its format before connecting to the BigQuery sink.
  3. Community Resources:

    • The Google Cloud Community forum can be a great resource. If you’re still stuck, consider posting your detailed error message and configuration there for more targeted help.

Wrangler Recipe

Here’s a revised Wrangler recipe example that incorporates explicit timestamp formatting:

 

set-column IngestionTimestamp datetime_format(current_datetime(), 'yyyy-MM-dd\'T\'HH:mm:ss.SSSSSS\'Z\'')

Note: Replace 'UTC-08:00' with your actual timezone if different, but ensure the format aligns with ISO 8601 standards.

 

Hi MS4446,

I tried it but i unable to solve, can I have quick 10 mins call with you. so I can complete this task easily. if possible please share mail I can setup a short call.

If you are still running into issues you might to reachout to Google Cloud support for expert assistance.

Hello,

Thank you for contacting the Google Cloud Community.

To ensure expert assistance and dedicated support for your issue, I kindly request you to file a support ticket by clicking here[1]. Our support team will prioritize your request and provide you with the assistance you need.

For individual support issues, it is best to utilize the support ticketing system. We appreciate your cooperation!

[1]: https://cloud.google.com/support/docs/manage-cases#creating_cases