Error: canCommit() is called for transaction

Hello,

I am extracting data from a PostgreSQL database over a VPN using CDAP Data Fusion.

I have some pipelines that work correctly, but one of them fails with the error below:

Spark 'phase 1' program failed.
Spark program 'phase-1' failed with error: canCommit() is called for transaction {transaction number} that is not in progress (is known to be invalid). Check system logs for more details.
Pipeline 'pipeline name' failed.
The workflow service 'workflow.{client project name}.{pipeline name}.DataPipelineWorkflow.253bbfb9-cf50-11ee-a96a-0000007dd528' failed.

Can someone help me? The Data Fusion log does not show enough information about this error.

The information between { } is customer data, so I have not shared it.

 

1 ACCEPTED SOLUTION

Solution: some indexes were created in the database to optimize the query, which made it a little faster.
After that, the pipeline extracted the data normally.

If the error appears again because of the amount of data, look for the timeout configuration in the Data Fusion tool or add a condition to the query.
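For reference, a minimal sketch of the kind of change that resolved it, assuming a hypothetical table "orders" and a "created_at" column used by the pipeline's import query (the actual table, column, and index names are customer-specific):

    -- Hypothetical example: index the column the import query filters or sorts on,
    -- so the source query finishes before the transaction times out.
    CREATE INDEX IF NOT EXISTS idx_orders_created_at ON orders (created_at);

    -- Alternatively, bound the query so each run pulls a smaller slice of data.
    SELECT *
    FROM orders
    WHERE created_at >= DATE '2024-01-01'
      AND created_at <  DATE '2024-02-01';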


2 REPLIES 2

The error message "canCommit() is called for transaction {transaction number} that is not in progress (is known to be invalid)" typically indicates a transaction management issue, where a transaction is either:

  • Prematurely Committed or Aborted: The transaction might have been marked as complete or rolled back before the canCommit() call was made.

  • Invalidated Due to Conflicts: The transaction could have encountered a conflict, such as concurrent modifications to the same data, which rendered it invalid.

Troubleshooting Steps

Inspect CDAP Logs

Begin by thoroughly examining the CDAP logs for the problematic pipeline. Search for detailed error messages around the "canCommit()" failure, focusing on timestamps that match the failure to identify any preceding warnings or errors that could reveal the root cause.

Check Network Connectivity and Configuration

  • VPN: Ensure the VPN connection between Google Cloud and your PostgreSQL database is stable, without packet loss, timeouts, or other connectivity issues.

  • Firewall Rules: Verify that your firewall settings permit traffic between Google Cloud and your PostgreSQL instance on the necessary ports.

Review PostgreSQL Log

Examine your PostgreSQL server logs for any errors, warnings, or conflict messages that coincide with the CDAP pipeline failure, paying close attention to the timestamps.
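If you are not sure where the server writes its logs, you can check the logging settings from any SQL client (this assumes a self-managed PostgreSQL server; managed services usually expose logs through their own console):

    -- Show the current logging-related settings (collector, location, verbosity).
    SELECT name, setting
    FROM pg_settings
    WHERE name IN ('logging_collector', 'log_directory', 'log_filename', 'log_min_messages');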

Database Concurrency

  • Transactions in Other Tools: Investigate if another tool, script, or program is interacting with the PostgreSQL database in a way that might cause conflicts.

  • Transactions Within the Pipeline: Review your pipeline's logic for potential issues with transaction handling, such as unintentional commits, rollbacks, or long-running transactions that could interfere with other stages (see the query sketch after this list).
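A quick way to spot concurrent or long-running transactions on the PostgreSQL side is to query pg_stat_activity while the pipeline is running, for example:

    -- Show non-idle sessions, oldest open transaction first.
    SELECT pid, usename, state, xact_start, query
    FROM pg_stat_activity
    WHERE state <> 'idle'
    ORDER BY xact_start;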

CDAP Components and Resource Contention

  • Transaction Timeout: Check your CDAP configuration for any transaction timeouts that might be too short and adjust them if necessary.

  • Overloaded Components: Ensure that the CDAP instances handling your pipeline have adequate memory and CPU resources to manage transactions effectively.

Additional Considerations

  • CDAP Version: Ensure your CDAP version is up to date. If you're using an older version, check for any known bugs related to this issue and consider upgrading.

  • Dependencies: Review your Data Fusion pipeline for outdated or incompatible dependencies that might impact transaction management.

Requesting Additional Information

To provide more tailored advice, please share the following information:

  • PostgreSQL Version: The specific version of PostgreSQL you're using.

  • CDAP Version: The version of CDAP Data Fusion in use.

  • Simplified Pipeline Structure: A brief description of the relevant pipeline stages (especially source, transforms, and sinks), highlighting any sections that manage transactions.

Additionally, consider whether any recent changes to the pipeline, CDAP Data Fusion platform, or database configuration might correlate with the onset of the issue. Understanding these aspects can significantly aid in diagnosing and resolving the problem.
