Hi
We use the Kafka BigQuerySinkConnector to update partitioned tables in a dataset. From time to time we get an error on the Google BigQuery side and the connector fails with:
{
"code" : 400,
"errors" : [ {
"domain" : "global",
"message" : "The job encountered an internal error during execution and was unable to complete successfully.",
"reason" : "jobInternalError"
} ],
"message" : "The job encountered an internal error during execution and was unable to complete successfully.",
"status" : "INVALID_ARGUMENT"
}
On the GCP side we have no information about this error.
After restarting the connector it works again, but do you have any hint as to why this error happens?
Thanks
The jobInternalError is a broad error message, and it can be triggered by various underlying issues. If restarting the connector resolves the problem, it suggests that the issue was transient or related to the connector's state or environment at that particular time. Here are some potential reasons for this error:
Resource Constraints: The connector might have run into resource constraints, either in terms of CPU, memory, or network. Restarting the connector would free up and reallocate these resources.
Connector State: Connectors maintain an internal state, and sometimes this state can become corrupted or inconsistent, leading to errors. Restarting the connector resets its internal state.
Throttling or Quota Exceedance: If the connector is making too many requests in a short period, it might hit BigQuery's rate limits or quotas. After a pause (like a restart), the quotas might reset or the request rate might drop, allowing the connector to function again.
Temporary Network Issues: There could have been a brief network disruption between your Kafka cluster and BigQuery, causing the connector to fail. The network might have stabilized by the time you restarted the connector.
Data Issues: Sometimes a particular batch of data can cause issues (e.g., schema mismatch, corrupted data). If the connector processes data in batches and moves on to the next batch after a restart, it might bypass the problematic data. The error-handling settings sketched after this list are one way to keep bad records from stopping the task.
Concurrent Modifications: If there are other processes or jobs modifying the BigQuery table or dataset at the same time the connector is trying to write data, it might lead to conflicts or errors.
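If record-level problems turn out to be the cause, Kafka Connect's built-in error-handling properties can log failures and route bad records to a dead-letter topic instead of failing the task. This is only a minimal sketch: the connector class is the WePay/Confluent BigQuery sink, the connector name, topic, and DLQ topic are made up, and the dataset/credential settings are omitted. Note that the framework-level DLQ covers converter and transform failures; whether failed BigQuery inserts are routed there depends on the connector version, so check the docs for the version you run.

{
  "name": "bigquery-sink",
  "config": {
    "connector.class": "com.wepay.kafka.connect.bigquery.BigQuerySinkConnector",
    "topics": "events",
    "errors.tolerance": "all",
    "errors.log.enable": "true",
    "errors.log.include.messages": "true",
    "errors.deadletterqueue.topic.name": "bigquery-sink-dlq",
    "errors.deadletterqueue.topic.replication.factor": "3",
    "errors.deadletterqueue.context.headers.enable": "true"
  }
}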
To get a clearer picture of why the error occurred, you can check the connector worker logs around the time of the failure, look up the failed job in the BigQuery job history (for example in the Google Cloud console or via the INFORMATION_SCHEMA.JOBS views), and review Cloud Logging and quota metrics for the project at that time.
We had a reply from Google Cloud Support:
The BigQuery Engineering Team shared that your failed job was affected by a spike in traffic which caused system overload in a short period of time.
With this, they have suggested you rerun the job and advise if the same error still appears at your end.
So the only way to keep the connector running instead of failing when this error happens is to modify the Kafka connector code to retry the job when this error occurs.
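For what it's worth, the WePay/Confluent BigQuery sink connector also exposes retry settings (bigQueryRetry and bigQueryRetryWait). As far as I know they are documented as covering quota-exceeded and backend errors, so depending on the connector version they may not catch jobInternalError, which is presumably why a code change is needed here. A minimal config sketch, assuming the WePay connector class and a made-up connector/topic name, in case these settings help in your version:

{
  "name": "bigquery-sink",
  "config": {
    "connector.class": "com.wepay.kafka.connect.bigquery.BigQuerySinkConnector",
    "topics": "events",
    "bigQueryRetry": "5",
    "bigQueryRetryWait": "10000"
  }
}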