I am experiencing an issue with my BigQuery data streaming application, which uses the following dependency:
Issue Details:
Action Performed: The application uses JsonStreamWriter and BigQueryWriteClient to stream data to BigQuery. On 25/07/2024, the application attempted to stream data to BigQuery.
Error Encountered: On 29/07/2024, we received an exception related to this operation.
Delay Concern:
Request for Support:
Retry Mechanism:
Issue Diagnosis:
We appreciate your assistance in resolving this issue. Please let us know if any additional information is required.
Best regards,
Sanjay Pratap T K.
The BigQuery Storage client library includes built-in retry mechanisms for maintaining the reliability of data streaming operations by handling transient errors and network issues. The library employs exponential backoff with jitter, a strategy that increases the time between retries exponentially while adding a random factor to avoid synchronized retry bursts. This default retry behavior allows the library to attempt retries for a significant period, often extending to several hours, depending on the configured timeout settings.
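To make the backoff idea concrete, here is a minimal sketch of how a jittered exponential delay can be computed. It is not the client library's actual implementation; the class and method names (BackoffSketch, ceilingMs, jitteredDelayMs) are made up for illustration, and the 5 second / 10 minute values simply mirror the example settings used in this thread.

```java
import java.util.Random;

/** Illustrative only; not the client library's implementation. */
final class BackoffSketch {

    /** Upper bound for the delay before retry number `attempt` (0-based), capped at maxMs. */
    static long ceilingMs(long initialMs, double multiplier, long maxMs, int attempt) {
        double raw = initialMs * Math.pow(multiplier, attempt);
        return (long) Math.min(raw, (double) maxMs);
    }

    /** Jittered delay: a random value between 0 and the ceiling for this attempt. */
    static long jitteredDelayMs(long initialMs, double multiplier, long maxMs,
                                int attempt, Random rng) {
        return (long) (rng.nextDouble() * ceilingMs(initialMs, multiplier, maxMs, attempt));
    }

    public static void main(String[] args) {
        Random rng = new Random();
        // Hypothetical values: 5 s initial delay, doubling each time, capped at 10 min.
        for (int attempt = 0; attempt < 8; attempt++) {
            System.out.printf("attempt %d: ceiling=%d ms, delay=%d ms%n",
                    attempt,
                    ceilingMs(5_000, 2.0, 600_000, attempt),
                    jitteredDelayMs(5_000, 2.0, 600_000, attempt, rng));
        }
    }
}
```

The ceiling grows geometrically until it reaches the cap, while the random factor spreads out clients that would otherwise retry at the same instant.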
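You can customize this retry behavior through a RetrySettings object applied when constructing the BigQueryWriteClient. It controls parameters such as the initial retry delay, the delay multiplier, the maximum retry delay, and the total timeout. The status codes that should trigger a retry are not part of RetrySettings itself; they are set next to it on the per-RPC call settings. Note that these settings govern the client's unary calls (createWriteStream in the example); as far as I know, the append retries inside JsonStreamWriter are handled separately by the ConnectionWorker. Below is an example of how to configure custom retry settings in Java:
import com.google.api.gax.retrying.RetrySettings;
import com.google.api.gax.rpc.StatusCode.Code;
import com.google.cloud.bigquery.storage.v1.BigQueryWriteClient;
import com.google.cloud.bigquery.storage.v1.BigQueryWriteSettings;
import org.threeten.bp.Duration;

RetrySettings retrySettings = RetrySettings.newBuilder()
    .setInitialRetryDelay(Duration.ofSeconds(5))
    .setRetryDelayMultiplier(2.0) // without a multiplier > 1 the delay never grows
    .setMaxRetryDelay(Duration.ofMinutes(10))
    .setTotalTimeout(Duration.ofHours(2))
    .build();
BigQueryWriteSettings.Builder settingsBuilder = BigQueryWriteSettings.newBuilder();
// Retry settings and retryable codes are configured per RPC, not on the client as a whole.
settingsBuilder.createWriteStreamSettings()
    .setRetrySettings(retrySettings)
    .setRetryableCodes(Code.DEADLINE_EXCEEDED, Code.UNAVAILABLE);
BigQueryWriteClient writeClient = BigQueryWriteClient.create(settingsBuilder.build());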
The occurrence of a five-day delay before an exception is reported is highly unusual and requires careful investigation. To diagnose this issue, it is important to start by examining the exception details, such as the message and stack trace, which can offer clues about the nature of the error. Keywords like "deadline exceeded" or "unavailable" might indicate prolonged network problems or transient issues.
Additionally, review the BigQuery logs and quotas to identify any resource exhaustion or errors around the time the data stream was initiated. It is also worth confirming that network configurations and firewall rules are set up correctly, so that connectivity problems between the application and BigQuery can be ruled out.
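As a starting point for examining the exception details, the sketch below walks an exception's cause chain to look for an underlying SSLHandshakeException and scans the messages for the keywords mentioned above. The class and method names (ExceptionTriage, findCause, mentionsTransientStatus) are hypothetical helpers, not part of any Google library, and the keyword list is only an assumption about what the messages might contain.

```java
import java.util.Locale;
import javax.net.ssl.SSLHandshakeException;

/** Hypothetical helper for triaging a caught exception; not part of any library. */
final class ExceptionTriage {

    private static final int MAX_DEPTH = 32; // guards against cyclic cause chains

    /** Returns the first throwable of the given type in the cause chain, or null. */
    static <T extends Throwable> T findCause(Throwable error, Class<T> type) {
        int depth = 0;
        for (Throwable t = error; t != null && depth < MAX_DEPTH; t = t.getCause(), depth++) {
            if (type.isInstance(t)) {
                return type.cast(t);
            }
        }
        return null;
    }

    /** True if any message in the chain mentions a typically transient status. */
    static boolean mentionsTransientStatus(Throwable error) {
        int depth = 0;
        for (Throwable t = error; t != null && depth < MAX_DEPTH; t = t.getCause(), depth++) {
            String message = t.getMessage();
            if (message == null) {
                continue;
            }
            String lower = message.toLowerCase(Locale.ROOT);
            if (lower.contains("deadline exceeded") || lower.contains("deadline_exceeded")
                    || lower.contains("unavailable")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Simulated failure shaped like the one reported in this thread.
        Throwable error = new RuntimeException("UNAVAILABLE: io exception",
                new SSLHandshakeException("handshake failed"));
        System.out.println("SSL handshake cause: "
                + findCause(error, SSLHandshakeException.class));
        System.out.println("Looks transient: " + mentionsTransientStatus(error));
    }
}
```

Logging the result of a check like this next to the timestamp of the first failed append makes it easier to tell whether the delay came from prolonged retries or from the error surfacing late.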
We are currently utilizing the google-cloud-bigquerystorage dependency, version 2.24.2, for our BigQuery data streaming application. During our implementation, we encountered an issue where it is not possible to configure the retry settings. Specifically, we faced an UNAVAILABLE error due to an SSLHandshakeException.
Upon investigation, we noticed that the JsonStreamWriter's append method internally retries within the ConnectionWorker's appendLoop(). It appears that there is no default timeout for these retries in version 2.24.2.
In an effort to address this, we upgraded to version 3.5.1 of google-cloud-bigquerystorage. In this version, we observed that the retry mechanism includes a timeout, and the process throws an exception if the in-flight request exceeds 5 minutes.
Could you please confirm whether the absence of a default timeout for retries in version 2.24.2 is a known issue?
Could you also share the default retry settings?
In version 2.24.2 of the google-cloud-bigquerystorage library, the JsonStreamWriter's internal retry mechanism within the ConnectionWorker's appendLoop() did not include an explicit timeout. As a result, if a transient error, such as an SSL handshake failure, occurred, the retries could potentially continue indefinitely, leading to persistent issues.
The upgrade to version 3.5.1 addresses this limitation by introducing a default timeout for in-flight requests, set to approximately five minutes. This improvement prevents retries from continuing indefinitely in the face of persistent errors, making the retry behavior more robust and predictable.
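If you cannot upgrade right away, the same idea of bounding retries with an overall deadline can be applied at the application level. The following is a minimal sketch under that assumption: BoundedRetry and callWithDeadline are hypothetical names, the fixed retry delay is a simplification, and this is not how the library implements its own in-flight timeout.

```java
import java.time.Duration;
import java.util.concurrent.Callable;
import java.util.concurrent.TimeoutException;

/** Hypothetical application-level wrapper; not the library's internal mechanism. */
final class BoundedRetry {

    /**
     * Calls op until it succeeds or totalTimeout has elapsed.
     * On timeout, throws TimeoutException with the last failure attached as its cause.
     */
    static <T> T callWithDeadline(Callable<T> op, Duration totalTimeout, Duration retryDelay)
            throws Exception {
        long deadline = System.nanoTime() + totalTimeout.toNanos();
        while (true) {
            Exception last;
            try {
                return op.call();
            } catch (InterruptedException e) {
                throw e; // never retry an interrupt
            } catch (Exception e) {
                last = e;
            }
            // Give up if another wait would take us past the deadline.
            if (deadline - System.nanoTime() <= retryDelay.toNanos()) {
                TimeoutException timeout =
                        new TimeoutException("gave up after " + totalTimeout);
                timeout.initCause(last);
                throw timeout;
            }
            Thread.sleep(retryDelay.toMillis());
        }
    }
}
```

A caller would wrap its append-and-wait logic in the Callable and pass something like Duration.ofMinutes(5) as the total timeout, so that a persistent failure surfaces as an exception within minutes.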
While both versions offer customizable retry settings (RetrySettings) and use the standard exponential backoff with jitter approach, version 3.5.1 has the significant advantage of a built-in timeout that guards against indefinite retry loops, which makes BigQuery data streaming operations more reliable and predictable.
Given the enhancements in version 3.5.1, particularly the introduction of a timeout for in-flight requests, it is strongly recommended to continue using the newer version of the library for your BigQuery data streaming applications.
Thanks @ms4446