Datastream from AWS RDS MariaDB to BigQuery

Hi everyone,

I am setting up Datastream to transfer data from AWS RDS (MariaDB engine) to BigQuery, following the documentation at https://cloud.google.com/datastream/docs/configure-your-source-mysql-database. However, I have come across three source-side configurations that I am not sure about:

net_read_timeout: 3600
net_write_timeout: 3600
wait_timeout: 86400

These values are much higher than the MariaDB defaults, and setting net_write_timeout is causing an "Incompatible Parameters" error on the RDS replica. Are these values safe? Are these configurations necessary for Datastream to function correctly? I would really appreciate any advice or insight you can offer.

Thank you,

Accepted solution

The three source-side configurations you mentioned are MariaDB (and MySQL) server variables: they control how the database server behaves, not how Datastream behaves. Datastream recommends specific values so that the server does not close Datastream's connections prematurely, but the definitions themselves are from the perspective of the database server.

  • net_read_timeout: the number of seconds the server waits for more data from a connection (here, Datastream's) before aborting the read.
  • net_write_timeout: the number of seconds the server waits for a block to be written to a connection (for example, while sending query results to Datastream) before aborting the write.
  • wait_timeout: the number of seconds the server waits for activity on an idle, non-interactive connection before closing it.

The default values for these configurations in MariaDB are:

  • net_read_timeout: 30 seconds
  • net_write_timeout: 60 seconds
  • wait_timeout: 28800 seconds (8 hours)
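To see whether your instance still uses these defaults, here is a minimal Python sketch that reads the three variables with SHOW GLOBAL VARIABLES. It assumes the third-party PyMySQL package is installed and that you have a user able to connect to the instance; the endpoint and credentials in the commented call are hypothetical placeholders.

```python
"""Read the three timeout variables from a MariaDB/MySQL server (sketch)."""

TIMEOUT_VARS = ("net_read_timeout", "net_write_timeout", "wait_timeout")


def build_query(names: tuple = TIMEOUT_VARS) -> str:
    # The names are fixed identifiers from this script, not user input,
    # so inlining them into the statement is acceptable here.
    quoted = ", ".join(f"'{name}'" for name in names)
    return f"SHOW GLOBAL VARIABLES WHERE Variable_name IN ({quoted})"


def parse_rows(rows) -> dict:
    # SHOW VARIABLES returns (Variable_name, Value) pairs; Value is a string.
    return {name: int(value) for name, value in rows}


def fetch_timeouts(host: str, user: str, password: str, port: int = 3306) -> dict:
    import pymysql  # assumption: PyMySQL is installed (pip install PyMySQL)

    conn = pymysql.connect(host=host, user=user, password=password, port=port)
    try:
        with conn.cursor() as cur:
            cur.execute(build_query())
            return parse_rows(cur.fetchall())
    finally:
        conn.close()


if __name__ == "__main__":
    print(build_query())
    # Hypothetical endpoint and credentials:
    # print(fetch_timeouts("my-replica.example.rds.amazonaws.com", "datastream", "password"))
```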

Datastream recommends the following values for these configurations:

  • net_read_timeout: 3600 seconds (1 hour)
  • net_write_timeout: 3600 seconds (1 hour)
  • wait_timeout: 86400 seconds (24 hours)
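Comparing the two lists is simple arithmetic. The sketch below flags every variable whose current value is below the value Datastream recommends and prints the target in hours. Current values are passed in as a plain dictionary (for example, copied from SHOW GLOBAL VARIABLES); the function name below_recommended is hypothetical.

```python
from typing import Dict, Optional, Tuple

# Values recommended by Datastream's MySQL source documentation (seconds).
RECOMMENDED = {
    "net_read_timeout": 3600,
    "net_write_timeout": 3600,
    "wait_timeout": 86400,
}


def below_recommended(current: Dict[str, int]) -> Dict[str, Tuple[Optional[int], int]]:
    """Return {name: (current, recommended)} for each variable that is missing or too low."""
    gaps = {}
    for name, recommended in RECOMMENDED.items():
        value = current.get(name)
        if value is None or value < recommended:
            gaps[name] = (value, recommended)
    return gaps


if __name__ == "__main__":
    mariadb_defaults = {"net_read_timeout": 30, "net_write_timeout": 60, "wait_timeout": 28800}
    for name, (cur, rec) in below_recommended(mariadb_defaults).items():
        print(f"{name}: {cur}s, recommended {rec}s ({rec / 3600:g} h)")
```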

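Datastream recommends these higher values because it keeps long-lived connections to the source database: it reads the binary log continuously and, during backfill, runs long queries whose results it reads over the network. With short timeouts, the server may close these connections during slow reads, network delays, or idle periods, which interrupts the stream.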

The "Incompatible Parameters" error is tied to the net_write_timeout change in your replica's DB parameter group; on RDS this parameter defaults to 60 seconds. Note that this is a database setting, not a Datastream setting: Datastream has no net_write_timeout option of its own. To resolve the error, you can either fix the parameter group so that the replica accepts the higher net_write_timeout value, or keep net_write_timeout on the replica lower than the 3600 seconds Datastream recommends. However, I recommend choosing the lower value only if you are sure it is necessary.
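On RDS, these timeouts are set in the instance's DB parameter group. Below is a minimal sketch using boto3's modify_db_parameter_group call. It assumes boto3 is installed with AWS credentials configured, that the replica uses a custom parameter group (AWS does not allow modifying a default parameter group), and that these dynamic parameters can be applied with ApplyMethod "immediate". The group name is hypothetical; to keep a lower net_write_timeout instead, pass a smaller value for that key.

```python
from typing import Any, Dict, List

# Values recommended by Datastream's MySQL source documentation (seconds).
DATASTREAM_TIMEOUTS = {
    "net_read_timeout": 3600,
    "net_write_timeout": 3600,
    "wait_timeout": 86400,
}


def build_parameters(values: Dict[str, int], apply_method: str = "immediate") -> List[Dict[str, str]]:
    """Shape name/value pairs into the Parameters list that ModifyDBParameterGroup expects."""
    return [
        {"ParameterName": name, "ParameterValue": str(value), "ApplyMethod": apply_method}
        for name, value in values.items()
    ]


def apply_timeouts(group_name: str, values: Dict[str, int]) -> Dict[str, Any]:
    import boto3  # assumption: boto3 installed and AWS credentials configured

    rds = boto3.client("rds")
    return rds.modify_db_parameter_group(
        DBParameterGroupName=group_name,
        Parameters=build_parameters(values),
    )


if __name__ == "__main__":
    print(build_parameters(DATASTREAM_TIMEOUTS))
    # Hypothetical custom parameter group attached to the replica:
    # apply_timeouts("my-mariadb-replica-params", DATASTREAM_TIMEOUTS)
```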

Are the Datastream recommended values for these configurations safe?

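Yes, in general. These are the values Datastream's documentation recommends, and they are ordinary server timeouts that do not change how data is stored or replicated. The main trade-off is that, when set in the parameter group, they apply to every client of the server, not only Datastream: idle or stalled connections are kept open longer (up to 24 hours with wait_timeout = 86400), so abandoned connections can occupy slots under max_connections for longer.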

Are these configurations necessary for Datastream to function correctly?

No, these configurations are not strictly necessary for Datastream to function. However, the recommended values reduce the chance that the server drops Datastream's connections during long backfill queries, slow reads, or network delays, which improves the reliability of the stream.

Recommendations

Based on Datastream's documentation, using the recommended values for the net_read_timeout, net_write_timeout, and wait_timeout configurations can help ensure that Datastream is able to stream data from your AWS RDS database to BigQuery in a reliable and efficient manner.

If you are unable to increase net_write_timeout on your RDS replica, you can keep it at a lower value in the replica's parameter group than Datastream recommends. However, I recommend doing this only if you are sure it is necessary.

Additionally, the longer timeouts help Datastream handle long-running queries and delays in capturing changes, reducing the chance that the server closes its connection prematurely and interrupts the stream.

