Get hands-on experience with 20+ free Google Cloud products and $300 in free credit for new customers.

Datastream - Unknown error when creating stream between self-managed PostgreSQL and BigQuery

Hello,

I am unable to create a stream between a self-managed PostgreSQL instance and BigQuery.

All connectivity tests are passing and the user Datastream is configured to use is successfully authorised when validating the stream configuration.

However the stream validation fails with an unknown error containing no information on how to tackle the issue: 

done: true
error:
  code: 2
  details:
  - '@type': type.googleapis.com/google.rpc.ErrorInfo
    domain: datastream.googleapis.com
    metadata:
      message: An unknown error occurred. Please try again. If the error persists,
        contact Google support.
      originalMessage: ''
      time: '2023-08-21T09:40:23Z'
      uuid: c078d80b-4006-11ee-ad60-aea4f578a289
    reason: UNKNOWN
  message: Unknown Error.
metadata:
  '@type': type.googleapis.com/google.cloud.datastream.v1.OperationMetadata
  apiVersion: v1
  createTime: '2023-08-21T09:40:22.420969500Z'
  endTime: '2023-08-21T09:40:25.097965017Z'
  requestedCancellation: false
  target: projects/martin-test-datalab/locations/europe-west4/streams/pg2bq
  verb: create
name: projects/martin-test-datalab/locations/europe-west4/operations/operation-1692610822176-6036baacc8060-d7a665c8-116b7e7b

 From Logs Explorer: 

nasdenkov_0-1692618365988.png

From the UI: 

nasdenkov_1-1692618968662.png

 

Our Postgres instance contains no logs pertaining to what exactly failed. 
Both the replication slot and the publication exist, the replication slot has its `active` column set to `false`, indicating that Datastream was maybe unable to create the subscription? 

Where can I get more information on what exactly failed? Has anybody encountered this before and, if so, how did you handle it?

Thanks

Solved Solved
0 1 1,352
1 ACCEPTED SOLUTION

Here are some things you can try to troubleshoot the unknown error you are getting when creating a stream between a self-managed PostgreSQL instance and BigQuery:

  1. Check the connectivity tests. Make sure that the connectivity tests are still passing between Datastream, PostgreSQL, and BigQuery. If they are not, review network configurations and firewall rules. As a last resort, consider restarting the Datastream agent and the PostgreSQL instance.
  2. Verify permissions. Ensure that the user Datastream is configured to use has the necessary permissions on both the PostgreSQL and BigQuery sides. In the Google Cloud Console, verify that the service account used by Datastream has roles like roles/datastream.streamsAdmin and roles/datastream.connectionsAdmin. Also, check the permissions for the replication user in PostgreSQL.
  3. Check the logs. Review the logs in the Google Cloud Console's Logging section for Datastream and the PostgreSQL instance's log directory for any related errors.
  4. Use the API or command line. Consider creating the stream using the Datastream API or the command line for potentially more detailed error messages.
  5. Contact Google Cloud Support. If the issue persists, it might be beneficial to contact Google Cloud support for assistance.

Here are some additional considerations:

  • Authentication method: Ensure the PostgreSQL instance uses the correct authentication method. Datastream supports MD5 password encryption and SCRAM-SHA-256 password encryption. Review the pg_hba.conf file in your PostgreSQL instance to confirm the authentication method.
  • Replication user: Verify that the replication user in PostgreSQL has the necessary privileges, including REPLICATION and LOGIN. Refer to the PostgreSQL documentation for more details on these privileges.

View solution in original post

1 REPLY 1

Here are some things you can try to troubleshoot the unknown error you are getting when creating a stream between a self-managed PostgreSQL instance and BigQuery:

  1. Check the connectivity tests. Make sure that the connectivity tests are still passing between Datastream, PostgreSQL, and BigQuery. If they are not, review network configurations and firewall rules. As a last resort, consider restarting the Datastream agent and the PostgreSQL instance.
  2. Verify permissions. Ensure that the user Datastream is configured to use has the necessary permissions on both the PostgreSQL and BigQuery sides. In the Google Cloud Console, verify that the service account used by Datastream has roles like roles/datastream.streamsAdmin and roles/datastream.connectionsAdmin. Also, check the permissions for the replication user in PostgreSQL.
  3. Check the logs. Review the logs in the Google Cloud Console's Logging section for Datastream and the PostgreSQL instance's log directory for any related errors.
  4. Use the API or command line. Consider creating the stream using the Datastream API or the command line for potentially more detailed error messages.
  5. Contact Google Cloud Support. If the issue persists, it might be beneficial to contact Google Cloud support for assistance.

Here are some additional considerations:

  • Authentication method: Ensure the PostgreSQL instance uses the correct authentication method. Datastream supports MD5 password encryption and SCRAM-SHA-256 password encryption. Review the pg_hba.conf file in your PostgreSQL instance to confirm the authentication method.
  • Replication user: Verify that the replication user in PostgreSQL has the necessary privileges, including REPLICATION and LOGIN. Refer to the PostgreSQL documentation for more details on these privileges.