Hi Datastream team,
We are trying to use Datastream to replicate data from a PostgreSQL RDS instance on AWS to BigQuery on GCP. I've set up Datastream with the PostgreSQL database as the source connection profile. All the regional IPs that the stream creation wizard showed were added to the inbound rules on the bastion SSH tunnel server. The stream and the source and destination connection profiles were all created in the same region, northamerica-northeast2 (Toronto). I validated that:
1) Connectivity tests pass.
2) Validation tests at the end of the stream creation wizard also pass (100% green).
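One item worth ruling out on the networking side is whether every IP the wizard listed is actually covered by the bastion's inbound rules. Here is a minimal Python sketch of that check using the standard `ipaddress` module. The function name `uncovered_ips` is hypothetical, and the addresses are placeholder documentation ranges (RFC 5737), not real Datastream IPs for northamerica-northeast2.

```python
import ipaddress


def uncovered_ips(datastream_ips, inbound_cidrs):
    """Return the Datastream IPs that no inbound-rule CIDR covers."""
    networks = [ipaddress.ip_network(c, strict=False) for c in inbound_cidrs]
    missing = []
    for ip in datastream_ips:
        addr = ipaddress.ip_address(ip)
        # An address is allowed if it falls inside at least one rule's network.
        if not any(addr in net for net in networks):
            missing.append(ip)
    return missing


# Placeholder values (documentation ranges), not actual Datastream IPs.
datastream_ips = ["203.0.113.10", "203.0.113.11", "198.51.100.7"]
inbound_cidrs = ["203.0.113.0/24"]
print(uncovered_ips(datastream_ips, inbound_cidrs))  # ['198.51.100.7']
```

An empty list means every address falls inside some rule. Since the connectivity tests already pass here, this is more of a checklist item than an explanation of the start failure.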
When I try to start the Stream I get an error "An unknown error occurred. Please try again. If the error persists, contact Google support."
Questions:
Thank you.
Depending on the error, information is provided in the Streams or Stream details pages of the Datastream UI. You can also use Datastream's APIs to retrieve information about the error. Here's the documentation for that.
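If the console only shows a generic message, the stream resource returned by the API may carry more detail. Here is a minimal sketch. It assumes the JSON shape of the Datastream v1 `Stream` resource: a `state` field plus an `errors` list whose entries have `reason` and `message`. You can get that JSON, for example, from the API's `streams.get` method or from `gcloud datastream streams describe` with `--format=json`. The function name and the sample payload, including its reason string, are hypothetical.

```python
import json


def summarize_stream_errors(stream_json: str) -> list:
    """Turn a Stream resource (as JSON text) into readable error lines."""
    stream = json.loads(stream_json)
    lines = [f"state: {stream.get('state', 'UNKNOWN')}"]
    # Assumption: 'errors' is a list of objects with 'reason' and 'message'.
    for err in stream.get("errors", []):
        lines.append(f"{err.get('reason', '?')}: {err.get('message', '')}")
    return lines


# Hypothetical payload; real output comes from the Datastream API.
sample = json.dumps({
    "name": "projects/my-project/locations/northamerica-northeast2/streams/pg-to-bq",
    "state": "FAILED",
    "errors": [{"reason": "EXAMPLE_REASON", "message": "example message"}],
})
for line in summarize_stream_errors(sample):
    print(line)
```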
Hi @Joevanie,
Thank you for the reply. I do not see any reference to the error on the web page. All it says is: "An unknown error occurred. Please try again. If the error persists, contact Google support." I tried again and got the same result. I contacted Google support ...
The document you provided helps to diagnose connection issues, but it does not explain how to troubleshoot a stream that fails to start or a backfill that fails.
Since I am likely not the first (and won't be the last 🙄) person to run into this, I am posting the sequence of steps I added to our internal company documentation for dealing with it when setting up Datastream and initiating a backfill.
If you add multiple tables for replication, the Datastream backfill process may fail. You can spot this by checking the list of streams that are currently running. If it happens, you may need to add objects to the replication manually, one by one. To do that, follow the steps below:
Note: The "Select objects to exclude" and "Select objects to include" sections of the Datastream configuration interface do not seem to be "intuitive/smart" or "connected" to each other. In other words, selecting a table in the "include" section does not automatically remove it from the "exclude" section. It took me a good 4 hours to figure that out, so I hope this saves you from wasting similar time 😉
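The one-by-one approach and the include/exclude pitfall from the note can be sketched together in Python. This assumes the include and exclude lists follow the shape of the API's `PostgresqlRdbms` object (`postgresqlSchemas` entries with `schema` and `postgresqlTables` entries with `table`). It also assumes that a schema listed without tables means the whole schema, which is written here as `"*"`. All helper names and table names are hypothetical. Applying each step, whether through the console or a stream update call, is not shown.

```python
WHOLE_SCHEMA = "*"  # stands in for "every table in this schema"


def tables_in(rdbms: dict) -> set:
    """Flatten a PostgresqlRdbms-style dict into (schema, table) pairs."""
    pairs = set()
    for schema in rdbms.get("postgresqlSchemas", []):
        tables = schema.get("postgresqlTables", [])
        if not tables:
            # Assumption: a schema with no tables listed covers the whole schema.
            pairs.add((schema["schema"], WHOLE_SCHEMA))
        for table in tables:
            pairs.add((schema["schema"], table["table"]))
    return pairs


def conflicts(include: dict, exclude: dict) -> list:
    """Tables selected for inclusion that the exclude list still catches."""
    excluded = tables_in(exclude)
    return sorted(
        (s, t)
        for s, t in tables_in(include)
        if (s, t) in excluded or (s, WHOLE_SCHEMA) in excluded
    )


def build_rdbms(pairs: list) -> dict:
    """Build a PostgresqlRdbms-style dict from (schema, table) pairs."""
    schemas = {}
    for schema, table in pairs:
        schemas.setdefault(schema, []).append({"table": table})
    return {
        "postgresqlSchemas": [
            {"schema": s, "postgresqlTables": ts} for s, ts in schemas.items()
        ]
    }


def incremental_includes(pairs: list):
    """Yield include lists that grow by one table per step."""
    for i in range(1, len(pairs) + 1):
        yield build_rdbms(pairs[:i])


# Hypothetical tables; replace with your own.
wanted = [("public", "orders"), ("public", "customers")]
exclude = build_rdbms([("public", "orders")])  # leftover exclusion

for step, include in enumerate(incremental_includes(wanted), start=1):
    clash = conflicts(include, exclude)
    print(step, sorted(tables_in(include)), "conflicts:", clash)
```

Each yielded include list corresponds to one round of adding a table. A non-empty `conflicts` result flags the situation described in the note, where a table is selected in both sections.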
Manually initiate a failed backfill.
To resolve a failed backfill, you can try initiating a manual backfill for all tables whose backfill status is failed by following the steps below:
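Finding the tables whose backfill failed can also be scripted. Here is a minimal sketch that assumes the JSON returned by the API's `streams.objects.list` method, where each entry in `streamObjects` has a `name`, a `displayName` and a `backfillJob.state`. It only builds the URLs for the API's `startBackfillJob` method, which would be sent as authenticated POST requests. It does not call the API itself. It handles a single page of results, so follow `nextPageToken` if there are more. The function names and object names are hypothetical.

```python
import json

API_ROOT = "https://datastream.googleapis.com/v1"


def failed_backfill_objects(list_response_json: str) -> list:
    """Return stream objects whose backfill job state is FAILED."""
    response = json.loads(list_response_json)
    return [
        obj
        for obj in response.get("streamObjects", [])
        if obj.get("backfillJob", {}).get("state") == "FAILED"
    ]


def start_backfill_urls(objects: list) -> list:
    """Build the startBackfillJob endpoint for each object (POST; not sent here)."""
    return [f"{API_ROOT}/{obj['name']}:startBackfillJob" for obj in objects]


# Hypothetical single-page response from streams.objects.list.
stream = "projects/my-project/locations/northamerica-northeast2/streams/pg-to-bq"
sample = json.dumps({
    "streamObjects": [
        {"name": f"{stream}/objects/obj-1", "displayName": "public.orders",
         "backfillJob": {"state": "FAILED"}},
        {"name": f"{stream}/objects/obj-2", "displayName": "public.customers",
         "backfillJob": {"state": "COMPLETED"}},
    ]
})

failed = failed_backfill_objects(sample)
for obj, url in zip(failed, start_backfill_urls(failed)):
    print(obj["displayName"], "->", url)
```

Re-triggering only the FAILED objects, rather than the whole stream, keeps with the idea of handling tables individually.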
LNK if you want to connect, so that I can explain in greater detail and share screenshots (I can't do that in this thread because I am not allowed to upload images).
Hope it helps future Datastream adopters 😉