I followed the on-screen instructions to connect Datastream from AWS RDS PostgreSQL to BigQuery.
Enabled logical replication, added
It doesn't show exactly what the error is. Has anyone experienced this issue? I need help solving it.
If you are experiencing problems with logical replication between AWS RDS PostgreSQL and Google Cloud Datastream, try the following troubleshooting steps.

Check the status of the replication slots on the source database:
SELECT * FROM pg_replication_slots;
The output of this command shows the status of all the logical replication slots on the database. The slot that you created for Datastream should be in the active state.
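To go one step further, you can also check how far the slot is lagging behind the current WAL position. A sketch; the slot name bq_slot matches the one used later in this thread, so substitute your own:

```sql
-- Check that the Datastream slot exists, is active, and how far it lags.
-- 'bq_slot' is the slot name used in this thread; substitute your own.
SELECT slot_name,
       active,
       restart_lsn,
       pg_wal_lsn_diff(pg_current_wal_lsn(), confirmed_flush_lsn) AS lag_bytes
FROM pg_replication_slots
WHERE slot_name = 'bq_slot';
```

A lag_bytes value that keeps growing while the stream is supposedly running suggests the consumer is not acknowledging changes.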
The user that Datastream uses to connect to the database must have replication rights. On AWS RDS, the REPLICATION attribute cannot be set directly; instead, grant the rds_replication role:
GRANT rds_replication TO [MY_USER];
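To actually check (rather than grant) the replication rights, you can inspect the user's role attributes and granted roles. A sketch, assuming the Datastream user is named my_user; on AWS RDS, membership in the rds_replication role is what confers replication rights:

```sql
-- Does the user have the REPLICATION attribute directly?
SELECT rolname, rolreplication FROM pg_roles WHERE rolname = 'my_user';

-- Which roles have been granted to the user? On AWS RDS you should
-- see rds_replication listed here.
SELECT r.rolname AS granted_role
FROM pg_roles r
JOIN pg_auth_members m ON m.roleid = r.oid
JOIN pg_roles u ON u.oid = m.member
WHERE u.rolname = 'my_user';
```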
The user that Datastream uses to connect to the database must also have the SELECT privilege on all of the tables and the USAGE privilege on the schema that you are replicating. You can grant these by running the following commands:
GRANT SELECT ON ALL TABLES IN SCHEMA [MY_SCHEMA] TO [MY_USER];
GRANT USAGE ON SCHEMA [MY_SCHEMA] TO [MY_USER];
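One pitfall with GRANT ... ON ALL TABLES is that it only covers tables that exist at the time you run it. If new tables will be added to the schema later, you can also set default privileges so future tables are readable too (a sketch; substitute your own schema and user names):

```sql
-- Ensure tables created in the future are also readable by the Datastream user.
ALTER DEFAULT PRIVILEGES IN SCHEMA my_schema
  GRANT SELECT ON TABLES TO my_user;
```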
The logs may contain more information about the error that you are seeing. You can view the logs by going to the Cloud Console and navigating to Datastream > Connections. Click on the connection that you are using to replicate data from AWS RDS PostgreSQL to BigQuery, and then click on the Logs tab.
Additional tips:
Thanks ms446 for the quick reply. Here are my findings so far:
1. Output of the logical replication slot status: Active. I am using bq_slot.
2. Already granted replication to the user.
3. Already ran both grants.
4. The Logs tab is not available in Datastream.
Please help me debug this issue further.
Thank you for sharing your findings. Based on what you've provided:
Logical Replication Slot Status: The slot is active, which is a good sign. This means that the replication slot is ready to stream changes.
User Permissions: You've granted the necessary replication permissions to the user, and you've also granted the required permissions on the tables and schema. This is also in order.
Logs in Datastream: You can navigate to the Logging section in the Google Cloud Console and filter by the Datastream service to see if there are any relevant logs there.
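Since Datastream itself has no Logs tab, the stream activity logs live in Cloud Logging. In the Logs Explorer, a filter like the following should narrow things down; the resource type and stream ID below match the log entry shown later in this thread, so substitute your own stream ID:

```
resource.type="datastream.googleapis.com/Stream"
resource.labels.stream_id="aws-prod-db-stream"
severity>=ERROR
```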
Given the information you've provided, here are some additional steps to consider:
WAL Retention: Ensure that the Write-Ahead Logging (WAL) retention settings on your AWS RDS PostgreSQL instance are set to retain logs for a sufficient duration. If WAL files are being cleaned up too quickly, it might affect replication.
Datastream Configuration: Double-check the configuration of your Datastream connection. Ensure that the connection details, such as the hostname, port, user credentials, and database name, are correct. Also, ensure that the tables you want to replicate are correctly specified in the Datastream configuration.
Network Connectivity: Ensure that there are no network issues between AWS RDS PostgreSQL and Google Cloud Datastream. Check security groups, VPC settings, and any firewall rules that might be blocking the connection.
Datastream Health: Navigate to the Datastream dashboard in the Google Cloud Console. Check the health and status of your streams. If there are any issues, they might be highlighted there.
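To verify the WAL-related settings mentioned above from within PostgreSQL, a quick sketch:

```sql
-- wal_level must be 'logical' for CDC via logical replication to work.
SHOW wal_level;

-- Check slot and sender capacity; each logical replication consumer
-- uses one replication slot and one WAL sender.
SELECT name, setting
FROM pg_settings
WHERE name IN ('wal_level', 'max_replication_slots', 'max_wal_senders');
```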
Lastly, if you can provide any additional error messages, descriptions, or details from your investigations, it might help in narrowing down the issue further.
Let me share the steps I did. I followed the instructions provided on screen while creating the Datastream stream:
1. While creating the source, I chose Postgres to BigQuery and followed the settings mentioned here.
2. I just set rds.logical_replication to 1 and restarted the DB. The rest are default configurations.
3. Below are the WAL parameter values:
4. Publication and slot were created
5. In the Datastream user setup, I used an existing user, so I only did step 2.
6. I used SSH tunneling to connect to the RDS instance, which is inside a VPC, and whitelisted the IPs in the security group.
7. I selected only 2 tables from the schema while doing the stream operation.
After all the changes were made, I ran the test and it passed. For the first sync, it pulled all the necessary data from Postgres to BigQuery, so backfill works. The problem is that when I change values in a Postgres table, the change is not getting pulled in by the stream. Below is the full log information:
{
"insertId": "ca2ba7c6-d49e-4d6e-b511-72f7b2316c4a",
"jsonPayload": {
"object_name": "200_dimensions",
"message": "CDC failed parsing with error type: UNKNOWN_ERROR, will retry...",
"event_code": "PROCESSOR_PARSE_FAILED",
"context": "CDC",
"read_method": ""
},
"resource": {
"type": "datastream.googleapis.com/Stream",
"labels": {
"location": "us-east4",
"stream_id": "aws-prod-db-stream",
"resource_container": "projects/1061065129435"
}
},
"timestamp": "2023-09-08T06:00:47.288560Z",
"severity": "ERROR",
"logName": "projects/composed-strata-301915/logs/datastream.googleapis.com%2Fstream_activity",
"receiveTimestamp": "2023-09-08T06:00:47.408295609Z"
}
Pasting the error json as well.
The logs don't show the error message clearly; they just say unknown error, retry. It's a bit hard to understand the actual problem from this log alone. Is there any other place where I can see a detailed error message?
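One way to narrow down whether the problem is on the producing side (PostgreSQL) or the consuming side (Datastream) is to peek at the slot's pending changes directly. This is a sketch, assuming the slot bq_slot was created with the pgoutput plugin and that your publication is named my_publication (substitute your own names):

```sql
-- Peek (without consuming) at changes pending on the slot. If rows come
-- back after you modify a replicated table, PostgreSQL is producing CDC
-- events and the problem is likely on the Datastream side.
SELECT *
FROM pg_logical_slot_peek_binary_changes(
  'bq_slot', NULL, NULL,
  'proto_version', '1',
  'publication_names', 'my_publication'
);
```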
If you are experiencing problems with Datastream, consider the following troubleshooting steps:
Schema Verification:
Backfill:
Stream Operations:
Debug Logging:
Additional Tips:
Thanks for the great support. I will check again.
One last question: is JSONB supported for CDC streams?
Yes, JSONB is supported in Datastream. Datastream can replicate JSONB data from a PostgreSQL source database to a BigQuery destination table. However, there are a few limitations and considerations to be aware of:
If you are using Datastream to replicate JSONB data, you should also be aware of the following:
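To test JSONB replication end to end in isolation, you could create a small table with a JSONB column, include it in the stream, and watch whether inserts and updates arrive in BigQuery. A sketch with hypothetical names:

```sql
-- Hypothetical test table; a primary key (or REPLICA IDENTITY) is needed
-- for UPDATE/DELETE events to be usable by logical replication.
CREATE TABLE my_schema.jsonb_test (
  id      bigint PRIMARY KEY,
  payload jsonb
);

INSERT INTO my_schema.jsonb_test VALUES (1, '{"key": "value"}');
UPDATE my_schema.jsonb_test SET payload = '{"key": "updated"}' WHERE id = 1;
```

If your publication lists tables explicitly, remember to add the new table with ALTER PUBLICATION my_publication ADD TABLE my_schema.jsonb_test; before expecting its changes to flow.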
Sounds good. Let me try with other tables as well.