Solved: Re: Issues with Datastream (maybe related to pSQL ...

JustJeff

Okay, so I had this all working. PSQL -> Datastream -> BigQuery. It's all in Terraform, so presumably repeatable.

I deleted the database, and restored from a backup, and now Datastream refuses to work.

The connection profile test works fine
The backfill works fine
As soon as I start writing to the database, the stream says "An unexpected connection error occurred while trying to replicate data from the source. Make sure that the connection profile configuration is correct. If the problem persists, contact Google Support."

Popping open the logs for the stream, I see:

CDC fetch failed: An unexpected connection error occurred while trying to replicate data from the source. Make sure that the connection profile configuration is correct, If the problem persists, contact Google Support..
invalid memory alloc request size 1476819584
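
The size in that last log line is worth a quick sanity check. As far as I know, PostgreSQL rejects any single ordinary allocation larger than MaxAllocSize (0x3fffffff bytes, just under 1 GiB) with this message. The sketch below pulls the size out of a log line and compares it with that limit. The function name and the regex are illustrative assumptions, not part of any Datastream or PostgreSQL tooling.

```python
import re

# PostgreSQL's MaxAllocSize: the cap on a single ordinary palloc() request.
MAX_ALLOC_SIZE = 0x3FFFFFFF  # 1 GiB minus 1 byte

# Hypothetical pattern covering the spellings of the message quoted in this thread.
_ALLOC_RE = re.compile(
    r"invalid memory alloc(?:ation)? request,? size (\d+)", re.IGNORECASE
)


def alloc_request_over_limit(log_line):
    """Return (size, over_limit) for an alloc-failure log line, or None if no match."""
    match = _ALLOC_RE.search(log_line)
    if match is None:
        return None
    size = int(match.group(1))
    return size, size > MAX_ALLOC_SIZE


print(alloc_request_over_limit("invalid memory alloc request size 1476819584"))
# -> (1476819584, True)
```

A request above that limit fails regardless of how much memory the instance has, which points at the server's decoding code rather than at the connection profile.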

What I've tried:

This is running fine in other environments on the same data
I've deleted/restarted the stream/replication slot 15 times
I've checked the permissions (the datastream user has replication permissions, again the connection test passes, and it seems to be partially working)
I deleted the database and restored again (using the managed backup)
I deleted the database, created a new database, and restored the data from an export (gcloud sql import).

The only thing I can't try is downgrading back to 17.4 (which is what the other environments are running, and presumably this database was as well, but when I restore it still uses 17.5).

I've completely run out of ideas.

JustJeff

It had nothing to do with Datastream. It's a bug in Postgres 17.5: https://postgrespro.com/list/thread-id/2738915

brauliofg

Hi, I'm having the same issue with PG 15.13 + Datastream. We tried raising logical_decoding_work_mem to 5 GB, but it still doesn't work, and the process stops with the following error:

"CDC Fetch Failed: An unexpected connection error occurred while attempting to replicate data from the source. Make sure your connection profile settings are correct. If the problem persists, contact Google Support."
"ERROR: Invalid memory allocation request, size 1674007328."

We have another environment with PG 15.12 + Datastream, and it works perfectly.
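
When comparing environments like this, it helps to record the exact minor version each one reports (for example from SELECT version(); or SHOW server_version;). Here is a minimal sketch that flags the versions reported as failing in this thread. The set below contains only what posters here observed (17.5 and 15.13 failing, 17.4 and 15.12 working). It is not an authoritative list of affected releases, and the function names are made up for illustration.

```python
import re

# Minor releases reported as failing in this thread (assumption: not exhaustive).
REPORTED_FAILING = {(17, 5), (15, 13)}


def parse_server_version(text):
    """Extract (major, minor) from strings like '17.5' or 'PostgreSQL 15.13 on ...'."""
    match = re.search(r"(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version found in {text!r}")
    return int(match.group(1)), int(match.group(2))


def reported_failing(text):
    """True if the version string matches a release reported as failing here."""
    return parse_server_version(text) in REPORTED_FAILING


for v in ("17.4", "17.5", "15.12", "PostgreSQL 15.13 on x86_64-pc-linux-gnu"):
    print(v, "->", "reported failing" if reported_failing(v) else "not reported")
```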

Are you still experiencing the error, or did you resolve it?

JustJeff

Check my other comment. It's a known bug in PG. Still very much having the issue with no real workaround.

Aside: it's INSANE that GCP doesn't allow you to launch an instance with anything but the current maintenance version.

brauliofg

I checked the release notes for PG 15.13 and 17.5:

https://www.postgresql.org/docs/release/15.13/
https://www.postgresql.org/docs/release/17.5/

I found the following item about logical decoding, which is what Datastream uses for CDC, and it applies to the versions we're using. I emailed the people who made the fix, but they haven't responded yet.

Check "Prevent over-advancement of catalog xmin in “fast forward” mode of logical decoding (Zhijie Hou)"

JustJeff

Really unsure of what your point is here. It is a known issue in 17.5 and they are fixing it in 17.6.

alisha47

Hi JustJeff,

Thanks for the detailed breakdown — these kinds of Datastream issues after a PostgreSQL restore can be especially tricky, even when everything appears fine in Terraform.

The error:

invalid memory alloc request size 1476819584

…usually indicates an issue with WAL stream decoding — possibly caused by inconsistent replication slot metadata or subtle differences introduced during the restore.

You can consider trying these things:

Manually drop and recreate the replication slot. Restores can leave behind invisible inconsistencies, even if the slot appears valid.
Double-check the PostgreSQL config on the restored DB (restoring from managed backups or exports can sometimes reset these):

wal_level = logical
max_replication_slots, max_wal_senders, wal_keep_size

Check version consistency between the restored instance and the environments where the stream still works.
Use terraform taint on the Datastream connection profile and stream to ensure a clean rebuild.
Try a minimal insert into the source DB to see if any data change triggers the crash; this can help narrow the issue down to volume vs. structure.
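
The config check in that list can be sketched as a small function. It assumes you have already fetched the settings into a dict, for example with SELECT name, setting FROM pg_settings. The function name and the "at least 1" thresholds are my own assumptions for illustration. wal_keep_size is left out because nothing in this thread says what value it should have.

```python
def check_logical_replication_settings(settings):
    """Return a list of problems found in a {name: value} dict of PostgreSQL settings."""
    problems = []

    # Logical decoding (and therefore CDC) needs wal_level = logical.
    wal_level = settings.get("wal_level")
    if wal_level != "logical":
        problems.append(f"wal_level is {wal_level!r}, expected 'logical'")

    # pg_settings returns values as strings, so convert before comparing.
    for name in ("max_replication_slots", "max_wal_senders"):
        value = int(settings.get(name, 0))
        if value < 1:
            problems.append(f"{name} is {value}, need at least 1")

    return problems


restored = {"wal_level": "replica", "max_replication_slots": "10", "max_wal_senders": "0"}
for problem in check_logical_replication_settings(restored):
    print(problem)
```

An empty list means these particular settings look fine. It says nothing about the replication bug discussed in the rest of the thread.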

Optionally, you can add resilience between the source and BigQuery. If you continue running into issues after restores or version mismatches, you might consider adding a buffer layer like Windsor. It connects to PostgreSQL and BigQuery, handles schema drift, intelligently retries, and can mitigate version differences and transient WAL inconsistencies. Especially useful if you're managing cross-environment pipelines where Datastream gets fragile.

Let me know what ends up working, and feel free to share more logs if you're still stuck. There’s a chance Google Support may need to look into the internal stream state if nothing helps.

JustJeff

Thanks for taking some time here. This is a bug in Postgres replication, and had nothing to do with Datastream. I marked the proper update as the resolution.

I'm still shocked that Cloud SQL doesn't allow any form of rollback; you can't even launch a new instance on the older minor version. So, apparently, there's nothing to do but wait for 17.6.

Issues with Datastream (maybe related to pSQL 17.5)