Replication continues after job promotion and deletion

alexhwoods · 02-22-2024 01:35 PM

I've been using DMS to go from a CloudSQL PostgreSQL instance (v9.6) to a CloudSQL PostgreSQL instance (v15).

It works fine, except that after the replication lag reaches 0 and I promote the job, the replication slots on the source instance remain. If I stop the job before promoting, the slots become inactive. If I don't stop it, they stay active and replication continues.

If I delete the destination database, that also sets the replication slots to inactive.

Even if I delete the migration job, the replication slots remain! They can be either active or inactive, depending on which of the steps above were taken.

Now, if you want to roll back to the source, this is a huge problem! Inactive replication slots force the server to retain WAL, which can fill up the disk and cause the server to deny writes.

I'm sure I must be doing something wrong, as this behavior doesn't make sense, but I have no idea what it could be. This seems to be entirely internal to the migration job.

Thanks!

ms4446

You've identified a critical issue: replication slots stay active, or at least persist, even after the migration job is promoted or deleted. If not managed correctly, this can indeed lead to the problems you've described. Your observation that stopping the job before promotion deactivates the replication slots, whereas not stopping it keeps them active, is particularly useful and points to a nuance in how DMS or PostgreSQL itself behaves.

Since this involves the internal behavior of DMS and details specific to Google Cloud's implementation, reaching out to Google Cloud Support can provide more direct assistance and potentially identify any misconfiguration or bug.
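
In the meantime, leftover slots can be inspected and dropped by hand on the source instance. The following is only a rough sketch, not anything DMS documents: it assumes you can query `pg_replication_slots` on the source and that your user is allowed to call `pg_terminate_backend` and `pg_drop_replication_slot` there. The helper names (`plan_slot_cleanup`, `quote_literal`) and the prefix filter are hypothetical, made up for illustration. The script does not connect to anything; it only turns the rows you get from the query into SQL statements you can review before running.

```python
from typing import Any, List, Mapping, Sequence

# Query to run on the SOURCE instance to see each slot and how much WAL it
# is holding back. On PostgreSQL 9.6 the WAL functions still use the "xlog"
# names; on version 10 and later they are pg_current_wal_lsn() and
# pg_wal_lsn_diff().
SLOT_QUERY_96 = """
SELECT slot_name, active, active_pid,
       pg_xlog_location_diff(pg_current_xlog_location(), restart_lsn)
           AS retained_bytes
FROM pg_replication_slots;
"""


def quote_literal(value: str) -> str:
    """Quote a string as a SQL literal by doubling single quotes."""
    return "'" + value.replace("'", "''") + "'"


def plan_slot_cleanup(
    slots: Sequence[Mapping[str, Any]], name_prefix: str = ""
) -> List[str]:
    """Return SQL statements that would drop the given replication slots.

    `slots` are rows shaped like the result of SLOT_QUERY_96.
    `name_prefix` (hypothetical filter) limits the plan to slots whose
    name starts with that prefix, so unrelated slots are left alone.
    """
    statements: List[str] = []
    for slot in slots:
        name = slot["slot_name"]
        if not name.startswith(name_prefix):
            continue
        if slot["active"]:
            pid = slot.get("active_pid")
            if pid is None:
                # An active slot cannot be dropped, and without a PID there
                # is no backend to terminate, so skip it.
                continue
            # Disconnect the consumer first; dropping an active slot fails.
            statements.append(f"SELECT pg_terminate_backend({int(pid)});")
        statements.append(
            f"SELECT pg_drop_replication_slot({quote_literal(name)});"
        )
    return statements


if __name__ == "__main__":
    # Example rows, invented for illustration only.
    example = [
        {"slot_name": "old_slot", "active": False, "active_pid": None},
        {"slot_name": "live_slot", "active": True, "active_pid": 4242},
    ]
    for statement in plan_slot_cleanup(example):
        print(statement)
```

Note that terminating the backend only disconnects the consumer. If the destination is still running and reconnects, the slot becomes active again, so it is safer to stop whatever is consuming the slot on the destination side first. Dropping a slot is irreversible, so double-check the slot names before running anything.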
