
Issue with Setting Up Logical Replication and Decoding on GCP PostgreSQL

I am writing to seek assistance with an issue I encountered while setting up logical replication and decoding on GCP PostgreSQL, following the instructions in the documentation at https://cloud.google.com/sql/docs/postgres/replication/configure-logical-replication.

During the setup process, I encountered the following error message:

" db=db_b,user=[unknown] ERROR: subscriber test_sub initialization failed during non-recoverable step (d), please try the setup again"

I have followed the instructions carefully, but I am unable to proceed due to this error. I have reviewed the documentation and troubleshooting steps provided, but I have not been able to find a solution to this particular issue.

Could you please provide guidance on how to resolve this error and successfully set up logical replication and decoding? Any additional insights or recommendations would be greatly appreciated.

Accepted Solution

The error "subscriber test_sub initialization failed during non-recoverable step (d)" is raised by the pglogical extension when the initial synchronization of a subscription fails. During this phase, the subscriber copies a consistent snapshot of the source database. Here are the most common causes and how to check each one:

  1. Network Connectivity:

    • Firewall Rules: Ensure that the source and subscriber instances can reach each other. Check the authorized networks configured on each Cloud SQL instance as well as any network-level rules (such as VPC firewall rules).
    • Private IP: If your instances are in a VPC, verify that they have private IP addresses assigned and that they can communicate over the internal network.
  2. Replication User Permissions:

    • Privileges: Double-check that the replication user (the one named in the subscription's connection string) has sufficient privileges. It needs:
      • LOGIN privilege on both the source and subscriber instances.
      • The REPLICATION attribute on the source instance (REPLICATION is a role attribute, not a role that can be granted).
      • Enough privileges on the subscriber instance to create and manage subscriptions and to write to the target tables.
  3. pglogical Extension:

    • Proper Installation: Make sure the pglogical extension (if used) is installed on both the source and subscriber instances. You can verify this by running SELECT * FROM pg_extension WHERE extname = 'pglogical';.
    • Version Compatibility: Ensure that the pglogical version on both instances is compatible.
  4. Resource Constraints:

    • Subscriber Resources: If the subscriber instance has limited resources (CPU, memory, disk), it might struggle during the initial synchronization process. Monitor resource usage and consider upgrading the subscriber instance if needed.
  5. Conflicting Objects:

    • Sequences: Sequences that are in a replication set on the source but should not be replicated can cause conflicts. Remove them with the pglogical.replication_set_remove_sequence() function; pglogical.replication_set_remove_table() does the same for tables.
    • Other Objects: Large tables or complex objects might cause issues during synchronization. You can temporarily exclude them from replication until the initial setup is complete.
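
The checks in the list above can be run as plain SQL. The following is a minimal sketch, assuming pglogical is in use. `your_replication_user` is the placeholder used elsewhere in this answer; the replication set name `default` and the object names `public.my_sequence` and `public.big_table` are hypothetical and need to be replaced with your own.

```sql
-- On both instances: confirm pglogical is installed and compare versions.
SELECT extname, extversion
FROM pg_extension
WHERE extname = 'pglogical';

-- On the source: logical decoding needs wal_level = logical.
SHOW wal_level;

-- On the source: confirm the replication user can log in and replicate.
SELECT rolname, rolcanlogin, rolreplication
FROM pg_roles
WHERE rolname = 'your_replication_user';

-- On the source: take a sequence or a table out of a replication set.
-- 'default', public.my_sequence, and public.big_table are hypothetical names.
SELECT pglogical.replication_set_remove_sequence('default', 'public.my_sequence');
SELECT pglogical.replication_set_remove_table('default', 'public.big_table');
```

On Cloud SQL, `wal_level` is not set directly; per the documentation linked in the question, it becomes `logical` once the `cloudsql.logical_decoding` flag is enabled and the instance has restarted.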

Troubleshooting Steps

  1. Examine Logs: Carefully review the PostgreSQL logs on both the source and subscriber instances. Look for any additional error messages that might provide clues.
  2. Restart Replication Setup: If you've ruled out the above issues, try dropping the existing subscription and restarting the replication setup process from scratch.
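
Step 2 might look like the following sketch, assuming the subscription was created with pglogical. `test_sub` is the subscription name from the error message; the connection string values (`SOURCE_IP`, `SOURCE_DB`, `PASSWORD`) and `leftover_slot_name` are placeholders.

```sql
-- On the subscriber: inspect the state of the failed subscription.
SELECT subscription_name, status, provider_dsn
FROM pglogical.show_subscription_status('test_sub');

-- On the subscriber: drop the failed subscription (true = do not fail if missing).
SELECT pglogical.drop_subscription('test_sub', true);

-- On the source: look for a replication slot left behind by the failed attempt.
SELECT slot_name, plugin, active
FROM pg_replication_slots;

-- On the source: drop a leftover slot only if it is inactive and no longer needed.
-- SELECT pg_drop_replication_slot('leftover_slot_name');

-- On the subscriber: recreate the subscription.
SELECT pglogical.create_subscription(
    subscription_name := 'test_sub',
    provider_dsn := 'host=SOURCE_IP port=5432 dbname=SOURCE_DB user=your_replication_user password=PASSWORD'
);
```

Because the failure occurred during the initial copy, the target tables on the subscriber may already contain partial data; it is worth checking them before retrying.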

    Fixing Replication User Permissions

    -- pg_read_all_data and pg_write_all_data are predefined roles in PostgreSQL 14 and later.
    -- On the source instance
    GRANT pg_read_all_data TO your_replication_user;
    
    -- On the subscriber instance
    GRANT pg_write_all_data TO your_replication_user;
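
If the user is also missing the REPLICATION attribute, it can be set on the source instance. This is a minimal sketch, assuming a Cloud SQL instance (where `cloudsqlsuperuser` is the administrative role); the password is a placeholder.

```sql
-- On the source instance: create a dedicated replication user...
CREATE USER your_replication_user
    WITH REPLICATION LOGIN
    IN ROLE cloudsqlsuperuser
    PASSWORD 'CHANGE_ME';

-- ...or, if the user already exists, add the attribute to it instead.
ALTER USER your_replication_user WITH REPLICATION;
```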
    

If the problem persists, reach out to Google Cloud Support for assistance. They can access detailed logs and provide expert help.


Thank you for your support. I successfully resolved the issue based on your suggestion.