Issue: While performing a failover drill to our DR data center, we are observing the issues below:
Background and overview:
We have two Apigee OPDK data centers, a primary DC and a DR DC, in active-passive mode. At any given point in time, API traffic flows through only one of the DCs. If the primary DC fails, the traffic is to be routed to the DR DC.
Node 1 | Apigee Message Processor and Router |
Node 2 | Apigee Message Processor and Router |
Node 3 | Apigee Message Processor and Router |
Node 4 | Apigee Message Processor and Router |
Node 5 | Apigee Message Processor and Router |
Node 6 | Apigee Message Processor and Router |
Node 7 | Cassandra and Zookeeper |
Node 8 | Cassandra and Zookeeper |
Node 9 | Cassandra and Zookeeper |
Node 10 | Edge UI, management server, OpenLDAP |
Node 11 | Edge UI, management server, OpenLDAP |
Node 12 | Postgres DB |
Node 13 | Qpid Server |
Node 14 | Qpid Server |
We performed a planned failover and failback activity to verify that the DC-2 Apigee instance works as expected. However, we ran into the issues described below during the activity.
A quick summary of failover steps that we performed:
Could anyone help us understand why the analytics data is still being routed to the old PG master node even after the PG DB failover, which we performed as suggested in the documentation?
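One thing that may be worth checking is which Postgres node the analytics group still has registered as master. Below is a minimal Python sketch of that check. It assumes the analytics group listing from the management server (for example from `/v1/analytics/groups/ax`) has been saved as JSON, and that each `postgres-server` entry is either a single UUID or `masterUUID:standbyUUID`. The field names, the sample data, and the placeholder UUIDs are assumptions for illustration, not values from our environment.

```python
import json


def datastore_masters(ax_groups):
    """Map each analytics group name to the Postgres UUIDs it lists as master.

    Assumption: every entry under "uuids" -> "postgres-server" is either a
    single UUID or "masterUUID:standbyUUID", so the master is the part
    before the first colon.
    """
    masters = {}
    for group in ax_groups:
        entries = group.get("uuids", {}).get("postgres-server", [])
        masters[group["name"]] = [entry.split(":")[0] for entry in entries]
    return masters


def stale_groups(ax_groups, new_master_uuid):
    """Return names of groups that do not list the promoted node as master."""
    return [
        name
        for name, uuids in datastore_masters(ax_groups).items()
        if new_master_uuid not in uuids
    ]


# Hypothetical listing: the group still names the old master first.
SAMPLE = json.loads("""
[
  {
    "name": "axgroup-001",
    "uuids": {
      "qpid-server": ["qpid-uuid-1", "qpid-uuid-2"],
      "postgres-server": ["old-master-uuid:new-master-uuid"]
    }
  }
]
""")

if __name__ == "__main__":
    print(stale_groups(SAMPLE, "new-master-uuid"))
```

If a group shows up as stale after the Postgres promotion, that would be consistent with the Qpid consumers still writing to the old master, although it does not by itself prove that this is the cause.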
Dear @skhendkar,
After reviewing your situation, we believe that submitting a support request would be the most effective way to ensure you receive precise guidance. Here is more information on opening a support case:
If you have a Google Cloud Support Plan, file a support ticket through the Google Cloud Console.
If you do not have a support plan, you should contact your existing sales point of contact or use the Contact Us form to talk to someone.
Thank you for your reply, @AlexET.
We are actively working with the Google team to resolve an issue with the support portal and we plan to get the right support from them.
Meanwhile, it would be great if you could share any inputs, documents, or guides that describe the steps for switching between the data centers. Since I could not find a specific page in the Apigee documentation that covers this, we came up with the steps ourselves and tried them out, but to our surprise, the same steps yielded different results.
In one iteration, the failover to the DR data center was successful, but the failback using similar steps failed and we observed the issues mentioned in the ticket description. In the next iteration, the exact same steps did not work during failover.
We mainly focused on updating the configurations below: