Hi there,
We have Apigee Edge 4.52.00.003 deployed on our premises.
Current setup:
We have two 14-node data centers (DC-1 and DC-2) running in active-passive mode, with the following Apigee components on each node:
Node 1 Apigee Message Processor and Router
Node 2 Apigee Message Processor and Router
Node 3 Apigee Message Processor and Router
Node 4 Apigee Message Processor and Router
Node 5 Apigee Message Processor and Router
Node 6 Apigee Message Processor and Router
Node 7 Cassandra and Zookeeper
Node 8 Cassandra and Zookeeper
Node 9 Cassandra and Zookeeper
Node 10 Edge UI, management server, OpenLDAP
Node 11 Edge UI, management server, OpenLDAP
Node 12 Postgres DB
Node 13 Qpid Server
Node 14 Qpid Server
Postgres replication is enabled, with the PG node in DC-1 as master and the PG node in DC-2 as slave.
Monetization is installed on both DC-1 and DC-2.
We performed a planned failover and failback activity to verify that the DC-2 Apigee instance works as expected. However, we ran into the issues described below during the activity.
DB error - cannot write to the read-only DB:
Analysis:
- The Apigee components are trying to push messages to the PG node in DC (DC-1), which is no longer the master. Since messages cannot be written to a slave node, the write fails with this error.
- We followed the steps documented by Apigee to promote the slave node in DR (DC-2) to master and demote the node in DC (DC-1) to slave, and vice versa for failback. The steps completed successfully, and the status check shows that the correct nodes were promoted and demoted. However, the message processor still writes to the PG node in DC, which is now a slave.
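For reference, here is a minimal sketch of the role check we could run against both PG nodes after promotion, to confirm which one is actually writable. It relies on `pg_is_in_recovery()`, a standard PostgreSQL function that returns true on a read-only standby. The host names, the `apigee` user and the `apigee` database name are assumptions for illustration, not values taken from the Apigee documentation, and the sketch assumes `psql` can authenticate without a prompt.

```python
import subprocess


def query_in_recovery(host, user="apigee", dbname="apigee"):
    """Ask one PG node whether it is in recovery (a read-only standby).

    Assumes psql is installed and can authenticate non-interactively
    (for example via ~/.pgpass). User and database names are assumptions.
    """
    result = subprocess.run(
        ["psql", "-h", host, "-U", user, "-d", dbname,
         "-Atc", "SELECT pg_is_in_recovery()"],
        capture_output=True, text=True, check=True,
    )
    # In unaligned, tuples-only mode psql prints booleans as "t" or "f".
    return result.stdout.strip() == "t"


def classify_roles(in_recovery_by_host):
    """Split hosts into (masters, standbys) based on their recovery flags."""
    masters = sorted(h for h, rec in in_recovery_by_host.items() if not rec)
    standbys = sorted(h for h, rec in in_recovery_by_host.items() if rec)
    return masters, standbys


def expected_master_ok(in_recovery_by_host, expected_master):
    """True only if expected_master is the single writable node."""
    masters, _ = classify_roles(in_recovery_by_host)
    return masters == [expected_master]


# Example usage (hypothetical host names, not run automatically):
#   hosts = ["pg-dc1.example.internal", "pg-dc2.example.internal"]
#   flags = {h: query_in_recovery(h) for h in hosts}
#   print(expected_master_ok(flags, "pg-dc2.example.internal"))
```

This only confirms the Postgres roles themselves. It does not show which PG node the other Apigee components are configured to write to, which is the part we are unsure about.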
Monetization messages are stuck in the Qpid queue:
Analysis:
- During the drill, the monetization service was not active on DR.
- It was installed during the installation phase; however, we did not see it running when we hit the issue.
- Monetization messages remained stuck in the Qpid queue.
Could you please help identify the possible causes of these issues and the steps to prevent them?
A quick summary of the failover steps:
1. Ensure DC-2 components are up and running, confirm Postgres and Cassandra are in sync with DC-2, and complete the other prerequisites
2. Stop traffic on DC-1 Apigee instance
3. Take incremental backup of DC-1 Postgres (master)
4. Promote DC-2 PG as master and DC-1 PG as slave
5. Update LBs to point to DC-2 Apigee instance
6. Start traffic on DC-2 Apigee instance
7. Monitor Apigee traffic
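To make the order of operations explicit, the steps above can be sketched as an ordered runbook that stops at the first step whose check fails. This is only an illustration: `Step`, `run_steps` and `placeholder` are hypothetical names, and each placeholder would need to be replaced with the real command or verification for that step.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Step:
    name: str
    action: Callable[[], bool]  # returns True when the step succeeded


def run_steps(steps: List[Step]) -> Tuple[List[str], str]:
    """Run steps in order and stop at the first failure.

    Returns (names of completed steps, name of the failed step or "").
    """
    done = []
    for step in steps:
        if not step.action():
            return done, step.name
        done.append(step.name)
    return done, ""


def placeholder() -> bool:
    # Hypothetical stand-in: replace with the real check or command.
    return True


FAILOVER_STEPS = [
    Step("Verify DC-2 components, Postgres/Cassandra sync, prerequisites", placeholder),
    Step("Stop traffic on DC-1", placeholder),
    Step("Take incremental backup of DC-1 Postgres (master)", placeholder),
    Step("Promote DC-2 PG to master, demote DC-1 PG to slave", placeholder),
    Step("Update LBs to point to DC-2", placeholder),
    Step("Start traffic on DC-2", placeholder),
    Step("Monitor Apigee traffic", placeholder),
]

# Example usage:
#   done, failed = run_steps(FAILOVER_STEPS)
#   print("completed:", done, "failed at:", failed or "none")
```

With a real verification attached to each step, we would not start traffic on DC-2 until the promotion step had been confirmed.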
References: