We plan a two-data-centre deployment of Apigee with one Management Server (MS) in each centre. How would you ensure HA for the Management Server? Is that needed? I assume so, since the MS is needed to register RMPs. What should the value of MS_IP be in the 'response-file' for the other nodes in both data centres? The example in the installation guide is not entirely clear: MS_IP in the first 'response-file' is the IP of the primary MS and in the second it is the IP of the secondary MS, but that difference is not highlighted like the other differences.
Another question: would you always try to connect to the local MS, or to the primary MS and only fail over to the secondary MS?
Good question. Until now I assumed an 'active-passive' approach, but this was also meant to be part of the question. Is there a preference for 'active-active' or 'active-passive', or does it not matter? Asked differently: what are the pros and cons of each approach?
To answer your question more directly: I do not think we have requirements that need an 'active-active' approach.
You will hate this answer, but it depends. This is an overall consideration for the product implementation, beginning with the business objective.
The answer is determined by the business purpose of exposing your APIs. If consumption of your API proxies is mission critical and must be highly available at all times, then an active-active approach may be your best option.
If consumption of your API proxies can tolerate outages, such as those for maintenance or an interruption of service, then active-passive may suffice.
As for the Management Server, which is where development of a proxy begins, take a step back and consider whether development and debugging capabilities are also "critical". If so, the Management Server should be active-active as well; otherwise, active-passive is enough for development.
Has Apigee prescribed an n-host clustered solution for you? Where are you in your implementation of the Apigee platform?
Breaking my own rule against suggesting a solution with minimal information, and starting from an active-passive setup, here are some assumptions:
Cassandra / ZooKeeper will replicate across datacenters using a ring configuration. OpenLDAP will also replicate in order to retain access for the development community.
This, again, is high level and only outlines what you can do to make the two datacenters communicate and replicate data, so that in the event of a failover the secondary datacenter has the data required for continued business development.
Assuming that you are starting to build the overall environment, remember that you can always break this down into its simplest form: bring up one datacenter, then scale out into another.
The most common approach for a two-DC setup where MS HA is required is to have one MS per DC. In that scenario, the response file used to install components in each DC has MS_IP set to the IP of that DC's local MS.
response_dc1.txt will contain MS_IP=<local IP to DC1>
response_dc2.txt will contain MS_IP=<local IP to DC2>
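As a minimal sketch of that rule, the snippet below derives the MS_IP line for each data centre's response file. The addresses are hypothetical placeholders from the documentation ranges, the helper name ms_ip_line is made up for illustration, and a real response file contains many other properties that are not shown here.

```python
# Hypothetical local Management Server address per data centre
# (placeholder values, not taken from any real deployment).
LOCAL_MS_IP = {
    "dc1": "192.0.2.10",
    "dc2": "198.51.100.10",
}


def ms_ip_line(dc: str) -> str:
    """Return the MS_IP line for the response file of the given data centre."""
    return f"MS_IP={LOCAL_MS_IP[dc]}"


if __name__ == "__main__":
    # Each data centre's response file points at its own local MS.
    for dc in ("dc1", "dc2"):
        print(f"response_{dc}.txt: {ms_ip_line(dc)}")
```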
How traffic reaches the MS depends on your preferences. A global load balancer can be used to direct traffic to both MS instances (UI + MS). You need to decide what makes sense in your case; most customers with multiple MS instances use one as primary and the other as active standby. Sending traffic to one MS all the time, unless it fails, simplifies management and troubleshooting, since you know which UI, MS and OL (OpenLDAP) are used for every user interaction (API developers, org admins, etc.) and every management call (MS API calls).
At runtime, components do not connect to the MS, not even for configuration: configuration comes from ZK (ZooKeeper) and, in some cases, CS (Cassandra).
During upgrades, addition or removal of capacity, and other management activities, you will need to connect to an MS; this can be any available MS in either DC1 or DC2.
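The primary-with-active-standby pattern described above can be sketched as a simple priority selection: try the primary MS first and fall back to the standby only when the primary is unavailable. This is an illustration of the idea, not how a global load balancer or Apigee implements it. The host names are hypothetical, and the health check is passed in as a function because how you probe an MS is an assumption left to your environment.

```python
from typing import Callable, Optional, Sequence

# Hypothetical host names, listed in priority order: primary first, standby second.
MS_PRIORITY = ["ms.dc1.example.com", "ms.dc2.example.com"]


def pick_management_server(
    servers: Sequence[str],
    is_healthy: Callable[[str], bool],
) -> Optional[str]:
    """Return the first healthy server in priority order, or None if all are down."""
    for server in servers:
        if is_healthy(server):
            return server
    return None


if __name__ == "__main__":
    # Simulate the primary being down; a real check would probe the server.
    down = {"ms.dc1.example.com"}
    chosen = pick_management_server(MS_PRIORITY, lambda s: s not in down)
    print(chosen)
```

Because the list order encodes the preference, all users and management calls land on the same MS until it fails, which is what keeps troubleshooting simple.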
Please let me know if the above addresses your questions.
Thank you for your suggestion. It is very helpful. Can I check a couple of things?
1. If I deploy the Management Server as active-active across datacenters, what would my management API endpoint be?
2. If I deploy the Management Server and Edge UI as active-standby across datacenters, will proxies deployed from the Edge UI, or changes made through the management API, automatically replicate to the other datacenter?
Your advice is much appreciated!