Apigee Hybrid Multi-Region Deployment with Internal Pod IPs

We have deployed Apigee Hybrid in the east and west regions and have enabled connectivity between the two regions. We are using the kubenet networking model. With kubenet, the nodes get an IP address from the Azure virtual network subnet. Pods receive an IP address from an address space that is logically separate from the nodes' Azure virtual network subnet. Network address translation (NAT) is then configured so that the pods can reach resources on the Azure virtual network. The source IP address of the traffic is NAT'd to the node's primary IP address.

 

For example, when I run the nodetool status command from inside the cluster, I get the response below, as expected:

nodetool -h 10.250.139.149 -u jmxuser -pw iloveapis123 status

Datacenter: dc-1
================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address      Load        Tokens  Owns (effective)  Host ID                               Rack
UN  172.20.8.20  840.49 KiB  256     54.4%             ce4091d0-9fcf-4a6a-8bb4-af561d829f5a  ra-1
UN  172.20.4.14  1.06 MiB    256     45.6%             01a78a9a-e9a3-4540-830f-90068b3e7147  ra-1

 

Here 172.20.8.20 (apigee-cassandra-default-0) and 172.20.4.14 (apigee-cassandra-default-1) are the Cassandra pods' internal IPs, and 10.250.139.149 is the IP of the node on which the apigee-cassandra-default-0 pod is running.

 

But if I run the nodetool command from outside the cluster, it times out, even though the node from which the command is run has connectivity to 10.250.139.149. I believe this is because that node doesn't have access to the pods' internal IPs. Is that the case?
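One way to test the guess above is to compare plain TCP reachability of the node IP and a pod IP from the outside machine. The sketch below assumes bash and the coreutils `timeout` command are available, and that Cassandra's JMX listens on its default port 7199; the function names are made up for illustration. It only checks the TCP handshake. A JMX client can still time out afterwards if the server hands back an address (such as the pod IP) that the client cannot route to.

```shell
#!/usr/bin/env bash
# Sketch: check whether a TCP port answers, using bash's built-in /dev/tcp.
# Assumption: 7199 is the JMX port (Cassandra's default); adjust if needed.

# Exit status 0 means the TCP handshake completed within 3 seconds.
can_connect() {
  local host="$1" port="$2"
  timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}

# Print a one-line verdict for host:port.
check() {
  if can_connect "$1" "$2"; then
    echo "$1:$2 reachable"
  else
    echo "$1:$2 NOT reachable"
  fi
}
```

Running `check 10.250.139.149 7199` and then `check 172.20.8.20 7199` from the outside machine shows whether only the pod address is unreachable.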

I believe I need to set the broadcast_address to the public IP (10.250.139.149). Is there a config setting in the Apigee Hybrid overrides file to set the broadcast_address?

Also, consider a scenario where I have two Hybrid clusters pointing to the same GCP project without their Cassandra clusters synced. How will that affect the Applications and API Product data shown in the Apigee UI?



@varunrajbhat,

  • It is recommended to enable hostNetwork for an Apigee hybrid multi-region deployment when the AKS cluster(s) were created with the kubenet CNI. Please check the Apigee hybrid multi-region docs for AKS.
  • If the AKS cluster(s) were created with Azure CNI, you can use Azure VNet IPs for the pod network and set up VNet peering between the regions/clusters to form a multi-region Cassandra cluster.
  • There is no option to override the broadcast_address in Apigee hybrid.
  • If you have two Cassandra clusters that are out of sync or have not formed a multi-region ring, the UI will load-balance between the two clusters and give inconsistent results.
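To see which of the two setups a running pod is actually in, you can ask Kubernetes for the pod's hostNetwork flag and compare its pod IP with its node IP. This is only a sketch: the `apigee` namespace and pod name come from this thread, the function names and the `KUBECTL` override variable are made up for illustration, and the overrides property that turns this on is, as far as I know, `cassandra.hostNetwork` (check the configuration reference for your hybrid version).

```shell
#!/usr/bin/env bash
# Sketch: inspect how a Cassandra pod is networked.
# KUBECTL is a hypothetical override for the kubectl binary (defaults to kubectl).

# Prints "true" when the pod runs with hostNetwork; prints nothing if the field is unset.
pod_host_network() {
  "${KUBECTL:-kubectl}" get pod "$1" -n "${2:-apigee}" -o jsonpath='{.spec.hostNetwork}'
}

# Prints the pod IP and the IP of the node hosting it, separated by a space.
# With hostNetwork enabled the two values are the same.
pod_and_node_ip() {
  "${KUBECTL:-kubectl}" get pod "$1" -n "${2:-apigee}" -o jsonpath='{.status.podIP} {.status.hostIP}'
}
```

For example, `pod_and_node_ip apigee-cassandra-default-0` on the kubenet cluster described above would be expected to print two different addresses (a 172.20.x.x pod IP and a 10.250.x.x node IP).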

 

 

Thanks,

Ram

Setting hostNetwork to true causes the Cassandra pod to go into the CrashLoopBackOff state. When I run the kubectl describe command on the pod, I see the error message below:

Exec lifecycle hook ([/bin/sh -c nodetool -u $CPS_ADMIN_USER -pw $CPS_ADMIN_PASSWORD disablebinary ; nodetool -u $CPS_ADMIN_USER -pw $CPS_ADMIN_PASSWORD disablethrift ; nodetool -u $CPS_ADMIN_USER -pw $CPS_ADMIN_PASSWORD drain ; PID=$(pidof java) && kill $PID && while ps -p $PID > /dev/null; do sleep 1; done]) for Container "apigee-cassandra" in Pod "apigee-cassandra-default-0_apigee(7cb51d7b-ae4c-4268-95d6-e6c3b72cae09)" failed - error: rpc error: code = Unknown desc = failed to exec in container: failed to create exec "31d05aa8a90b20fb7b8dae5a505295d62f96d5c215a0b851ee3169da7079a10c": cannot exec in a deleted state: unknown, message: ""

Did you try to enable hostNetwork on the existing Apigee hybrid install or on a fresh install? Based on the output provided, it seems the pod is in a deleted state.

I enabled hostNetwork on the existing Apigee hybrid install. I have also tried it on a fresh install. I run into the same issue, where the Cassandra pod goes into the CrashLoopBackOff state.

Can you share logs from the cassandra pod(s)?
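For anyone hitting the same crash loop, the logs being asked for can be gathered roughly as follows. This is a sketch, not an official procedure: the pod name, the `apigee` namespace, and the `apigee-cassandra` container name are taken from the error message earlier in this thread, while the function name and the `KUBECTL` override variable are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: gather diagnostics for a Cassandra pod stuck in CrashLoopBackOff.
# KUBECTL is a hypothetical override for the kubectl binary (defaults to kubectl).

collect_cassandra_logs() {
  local pod="${1:-apigee-cassandra-default-0}" ns="${2:-apigee}"
  # Logs of the current container instance.
  "${KUBECTL:-kubectl}" logs -n "$ns" "$pod" -c apigee-cassandra
  # Logs of the previous (crashed) container instance, usually the useful part.
  "${KUBECTL:-kubectl}" logs -n "$ns" "$pod" -c apigee-cassandra --previous
  # Events and container state, including the last termination reason.
  "${KUBECTL:-kubectl}" describe pod -n "$ns" "$pod"
}
```

Redirecting the output to a file, for example `collect_cassandra_logs > cassandra-0.txt 2>&1`, makes it easier to attach to a reply.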

@varunrajbhat did you discover what the issue was? We are facing the same situation.

Answering the two queries separately, based on experience.

"Also, consider a scenario where I have two Hybrid clusters pointing to the same GCP project without their Cassandra clusters synced. How will it affect the Applications and API Product data shown on the Apigee UI ?"

Ideally you should not do this, as both clusters point to the same org. The sync between the runtime (Cassandra) and the management plane (UI) is handled through the Apigee Connect service. In a worst-case scenario where you have to deal with two clusters that are not in sync but point to the same Apigee org, ensure that only one cluster has active Apigee Connect pods; in the other cluster, scale these to 0. This ensures that your UI and at least one cluster always remain in sync, and it will not impact developers who create API keys etc. in the UI.

Since sync is not happening, you will need to manually sync the tables if you need the second cluster to handle the same API traffic. I have faced this kind of issue in non-prod for some test purposes, where I had to juggle between the two clusters without impacting regular developers doing their API work. But never do this in a PROD environment; keeping the Cassandra clusters in sync is the best option.

 

Regarding nodetool status

Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address      Load        Tokens  Owns (effective)  Host ID                               Rack
UN  172.20.8.20  840.49 KiB  256     54.4%             ce4091d0-9fcf-4a6a-8bb4-af561d829f5a  ra-1
UN  172.20.4.14  1.06 MiB    256     45.6%             01a78a9a-e9a3-4540-830f-90068b3e7147  ra-1

 

Not related to the query you asked, but I had a different question: are the two Cassandra instances shown the full cluster? Are they both in the same region? Ideally, different regions should have separate Cassandra datacenters so that data is replicated across regions (and 100% of the data should be in both regions).
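A quick way to answer that from the nodetool output itself is to count the "Datacenter:" headers and the nodes listed under each. The sketch below assumes the standard `nodetool status` layout shown above; the function names are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: summarize `nodetool status` output read from stdin.

# Number of datacenters in the ring (one "Datacenter:" header per datacenter).
count_datacenters() {
  grep -c '^Datacenter:'
}

# Prints "<datacenter> <node count>" per datacenter.
# Node rows start with a two-letter status/state code such as UN or DN.
nodes_per_datacenter() {
  awk '/^Datacenter:/ { dc = $2 }
       /^[UD][NLJM] / { n[dc]++ }
       END { for (d in n) print d, n[d] }'
}
```

Piping the output quoted in this thread through `nodes_per_datacenter` prints `dc-1 2`, that is, a single datacenter with two nodes; a ring that spans both regions would be expected to list a second datacenter as well.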