Wanted to post an issue and the solution we recently ran into. We are on-prem customers running 4.15.07.03. Last week we were unable to deploy any proxy to our planet: during deployment, the management portal would take a long time to respond and eventually show an error saying the proxy was deployed but traffic might not flow. After querying the management API, we could see that our proxy had been deployed to only one of our two message processors:
/v1/organizations/<org>/environments/<env>/apis/<proxy>/revisions/1/deployments
"server": [ { "status": "deployed", "type": [ "message-processor" ], "uUID": "5969f7e8-a32b-45e3-87f5-8b982bc2bf24" }, { "error": "Call timed out; either server is down or server is not reachable", "status": "error", "type": [ "message-processor" ], "uUID": "2cfbf65f-4ed0-4f17-99f4-811bce25b39e" },
We also saw the following error message in the management server logs...
2016-01-01 00:05:52,222 org:nminternal env:<env> qtp451189693-42887 ERROR DISTRIBUTION - RemoteServicesUnDeploymentHandler.unDeployFromServers() : RemoteServicesUnDeploymentHandler.unDeployFromServers : UnDeployment exception for server with uuid 2cfbf65f-4ed0-4f17-99f4-811bce25b39e : cause = RPC Error 504: Call timed out communication error = true
com.apigee.rpc.RPCException: Call timed out
    at com.apigee.rpc.impl.AbstractCallerImpl.handleTimeout(AbstractCallerImpl.java:64) ~[rpc-1.0.0.jar:na]
    at com.apigee.rpc.impl.RPCMachineImpl$OutgoingCall.handleTimeout(RPCMachineImpl.java:483) ~[rpc-1.0.0.jar:na]
    at com.apigee.rpc.impl.RPCMachineImpl$OutgoingCall.access$000(RPCMachineImpl.java:402) ~[rpc-1.0.0.jar:na]
    at com.apigee.rpc.impl.RPCMachineImpl$OutgoingCall$1.run(RPCMachineImpl.java:437) ~[rpc-1.0.0.jar:na]
    at io.netty.util.HashedWheelTimer$HashedWheelTimeout.expire(HashedWheelTimer.java:532) ~[netty-all-4.0.0.CR1.jar:na]
    at io.netty.util.HashedWheelTimer$Worker.notifyExpiredTimeouts(HashedWheelTimer.java:430) ~[netty-all-4.0.0.CR1.jar:na]
    at io.netty.util.HashedWheelTimer$Worker.run(HashedWheelTimer.java:371) ~[netty-all-4.0.0.CR1.jar:na]
    at java.lang.Thread.run(Thread.java:745) ~[na:1.7.0_91]
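Before recycling anything, it can be worth confirming that the message processor really is unreachable from the management server. One sanity check (this assumes the default message processor management port of 8082; verify the port for your topology) is to call the self-status endpoint from the management server host:

curl -v "http://<message-processor-host>:8082/v1/servers/self/up"

A healthy message processor should respond with true; a timeout here lines up with the RPC Error 504 in the log above.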
We resolved this issue by recycling both the management server and the impacted message processor.
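For anyone on a similar 4.15.x install, the restarts looked roughly like the following (a sketch only; the apigee-service path and component names assume a default 4.15.x layout and differ in newer releases, which use edge-management-server / edge-message-processor, so check your own install):

# on the management server host
/opt/apigee4/bin/apigee-service management-server restart

# on the affected message processor host
/opt/apigee4/bin/apigee-service message-processor restart

Once both components came back up, the proxy deployed cleanly to both message processors.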