Dear Team,
I am encountering an issue while installing the Apigee environment operator. One of the pods, apigee-runtime-hale-mode-43890-evaluation-1042d85-1132-5btbmvj6, is stuck in a CrashLoopBackOff state. Below are the logs
{"level":"SEVERE","thread":"NIOThread@4","mdc":{},"className":"com.apigee.probe.ProbeServiceImpl","method":"lambda$runProbesForStatus$1","severity":"SEVERE","message":"probe \"READINESS:VersionLoadProbe\" execution failed due to Version not loaded by Message Processor","formattedDate":"2024-11-27T05:13:30.708Z","logger":"ProbeServiceImpl"}
{"level":"SEVERE","thread":"NIOThread@4","mdc":{},"className":"com.apigee.probe.ProbeAPI","method":"getResponse","severity":"SEVERE","message":"probe failed with details ProbeStatusResponse{isProbeSuccessful=false, failureMessages=[Probe VersionLoadProbe failed due to Version not loaded by Message Processor]}","formattedDate":"2024-11-27T05:13:30.709Z","logger":"ProbeAPI"}
I would appreciate guidance on resolving this issue. Thank you in advance for your support!
Hello @hemanth_ch ,
When synchronizer pods cannot connect to Cassandra, they fail their health probes, and new Message Processor pods then cannot start because they have no contracts to load. You can review the synchronizer logs in Cloud Logging with this filter:
resource.type="k8s_container"
resource.labels.container_name="apigee-synchronizer"
(jsonPayload.className="com.apigee.probe.ProbeAPI" OR jsonPayload.className="com.apigee.probe.ProbeServiceImpl")
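If you would rather check the raw container logs without Cloud Logging, the same class-name filter can be applied locally. This is only a rough sketch: it assumes the pod logs were saved to a file with one JSON object per line (as in the log lines posted above), and the function name `probe_failures` is made up for illustration.

```python
import json

# Class names taken from the Cloud Logging filter above.
PROBE_CLASSES = {
    "com.apigee.probe.ProbeAPI",
    "com.apigee.probe.ProbeServiceImpl",
}


def probe_failures(lines):
    """Return (formattedDate, message) pairs for probe-related log entries.

    `lines` is any iterable of raw log lines, one JSON object per line.
    Lines that are blank or not valid JSON objects are skipped.
    """
    failures = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # e.g. a stack-trace fragment or plain-text line
        if not isinstance(entry, dict):
            continue
        if entry.get("className") in PROBE_CLASSES:
            failures.append((entry.get("formattedDate"), entry.get("message")))
    return failures
```

For example, `probe_failures(open("sync.log"))` would list the probe entries from a saved log file, where `sync.log` is a hypothetical file name.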
You can check the status of the Cassandra nodes with nodetool status. It is very important to verify that the Cassandra cluster is able to accept new connections.
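As a quick way to test whether Cassandra accepts new connections from another pod, something like the sketch below can be used where Python is available. It assumes the CQL native port 9042 (the default Cassandra client port), and `can_connect` is just an illustrative helper name. A successful TCP connect only shows that the port is reachable; it does not prove that authentication or TLS will succeed.

```python
import socket


def can_connect(host, port=9042, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the host and applies the timeout
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # covers refused connections, timeouts and DNS failures
        return False
```

Call it with the Cassandra pod IP shown in the nodetool output, for example `can_connect("<cassandra-pod-ip>")`.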
I recommend creating a support ticket for further assistance.
Thank you for your kind assistance. Below are the logs
From Cassandra Container:
sh-5.1$ /opt/apigee/apigee-cassandra/bin/nodetool -u admin_user -pw iloveapis123 status
Datacenter: dc-1
================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
UN 10.130.1.1 4.14 MiB 256 100.0% 6005722a-97fb-4ab6-93bc-26e1989196b0 ra-1
From Synchronizer Pod:
sh-5.0$ telnet 10.130.1.1 9042
Connection closed by foreign host
And the JSON response from the Cloud Logging query:
{
insertId: "usroawf26thoj"
jsonPayload: {
_p: "F"
className: "com.apigee.probe.ProbeServiceImpl"
formattedDate: "2024-11-27T03:23:51.764Z"
level: "SEVERE"
logger: "ProbeServiceImpl"
mdc: {0}
message: "probe "READINESS:VersionLoadProbe" execution failed due to Version not loaded by Message Processor"
method: "lambda$runProbesForStatus$1"
thread: "NIOThread@4"
time: "2024-11-27T03:23:51.764356625+00:00"
}
logName: "projects/hale-mode-438906-h7/logs/stderr"
receiveTimestamp: "2024-11-27T03:23:51.958712379Z"
resource: {2}
severity: "ERROR"
timestamp: "2024-11-27T03:23:51.764356625Z"
}
Kindly provide next steps to troubleshoot the issue.
Thanks
Also, please find below the logs from the synchronizer pod:
{"level":"SEVERE","thread":"Apigee-Timer-9","mdc":{},"className":"com.apigee.hybrid.runtime.signals.trace.sync.pubsub.PubSubContext","method":"createSubscription","severity":"SEVERE","message":"Upstream subscription creation request failed due to error with status code : 404 reason : Not Found response : <!DOCTYPE html>\n<html lang=en>\n <meta charset=utf-8>\n <meta name=viewport content=\"initial-scale=1, minimum-scale=1, width=device-width\">\n <title>Error 404 (Not Found)!!1</title>\n <style>\n *{margin:0;padding:0}html,code{font:15px/22px arial,sans-serif}html{background:#fff;color:#222;padding:15px}body{margin:7% auto 0;max-width:390px;min-height:180px;padding:30px 0 15px}* > body{background:url(//www.google.com/images/errors/robot.png) 100% 5px no-repeat;padding-right:205px}p{margin:11px 0 22px;overflow:hidden}ins{color:#777;text-decoration:none}a img{border:0}@media screen and (max-width:772px){body{background:none;margin-top:0;max-width:none;padding-right:0}}#logo{background:url(//www.google.com/images/branding/googlelogo/1x/googlelogo_color_150x54dp.png) no-repeat;margin-left:-5px}@media only screen and (min-resolution:192dpi){#logo{background:url(//www.google.com/images/branding/googlelogo/2x/googlelogo_color_150x54dp.png) no-repeat 0% 0%/100% 100%;-moz-border-image:url(//www.google.com/images/branding/googlelogo/2x/googlelogo_color_150x54dp.png) 0}}@media only screen and (-webkit-min-device-pixel-ratio:2){#logo{background:url(//www.google.com/images/branding/googlelogo/2x/googlelogo_color_150x54dp.png) no-repeat;-webkit-background-size:100% 100%}}#logo{display:inline-block;height:54px;width:150px}\n </style>\n <a href=//www.google.com/><span id=logo aria-label=Google></span></a>\n <p><b>404.</b> <ins>That’s an error.</ins>\n <p>The requested URL <code>/v1/organizations/hale-mode-438906-h7/environments/evaluation:subscribe</code> was not found on this server. 
<ins>That’s all we know.</ins>\n","formattedDate":"2024-11-29T08:54:22.697Z","logger":"CONFIG-CHANGE"}
{"level":"SEVERE","thread":"Apigee-Timer-9","mdc":{},"className":"com.apigee.hybrid.runtime.signals.context.SignalsContextImpl","method":"initializePubSubContext","severity":"SEVERE","message":"ControlPlaneCommunicationFailure exception occurred. Backing off signal polling","formattedDate":"2024-11-29T08:54:22.698Z","logger":"CONFIG-CHANGE","exceptionStackTrace":"com.apigee.hybrid.runtime.contract.sync.ControlPlaneCommunicationFailure{ code = runtime.contract.sync.SubscriptionRequestError, message = Signals subscription request with control plane failed, error msg: Not Found, associated contexts = []}\n"}
@hemanth_ch Which version of Apigee Hybrid are you running?
It looks like the synchronizer cannot connect to the control plane, which is usually caused by a misconfiguration. In most cases the cause is configuring the wrong service account or the wrong control plane URL.
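One possible first step is to confirm which organization and environment the synchronizer actually requested, and compare them with the values in your configuration. The sketch below pulls both out of the 404 message in the synchronizer log posted above; the regular expression is written against that one message, and `requested_org_env` is an illustrative name, not part of any Apigee tooling.

```python
import re

# Matches the request path shown in the synchronizer's 404 response.
SUBSCRIBE_PATH = re.compile(
    r"/v1/organizations/(?P<org>[^/]+)/environments/(?P<env>[^/:]+):subscribe"
)


def requested_org_env(message):
    """Return (org, env) from a failed subscribe message, or None."""
    match = SUBSCRIBE_PATH.search(message)
    if match is None:
        return None
    return match.group("org"), match.group("env")
```

If the organization or environment returned here differs from what you expect, that points to a configuration mismatch rather than a network problem.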