
Random 502 responses from Apigee X

We're experiencing random 502 responses from Apigee that weren't occurring before we started proxying the traffic through it.

Those 502s are not captured by the Apigee debug session and mostly appear during traffic spikes.

From what I can see, Apigee never reaches the backend server, and the timeouts are configured correctly.

The path is: external LB -> Apigee -> internal LB (managed by the GKE Gateway API) -> k8s pod.

There is no sign of those 502s in the internal LB logs that I've managed to pull.
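(For completeness, the 502 entries can be pulled from Cloud Logging with something along these lines; this is a sketch rather than the exact command I ran, and it needs resource/forwarding-rule filters added to separate the external and internal LBs.)

# list recent 502s from the load balancer request logs in this project
gcloud logging read 'httpRequest.status=502' \
    --freshness=1d --limit=20 --format=json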

Any suggestions on how to debug this issue, or has anyone faced something similar?

 

Example log (from the Apigee external LB, since there are no corresponding entries in the backend LB logs):

Logging is enabled for the backend LB with a sample rate of 1, but the entry just isn't there (a sketch of how to double-check that logging config follows the log below).
{
  "insertId": "1wtog76fv7wi0l",
  "jsonPayload": {
    "statusDetails": "response_sent_by_backend",
    "backendTargetProjectNumber": "projects/<masked>",
    "cacheDecision": [
      "RESPONSE_HAS_CONTENT_TYPE",
      "REQUEST_HAS_AUTHORIZATION",
      "CACHE_MODE_USE_ORIGIN_HEADERS"
    ],
    "@type": "type.googleapis.com/google.cloud.loadbalancing.type.LoadBalancerLogEntry",
    "remoteIp": "<masked>"
  },
  "httpRequest": {
    "requestMethod": "POST",
    "requestSize": "688",
    "status": 502,
    "responseSize": "257",
    "userAgent": "Go-http-client/2.0",
    "remoteIp": "<masked>",
    "serverIp": "masked",
    "latency": "0.011870s"
  },
  "resource": {
    "type": "http_load_balancer",
    "labels": {
      "zone": "global",
      "forwarding_rule_name": "apigee-xlb-forwarding-rule",
      "backend_service_name": "apigee-xlb-backend",
      "target_proxy_name": "apigee-xlb-target-proxy",
      "project_id": "maskedxxxxxx-prod",
      "url_map_name": "apigee-xlb"
    }
  },
  "timestamp": "2024-02-13T13:40:16.317277Z",
  "severity": "WARNING",
  "logName": "projects/maskedxxxxxx/logs/requests",
  "trace": "projects/maskedxxxxxx/traces/db56cc35a2423d7423320b861500762a",
  "receiveTimestamp": "2024-02-13T13:40:16.697784853Z",
  "spanId": "ec9f533d40095e3e"
}
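As an aside, the backend LB's logging configuration can be double-checked with something along these lines (the backend service name and region are placeholders; with the GKE Gateway API the backend services get auto-generated names):

# confirm request logging is enabled at full sample rate on the backend LB
gcloud compute backend-services describe my-internal-backend \
    --region=europe-west1 \
    --format='value(logConfig.enable, logConfig.sampleRate)'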

Hello @sijohnmathew, were you able to resolve this issue?

Hi @sijohnmathew @Char-K,

I am facing the same issue. Below is the traffic flow: I get 502 Bad Gateway errors when the k8s pods scale down while requests come through Apigee.

Client App --> Apigee --> internal load balancer --> k8s pods [502 errors when pods scale down]

When we hit the internal load balancer directly, this issue doesn't appear.

Client App --> internal load balancer --> k8s pods [no issues]
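From what I've read, 502s during scale-down usually mean the pods are terminated while the upstream proxy or load balancer still holds pooled keep-alive connections to them. The commonly suggested mitigation is to delay pod shutdown so existing connections can drain. A minimal sketch (the names, image, and sleep value are placeholders, and I haven't confirmed yet that this fixes our case):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app                    # placeholder
spec:
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      # give the LB time to stop routing to the pod before it is killed
      terminationGracePeriodSeconds: 60
      containers:
      - name: my-app
        image: my-app:latest      # placeholder; image must provide a sleep binary
        lifecycle:
          preStop:
            exec:
              # keep the pod serving briefly so in-flight requests and
              # pooled keep-alive connections can drain
              command: ["sleep", "15"]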

I'm not sure which setting to tweak between Apigee and the load balancer. I tried changing the keep-alive settings in the Apigee proxy, but with no positive results.
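For context, these are the kinds of TargetEndpoint properties I was adjusting (the property names are from the Apigee endpoint properties reference; the values and URL below are placeholders, not our real config):

<TargetEndpoint name="default">
  <HTTPTargetConnection>
    <Properties>
      <!-- how long an idle keep-alive connection to the target is kept before Apigee closes it -->
      <Property name="keepalive.timeout.millis">30000</Property>
      <!-- TCP connect timeout towards the target -->
      <Property name="connect.timeout.millis">3000</Property>
      <!-- read/write (I/O) timeout towards the target -->
      <Property name="io.timeout.millis">55000</Property>
    </Properties>
    <!-- placeholder target URL -->
    <URL>https://ilb.example.internal</URL>
  </HTTPTargetConnection>
</TargetEndpoint>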

Please advise?

 

@sijohnmathew @satyendrasrivas - how are you hitting the ILB? Is it via peering or PSC? Some more networking details would help.

Also, if you run a load test pointing directly at the ILB, do you see any 5xx responses on the load-testing client?

@ssvaidyanathan Thank you for your reply.

We have seen this issue in the production environment and have reproduced it in the test environment.

The ILB is an F5 load balancer, which sits in front of the k8s service.

We have used three different testing clients: 1) NeoLoad, 2) JMeter, and 3) Postman, and we can reproduce the issue with all three. Whenever we hit the F5 LB directly with these tools, the 502 Bad Gateway issue doesn't appear. When our testing tool hits the Apigee gateway and Apigee calls the target server (the F5 LB URL), the issue occurs.

Regarding "peering or PSC", I am asking our network engineer to confirm.

>> If you run a load test pointing directly at the ILB, do you see any 5xx responses on the load-testing client?

No 502 errors when directly hitting the ILB.

 

@ssvaidyanathan 

>>how are you hitting the ILB? Is it via peering or PSC? 

It's a GCP Interconnect from our data center to GCP.

Thanks for the info. Since this is happening only when Apigee is in the path, I would request you to open a Support ticket. The Support Engineer can then capture a TCP dump, check the network packets, and see why/where the connection is getting dropped.
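If you want a head start before Support gets involved, a packet capture on one of the backend nodes along these lines usually shows whether connections are being reset mid-flight (the interface and port are assumptions for your environment):

# capture backend-facing traffic on a node
sudo tcpdump -i any -nn -s 0 -w /tmp/ilb-502.pcap 'tcp port 8443'

# afterwards, look for connections torn down with a RST around the 502 timestamps
tcpdump -nn -r /tmp/ilb-502.pcap 'tcp[tcpflags] & tcp-rst != 0'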