I have port-forwarded both the pod and the service, and the health check returns 200 OK. But the backend service for my Ingress is reporting UNHEALTHY.
Here are my configs:
Deployment
Name:                   api
Namespace:              agones-dev
CreationTimestamp:      Wed, 09 Aug 2023 15:04:47 -0700
Labels:                 <none>
Annotations:            deployment.kubernetes.io/revision: 2
Selector:               app=api
Replicas:               2 desired | 2 updated | 2 total | 2 available | 0 unavailable
StrategyType:           RollingUpdate
MinReadySeconds:        0
RollingUpdateStrategy:  25% max unavailable, 25% max surge
Pod Template:
  Labels:  app=api
  Containers:
   api:
    Image:        us-west2-docker.pkg.dev/agones-test-394122/api-testing/gateway:latest
    Ports:        8443/TCP, 9000/TCP
    Host Ports:   0/TCP, 0/TCP
    Liveness:     http-get http://:9000/actuator/health delay=15s timeout=2s period=10s #success=1 #failure=3
    Readiness:    http-get http://:9000/actuator/health delay=15s timeout=2s period=10s #success=2 #failure=3
    Environment:  <none>
    Mounts:       <none>
  Volumes:        <none>
Conditions:
  Type           Status  Reason
  ----           ------  ------
  Available      True    MinimumReplicasAvailable
  Progressing    True    NewReplicaSetAvailable
OldReplicaSets:  api-598ccb9f5f (0/0 replicas created)
NewReplicaSet:   api-7bbdf7db56 (2/2 replicas created)
Events:          <none>
Service
Name:                     api
Namespace:                agones-dev
Labels:                   <none>
Annotations:              cloud.google.com/app-protocols: {"grpc":"HTTP2"}
                          cloud.google.com/backend-config: {"default": "api-backend-config"}
Selector:                 app=api
Type:                     NodePort
IP Family Policy:         SingleStack
IP Families:              IPv4
IP:                       10.1.13.1
IPs:                      10.1.13.1
Port:                     grpc  443/TCP
TargetPort:               8443/TCP
NodePort:                 grpc  30474/TCP
Endpoints:                10.1.0.14:8443,10.1.1.11:8443
Port:                     health  9000/TCP
TargetPort:               9000/TCP
NodePort:                 health  30037/TCP
Endpoints:                10.1.0.14:9000,10.1.1.11:9000
Session Affinity:         None
External Traffic Policy:  Cluster
Events:                   <none>
Backend Config
Name:         api-backend-config
Namespace:    agones-dev
Labels:       <none>
Annotations:  <none>
API Version:  cloud.google.com/v1
Kind:         BackendConfig
Metadata:
  Creation Timestamp:  2023-08-10T04:02:56Z
  Generation:          2
  Resource Version:    296973
  UID:                 b28786a9-e41c-4303-af4e-1260eacbbdc6
Spec:
  Health Check:
    Check Interval Sec:   5
    Healthy Threshold:    1
    Port:                 9000
    Request Path:         /actuator/health
    Timeout Sec:          5
    Type:                 HTTP
    Unhealthy Threshold:  5
  Logging:
    Enable:  true
Events:  <none>
Ingress
Name:             api-ingress
Labels:           <none>
Namespace:        agones-dev
Address:          *.*.*.44
Ingress Class:    <none>
Default backend:  <default>
Rules:
  Host        Path  Backends
  ----        ----  --------
  *
              /*    api:443 (10.1.0.14:8443,10.1.1.11:8443)
Annotations:  ingress.gcp.kubernetes.io/pre-shared-cert: mcrt-14599b3e-bdcc-4bae-90fe-d1e09031b62e
              ingress.kubernetes.io/backends: {"k8s-be-30474--5176d82132b3926f":"UNHEALTHY","k8s-be-32628--5176d82132b3926f":"HEALTHY"}
              ingress.kubernetes.io/forwarding-rule: k8s2-fr-qum271i5-agones-dev-api-ingress-6kqm9ce1
              ingress.kubernetes.io/https-forwarding-rule: k8s2-fs-qum271i5-agones-dev-api-ingress-6kqm9ce1
              ingress.kubernetes.io/https-target-proxy: k8s2-ts-qum271i5-agones-dev-api-ingress-6kqm9ce1
              ingress.kubernetes.io/ssl-cert: mcrt-14599b3e-bdcc-4bae-90fe-d1e09031b62e
              ingress.kubernetes.io/target-proxy: k8s2-tp-qum271i5-agones-dev-api-ingress-6kqm9ce1
              ingress.kubernetes.io/url-map: k8s2-um-qum271i5-agones-dev-api-ingress-6kqm9ce1
              kubernetes.io/ingress.global-static-ip-name: api-failedmechanicsgames-com-ip
              networking.gke.io/managed-certificates: subdomain-ssl-certificate
Events:
  Type    Reason     Age                    From                     Message
  ----    ------     ----                   ----                     -------
  Normal  Sync       7m13s                  loadbalancer-controller  UrlMap "k8s2-um-qum271i5-agones-dev-api-ingress-6kqm9ce1" created
  Normal  Sync       7m9s                   loadbalancer-controller  TargetProxy "k8s2-tp-qum271i5-agones-dev-api-ingress-6kqm9ce1" created
  Normal  Sync       6m55s                  loadbalancer-controller  ForwardingRule "k8s2-fr-qum271i5-agones-dev-api-ingress-6kqm9ce1" created
  Normal  IPChanged  6m55s                  loadbalancer-controller  IP is now *.*.*.44
  Normal  Sync       6m46s                  loadbalancer-controller  TargetProxy "k8s2-ts-qum271i5-agones-dev-api-ingress-6kqm9ce1" created
  Normal  Sync       6m33s                  loadbalancer-controller  ForwardingRule "k8s2-fs-qum271i5-agones-dev-api-ingress-6kqm9ce1" created
  Normal  Sync       6m30s (x7 over 9m23s)  loadbalancer-controller  Scheduled for sync
The weird thing is that this worked yesterday, and now it's UNHEALTHY. I can't find any logs anywhere to help diagnose it. Thanks in advance for any help.
If I remove the following annotation:
cloud.google.com/app-protocols: {"grpc":"HTTP2"}
the health check passes, but the app no longer receives HTTP/2 requests. It feels like the load balancer's health check is ignoring the HTTP setting in the BackendConfig.
So, I didn't realize Kubernetes supports native gRPC health checks now. Following the document here:
While this works fine at the Deployment level, the Ingress load balancer health check still says unhealthy. It looks like it's using a default HTTP/2 health check rather than a gRPC one. So, back to square one, I guess.
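For reference, the built-in probes look roughly like this (a minimal sketch: it needs Kubernetes 1.24+, port 8443 is just the gRPC port from my Deployment above, and the server has to implement the standard grpc.health.v1.Health service):

# container spec snippet: native gRPC probes in place of the HTTP ones
livenessProbe:
  grpc:
    port: 8443
  initialDelaySeconds: 15
readinessProbe:
  grpc:
    port: 8443
  initialDelaySeconds: 15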
And... now it's working, lol. I wish I knew what I did that fixed it.
EDIT: Spoke too soon. While everything in the console is showing green, I'm still getting a 502 when trying to hit the gRPC app.
I faced a very similar issue this week.
Here is what I learned while solving mine.
1. For health checking gRPC applications (at the Deployment level in Kubernetes), there is a recommended approach: https://cloud.google.com/blog/topics/developers-practitioners/health-checking-your-grpc-servers-gke
2. The Service must have the annotation:
cloud.google.com/app-protocols: '{"grpc-api-port":"HTTP2"}'
3. The Ingress will automagically create a second health check for the L7 load balancer (you can check yours at https://console.cloud.google.com/compute/healthChecks). You will see that the default is to make an HTTP/2 request to the path "/" on the port defined in the Service, so your application must be able to respond to that request. A BackendConfig can override this default; see the sketch after this list.
4. gRPC enforces the use of TLS, and TLS (via ALPN) is also used to negotiate whether a connection uses HTTP/1.1 or HTTP/2. So to make my application (ASP.NET Core 7) respond correctly to both HTTP/2 gRPC calls and HTTP/1.1 GET requests, I had to install a self-signed certificate into the container. (You will find a better explanation under "Protocol Negotiation" at https://learn.microsoft.com/en-us/aspnet/core/grpc/aspnetcore?view=aspnetcore-7.0&tabs=visual-studio.)
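Here is a minimal sketch of how the Service and BackendConfig fit together, modeled on the configs posted above (the names and ports come from those configs; treat this as illustrative, not a drop-in fix):

apiVersion: v1
kind: Service
metadata:
  name: api
  namespace: agones-dev
  annotations:
    # Tell GKE the port named "grpc" speaks HTTP/2
    cloud.google.com/app-protocols: '{"grpc":"HTTP2"}'
    # Attach the BackendConfig that overrides the LB health check
    cloud.google.com/backend-config: '{"default": "api-backend-config"}'
spec:
  type: NodePort
  selector:
    app: api
  ports:
    - name: grpc
      port: 443
      targetPort: 8443
    - name: health
      port: 9000
      targetPort: 9000
---
apiVersion: cloud.google.com/v1
kind: BackendConfig
metadata:
  name: api-backend-config
  namespace: agones-dev
spec:
  healthCheck:
    type: HTTP                  # plain HTTP instead of the HTTP/2 default
    port: 9000
    requestPath: /actuator/health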
Hope this helps you.
Thank you! I had turned off my self-signed certs to test out the built-in gRPC health check, but let me turn them back on.
OK, enabling the certs fixed the LB timeout issue. It's a shame we can't use the built-in gRPC readiness and liveness probes, since those don't seem to work in insecure mode. Thank you for the help!
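(For anyone who finds this thread later: one common way to get a self-signed cert into the pod is a TLS Secret mounted as a volume. Below is a minimal sketch; the Secret name api-tls and the mount path are hypothetical, and the server still has to be configured to load the cert from that path.)

# pod template snippet: mount a TLS Secret so the gRPC server can terminate TLS
containers:
  - name: api
    volumeMounts:
      - name: tls
        mountPath: /etc/tls     # hypothetical path the app reads the cert from
        readOnly: true
volumes:
  - name: tls
    secret:
      # hypothetical Secret, e.g. created with:
      #   kubectl create secret tls api-tls --cert=tls.crt --key=tls.key
      secretName: api-tls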