
Struggling to figure out why the ingress health check is failing

So, I have checked by port-forwarding to both the pod and the service, and the health check endpoint returns 200 OK. But the backend service for my Ingress is reporting UNHEALTHY. Things to note:

  • I have a private cluster
  • The main service is a gRPC API using HTTP/2
  • The health check is an HTTP call on another port
  • I'm using a BackendConfig to set up the health check (a YAML sketch of this wiring follows the list)
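
For reference, here is a minimal YAML sketch of how these pieces are wired together, reconstructed from the describe output below (not the exact manifests that were applied):

# Sketch only: reconstructed from the describe output below.
apiVersion: v1
kind: Service
metadata:
  name: api
  namespace: agones-dev
  annotations:
    cloud.google.com/app-protocols: '{"grpc":"HTTP2"}'
    cloud.google.com/backend-config: '{"default": "api-backend-config"}'
spec:
  type: NodePort
  selector:
    app: api
  ports:
    - name: grpc
      port: 443
      targetPort: 8443
    - name: health
      port: 9000
      targetPort: 9000
---
apiVersion: cloud.google.com/v1
kind: BackendConfig
metadata:
  name: api-backend-config
  namespace: agones-dev
spec:
  healthCheck:
    type: HTTP
    requestPath: /actuator/health
    port: 9000
    checkIntervalSec: 5
    timeoutSec: 5
    healthyThreshold: 1
    unhealthyThreshold: 5
  logging:
    enable: true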

Here are my configs:

Deployment

 

Name:                   api
Namespace:              agones-dev
CreationTimestamp:      Wed, 09 Aug 2023 15:04:47 -0700
Labels:                 <none>
Annotations:            deployment.kubernetes.io/revision: 2
Selector:               app=api
Replicas:               2 desired | 2 updated | 2 total | 2 available | 0 unavailable
StrategyType:           RollingUpdate
MinReadySeconds:        0
RollingUpdateStrategy:  25% max unavailable, 25% max surge
Pod Template:
  Labels:  app=api
  Containers:
   api:
    Image:        us-west2-docker.pkg.dev/agones-test-394122/api-testing/gateway:latest
    Ports:        8443/TCP, 9000/TCP
    Host Ports:   0/TCP, 0/TCP
    Liveness:     http-get http://:9000/actuator/health delay=15s timeout=2s period=10s #success=1 #failure=3
    Readiness:    http-get http://:9000/actuator/health delay=15s timeout=2s period=10s #success=2 #failure=3
    Environment:  <none>
    Mounts:       <none>
  Volumes:        <none>
Conditions:
  Type           Status  Reason
  ----           ------  ------
  Available      True    MinimumReplicasAvailable
  Progressing    True    NewReplicaSetAvailable
OldReplicaSets:  api-598ccb9f5f (0/0 replicas created)
NewReplicaSet:   api-7bbdf7db56 (2/2 replicas created)
Events:          <none>

 

Service

Name:                     api
Namespace:                agones-dev
Labels:                   <none>
Annotations:              cloud.google.com/app-protocols: {"grpc":"HTTP2"}
                          cloud.google.com/backend-config: {"default": "api-backend-config"}
Selector:                 app=api
Type:                     NodePort
IP Family Policy:         SingleStack
IP Families:              IPv4
IP:                       10.1.13.1
IPs:                      10.1.13.1
Port:                     grpc  443/TCP
TargetPort:               8443/TCP
NodePort:                 grpc  30474/TCP
Endpoints:                10.1.0.14:8443,10.1.1.11:8443
Port:                     health  9000/TCP
TargetPort:               9000/TCP
NodePort:                 health  30037/TCP
Endpoints:                10.1.0.14:9000,10.1.1.11:9000
Session Affinity:         None
External Traffic Policy:  Cluster
Events:                   <none>

Backend Config

Name:         api-backend-config
Namespace:    agones-dev
Labels:       <none>
Annotations:  <none>
API Version:  cloud.google.com/v1
Kind:         BackendConfig
Metadata:
  Creation Timestamp:  2023-08-10T04:02:56Z
  Generation:          2
  Resource Version:    296973
  UID:                 b28786a9-e41c-4303-af4e-1260eacbbdc6
Spec:
  Health Check:
    Check Interval Sec:   5
    Healthy Threshold:    1
    Port:                 9000
    Request Path:         /actuator/health
    Timeout Sec:          5
    Type:                 HTTP
    Unhealthy Threshold:  5
  Logging:
    Enable:  true
Events:      <none>

Ingress

Name:             api-ingress
Labels:           <none>
Namespace:        agones-dev
Address:          *.*.*.44
Ingress Class:    <none>
Default backend:  <default>
Rules:
  Host        Path  Backends
  ----        ----  --------
  *
              /*   api:443 (10.1.0.14:8443,10.1.1.11:8443)
Annotations:  ingress.gcp.kubernetes.io/pre-shared-cert: mcrt-14599b3e-bdcc-4bae-90fe-d1e09031b62e
              ingress.kubernetes.io/backends: {"k8s-be-30474--5176d82132b3926f":"UNHEALTHY","k8s-be-32628--5176d82132b3926f":"HEALTHY"}
              ingress.kubernetes.io/forwarding-rule: k8s2-fr-qum271i5-agones-dev-api-ingress-6kqm9ce1
              ingress.kubernetes.io/https-forwarding-rule: k8s2-fs-qum271i5-agones-dev-api-ingress-6kqm9ce1
              ingress.kubernetes.io/https-target-proxy: k8s2-ts-qum271i5-agones-dev-api-ingress-6kqm9ce1
              ingress.kubernetes.io/ssl-cert: mcrt-14599b3e-bdcc-4bae-90fe-d1e09031b62e
              ingress.kubernetes.io/target-proxy: k8s2-tp-qum271i5-agones-dev-api-ingress-6kqm9ce1
              ingress.kubernetes.io/url-map: k8s2-um-qum271i5-agones-dev-api-ingress-6kqm9ce1
              kubernetes.io/ingress.global-static-ip-name: api-failedmechanicsgames-com-ip
              networking.gke.io/managed-certificates: subdomain-ssl-certificate
Events:
  Type    Reason     Age                    From                     Message
  ----    ------     ----                   ----                     -------
  Normal  Sync       7m13s                  loadbalancer-controller  UrlMap "k8s2-um-qum271i5-agones-dev-api-ingress-6kqm9ce1" created
  Normal  Sync       7m9s                   loadbalancer-controller  TargetProxy "k8s2-tp-qum271i5-agones-dev-api-ingress-6kqm9ce1" created
  Normal  Sync       6m55s                  loadbalancer-controller  ForwardingRule "k8s2-fr-qum271i5-agones-dev-api-ingress-6kqm9ce1" created
  Normal  IPChanged  6m55s                  loadbalancer-controller  IP is now *.*.*.44
  Normal  Sync       6m46s                  loadbalancer-controller  TargetProxy "k8s2-ts-qum271i5-agones-dev-api-ingress-6kqm9ce1" created
  Normal  Sync       6m33s                  loadbalancer-controller  ForwardingRule "k8s2-fs-qum271i5-agones-dev-api-ingress-6kqm9ce1" created
  Normal  Sync       6m30s (x7 over 9m23s)  loadbalancer-controller  Scheduled for sync

The weird thing is that this worked yesterday, and now it's UNHEALTHY. I can't find logs anywhere to help diagnose it. Thanks in advance for any help.

1 ACCEPTED SOLUTION

I've faced a very similar issue this week.

Here goes what I learned to solve my issue.

1. For health checking gRPC applications at the Deployment level in Kubernetes, there is a recommended approach: https://cloud.google.com/blog/topics/developers-practitioners/health-checking-your-grpc-servers-gke

2. The Service must have the following annotation (see the sketch after these steps):

cloud.google.com/app-protocols: '{"grpc-api-port":"HTTP2"}'

3. The Ingress will automagically create a second health check for the L7 load balancer (you can see yours at https://console.cloud.google.com/compute/healthChecks). You will see that the default is to make an HTTP2 request to the path "/" on the port defined in the Service, so your application must be able to respond to that.

4. gRPC enforces the use of TLS, and TLS is also used to determine whether the connection uses HTTP/1.1 or HTTP/2. So to make my application (ASP.NET Core 7) work correctly, responding to both HTTP/2 gRPC and the HTTP/1.1 GET health check, I had to install a self-signed certificate into the container. (There is a better explanation of this under "Protocol Negotiation" at https://learn.microsoft.com/en-us/aspnet/core/grpc/aspnetcore?view=aspnetcore-7.0&tabs=visual-studio )
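
To make step 2 concrete, here is a minimal sketch of a Service carrying that annotation. The port name "grpc-api-port" is illustrative; the important part is that the key in the annotation matches the name of the Service port the Ingress routes to:

# Sketch only: the annotation key must equal the Service port name.
apiVersion: v1
kind: Service
metadata:
  name: grpc-api
  annotations:
    cloud.google.com/app-protocols: '{"grpc-api-port":"HTTP2"}'
spec:
  type: NodePort
  selector:
    app: grpc-api
  ports:
    - name: grpc-api-port
      port: 443
      targetPort: 8443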

Hope this helps you.


6 REPLIES

If I remove the following annotation:

cloud.google.com/app-protocols: {"grpc":"HTTP2"}

The health check works but the app doesn't get HTTP/2 requests. It feels like the health check is ignoring the HTTP setting in the BackendConfig. 
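
(For anyone comparing configs: the backend-config annotation also has a per-port form that binds the BackendConfig explicitly to a named serving port, which may be worth double-checking here. A sketch, reusing the names from the Service above; these lines go under metadata.annotations:)

# Sketch only: binds api-backend-config explicitly to the Service port named "grpc".
cloud.google.com/app-protocols: '{"grpc":"HTTP2"}'
cloud.google.com/backend-config: '{"ports": {"grpc": "api-backend-config"}}'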

 

So, I didn't realize Kubernetes supports gRPC health checks now. I'm following the document here:

https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes...

While this works fine with the Deployment, the Ingress load balancer health check is still saying unhealthy. It looks like it's using a default HTTP/2 health check and not a gRPC one. So, back to square one I guess.
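
For reference, the native gRPC probe from that page looks roughly like this on the api container (a sketch assuming Kubernetes 1.24 or newer; the kubelet speaks the standard gRPC health-checking protocol in plaintext, without TLS):

# Sketch only: goes under the api container in the Deployment's pod template.
readinessProbe:
  grpc:
    port: 8443
  initialDelaySeconds: 15
  periodSeconds: 10
livenessProbe:
  grpc:
    port: 8443
  initialDelaySeconds: 15
  periodSeconds: 10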

And... now it's working, lol. I wish I knew what I did that fixed it.

EDIT: Spoke too soon. While everything in the console is showing green, I'm still getting a 502 when trying to hit the gRPC app.


Thank you. I had turned off my self-signed certs to test out the built-in gRPC health check, but let me turn them back on.

Ok, enabling the certs fixed the LB timeout issue. Shame that we can't use the built-in gRPC readiness and liveness probes, since those only seem to work in insecure mode. Thank you for the help!
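
For completeness, one way to get a self-signed certificate into the pod is to mount it from a TLS Secret. The Secret name and mount path below are illustrative and not from this thread, and the server still has to be configured to load the certificate and key from the mounted files:

# Sketch only: Secret name and mount path are illustrative.
# The Secret can be created with: kubectl create secret tls api-self-signed-tls --cert=... --key=...
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api
  namespace: agones-dev
spec:
  replicas: 2
  selector:
    matchLabels:
      app: api
  template:
    metadata:
      labels:
        app: api
    spec:
      containers:
        - name: api
          image: us-west2-docker.pkg.dev/agones-test-394122/api-testing/gateway:latest
          ports:
            - containerPort: 8443
            - containerPort: 9000
          volumeMounts:
            - name: tls
              mountPath: /etc/tls    # server should serve TLS from /etc/tls/tls.crt and /etc/tls/tls.key
              readOnly: true
      volumes:
        - name: tls
          secret:
            secretName: api-self-signed-tls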
