I have port-forwarded both the pod and the service, and the health check returns 200 OK. But the backend service for my Ingress is reporting UNHEALTHY.
Here are my configs:
Deployment
Name:                   api
Namespace:              agones-dev
CreationTimestamp:      Wed, 09 Aug 2023 15:04:47 -0700
Labels:                 <none>
Annotations:            deployment.kubernetes.io/revision: 2
Selector:               app=api
Replicas:               2 desired | 2 updated | 2 total | 2 available | 0 unavailable
StrategyType:           RollingUpdate
MinReadySeconds:        0
RollingUpdateStrategy:  25% max unavailable, 25% max surge
Pod Template:
  Labels:  app=api
  Containers:
   api:
    Image:        us-west2-docker.pkg.dev/agones-test-394122/api-testing/gateway:latest
    Ports:        8443/TCP, 9000/TCP
    Host Ports:   0/TCP, 0/TCP
    Liveness:     http-get http://:9000/actuator/health delay=15s timeout=2s period=10s #success=1 #failure=3
    Readiness:    http-get http://:9000/actuator/health delay=15s timeout=2s period=10s #success=2 #failure=3
    Environment:  <none>
    Mounts:       <none>
  Volumes:        <none>
Conditions:
  Type           Status  Reason
  ----           ------  ------
  Available      True    MinimumReplicasAvailable
  Progressing    True    NewReplicaSetAvailable
OldReplicaSets:  api-598ccb9f5f (0/0 replicas created)
NewReplicaSet:   api-7bbdf7db56 (2/2 replicas created)
Events:          <none>
Service
Name:                     api
Namespace:                agones-dev
Labels:                   <none>
Annotations:              cloud.google.com/app-protocols: {"grpc":"HTTP2"}
                          cloud.google.com/backend-config: {"default": "api-backend-config"}
Selector:                 app=api
Type:                     NodePort
IP Family Policy:         SingleStack
IP Families:              IPv4
IP:                       10.1.13.1
IPs:                      10.1.13.1
Port:                     grpc  443/TCP
TargetPort:               8443/TCP
NodePort:                 grpc  30474/TCP
Endpoints:                10.1.0.14:8443,10.1.1.11:8443
Port:                     health  9000/TCP
TargetPort:               9000/TCP
NodePort:                 health  30037/TCP
Endpoints:                10.1.0.14:9000,10.1.1.11:9000
Session Affinity:         None
External Traffic Policy:  Cluster
Events:                   <none>
Backend Config
Name:         api-backend-config
Namespace:    agones-dev
Labels:       <none>
Annotations:  <none>
API Version:  cloud.google.com/v1
Kind:         BackendConfig
Metadata:
  Creation Timestamp:  2023-08-10T04:02:56Z
  Generation:          2
  Resource Version:    296973
  UID:                 b28786a9-e41c-4303-af4e-1260eacbbdc6
Spec:
  Health Check:
    Check Interval Sec:   5
    Healthy Threshold:    1
    Port:                 9000
    Request Path:         /actuator/health
    Timeout Sec:          5
    Type:                 HTTP
    Unhealthy Threshold:  5
  Logging:
    Enable:  true
Events:  <none>
Ingress
Name:             api-ingress
Labels:           <none>
Namespace:        agones-dev
Address:          *.*.*.44
Ingress Class:    <none>
Default backend:  <default>
Rules:
  Host        Path  Backends
  ----        ----  --------
  *
              /*    api:443 (10.1.0.14:8443,10.1.1.11:8443)
Annotations:  ingress.gcp.kubernetes.io/pre-shared-cert: mcrt-14599b3e-bdcc-4bae-90fe-d1e09031b62e
              ingress.kubernetes.io/backends: {"k8s-be-30474--5176d82132b3926f":"UNHEALTHY","k8s-be-32628--5176d82132b3926f":"HEALTHY"}
              ingress.kubernetes.io/forwarding-rule: k8s2-fr-qum271i5-agones-dev-api-ingress-6kqm9ce1
              ingress.kubernetes.io/https-forwarding-rule: k8s2-fs-qum271i5-agones-dev-api-ingress-6kqm9ce1
              ingress.kubernetes.io/https-target-proxy: k8s2-ts-qum271i5-agones-dev-api-ingress-6kqm9ce1
              ingress.kubernetes.io/ssl-cert: mcrt-14599b3e-bdcc-4bae-90fe-d1e09031b62e
              ingress.kubernetes.io/target-proxy: k8s2-tp-qum271i5-agones-dev-api-ingress-6kqm9ce1
              ingress.kubernetes.io/url-map: k8s2-um-qum271i5-agones-dev-api-ingress-6kqm9ce1
              kubernetes.io/ingress.global-static-ip-name: api-failedmechanicsgames-com-ip
              networking.gke.io/managed-certificates: subdomain-ssl-certificate
Events:
  Type    Reason     Age                    From                     Message
  ----    ------     ----                   ----                     -------
  Normal  Sync       7m13s                  loadbalancer-controller  UrlMap "k8s2-um-qum271i5-agones-dev-api-ingress-6kqm9ce1" created
  Normal  Sync       7m9s                   loadbalancer-controller  TargetProxy "k8s2-tp-qum271i5-agones-dev-api-ingress-6kqm9ce1" created
  Normal  Sync       6m55s                  loadbalancer-controller  ForwardingRule "k8s2-fr-qum271i5-agones-dev-api-ingress-6kqm9ce1" created
  Normal  IPChanged  6m55s                  loadbalancer-controller  IP is now *.*.*.44
  Normal  Sync       6m46s                  loadbalancer-controller  TargetProxy "k8s2-ts-qum271i5-agones-dev-api-ingress-6kqm9ce1" created
  Normal  Sync       6m33s                  loadbalancer-controller  ForwardingRule "k8s2-fs-qum271i5-agones-dev-api-ingress-6kqm9ce1" created
  Normal  Sync       6m30s (x7 over 9m23s)  loadbalancer-controller  Scheduled for sync
The weird thing is that this worked yesterday, and now it's UNHEALTHY. I can't find any logs anywhere to help diagnose it. Thanks in advance for any help.
If I remove the following annotation:
cloud.google.com/app-protocols: {"grpc":"HTTP2"}
the health check passes, but the app no longer receives HTTP/2 requests. It feels like the load balancer's health check is ignoring the HTTP setting in the BackendConfig.
So, I didn't realize Kubernetes supports native gRPC health checks now. Following the document here:
While this works fine at the Deployment level, the Ingress load balancer health check still says unhealthy. It looks like it's using a default HTTP/2 health check rather than a gRPC one. So, back to square one, I guess.
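For reference, the built-in probes look roughly like this (a minimal sketch: it needs Kubernetes 1.24+, port 8443 is just the gRPC port from my Deployment above, and the server has to implement the standard grpc.health.v1.Health service):

# container spec snippet: native gRPC probes in place of the HTTP ones
livenessProbe:
  grpc:
    port: 8443
  initialDelaySeconds: 15
readinessProbe:
  grpc:
    port: 8443
  initialDelaySeconds: 15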
And... now it's working, lol. I wish I knew what I did that fixed it.
EDIT: Spoke too soon. While everything in the console is showing green, I'm still getting a 502 when trying to hit the gRPC app.
I faced a very similar issue this week.
Here is what I learned while solving mine.
1. For health checking gRPC applications (at the Deployment level in Kubernetes), there is a recommended approach: https://cloud.google.com/blog/topics/developers-practitioners/health-checking-your-grpc-servers-gke
2. The Service must have the annotation:
cloud.google.com/app-protocols: '{"grpc-api-port":"HTTP2"}'
3. The Ingress will automagically create a second health check for the L7 load balancer (you can check yours at https://console.cloud.google.com/compute/healthChecks). You will see that the default is to make an HTTP/2 request to the path "/" on the port defined in the Service, so your application must be able to respond to that request. A BackendConfig can override this default; see the sketch after this list.
4. gRPC enforces the use of TLS, and TLS (via ALPN) is also used to negotiate whether a connection uses HTTP/1.1 or HTTP/2. So to make my application (ASP.NET Core 7) respond correctly to both HTTP/2 gRPC calls and HTTP/1.1 GET requests, I had to install a self-signed certificate into the container. (You will find a better explanation under "Protocol Negotiation" at https://learn.microsoft.com/en-us/aspnet/core/grpc/aspnetcore?view=aspnetcore-7.0&tabs=visual-studio.)
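Here is a minimal sketch of how the Service and BackendConfig fit together, modeled on the configs posted above (the names and ports come from those configs; treat this as illustrative, not a drop-in fix):

apiVersion: v1
kind: Service
metadata:
  name: api
  namespace: agones-dev
  annotations:
    # Tell GKE the port named "grpc" speaks HTTP/2
    cloud.google.com/app-protocols: '{"grpc":"HTTP2"}'
    # Attach the BackendConfig that overrides the LB health check
    cloud.google.com/backend-config: '{"default": "api-backend-config"}'
spec:
  type: NodePort
  selector:
    app: api
  ports:
    - name: grpc
      port: 443
      targetPort: 8443
    - name: health
      port: 9000
      targetPort: 9000
---
apiVersion: cloud.google.com/v1
kind: BackendConfig
metadata:
  name: api-backend-config
  namespace: agones-dev
spec:
  healthCheck:
    type: HTTP                  # plain HTTP instead of the HTTP/2 default
    port: 9000
    requestPath: /actuator/health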
Hope this helps you.
Thank you! I had turned off my self-signed certs to test out the built-in gRPC health check, but let me turn them back on.
OK, enabling the certs fixed the LB timeout issue. It's a shame we can't use the built-in gRPC readiness and liveness probes, since those don't seem to work in insecure mode. Thank you for the help!
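(For anyone who finds this thread later: one common way to get a self-signed cert into the pod is a TLS Secret mounted as a volume. Below is a minimal sketch; the Secret name api-tls and the mount path are hypothetical, and the server still has to be configured to load the cert from that path.)

# pod template snippet: mount a TLS Secret so the gRPC server can terminate TLS
containers:
  - name: api
    volumeMounts:
      - name: tls
        mountPath: /etc/tls     # hypothetical path the app reads the cert from
        readOnly: true
volumes:
  - name: tls
    secret:
      # hypothetical Secret, e.g. created with:
      #   kubectl create secret tls api-tls --cert=tls.crt --key=tls.key
      secretName: api-tls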