Hi All
We have a test Kubernetes setup using Preemptible Nodes.
We are finding that after a node is preempted, the running pod services no longer respond to Pub/Sub messages, API requests, or our WebSocket connections.
To return the backend pod services to operation, we need to restart the deployments each time.
Looking for some guidance on whether this is normal behavior: is there a way to have the pods return to operation after preemption?
Will moving to Spot nodes eliminate this issue?
Any guidance on the best way to run preemptible nodes and have workloads restore to normal operation after they are preempted would be appreciated.
Thanks
Bill
Can you provide a bit more detail on this? Are you saying that after preemption, the pods are not getting rescheduled on another node? Or that they are being scheduled on new node(s) and start up, but can't receive traffic?
Hi Garisingh, thank you for your response.
The workloads are restarting, and we are only running one node (development setup), but our backend application running in the Docker containers inside the pods no longer responds to API requests (such as logins or third-party API calls) or to the Google Pub/Sub messages that trigger backend functions.
To restart the applications and have them process requests again, we run the following commands, after which the applications start responding.
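Roughly, the restart looks like this, with backend standing in for our actual deployment name:

```sh
# Trigger a rolling restart of the deployment ("backend" is a placeholder name)
kubectl rollout restart deployment/backend

# Watch until the replacement pods report ready
kubectl rollout status deployment/backend
```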
Hi, @bills001.
Could you please tell us whether readiness and liveness probes have been added to the pods? If they have, are you able to see any logs related to them? If not, could you please try adding them?
Regards,
Mokit
Thanks Mokit
Thanks Mokit. We will set up our application with an HTTP health check and configure liveness probes to detect when the backend is not responding, so that the liveness probe can restart the pods. We will update you once we test this.
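As a starting point we are testing something like the probe configuration below; the /healthz path, port 8080, and the timings are placeholders for our actual values:

```yaml
# Hypothetical probe settings on the backend container spec;
# the path, port, and timings are placeholders.
livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 30   # give the app time to boot before the first probe
  periodSeconds: 10         # probe every 10 seconds
  failureThreshold: 3       # restart the container after 3 consecutive failures
readinessProbe:
  httpGet:
    path: /healthz
    port: 8080
  periodSeconds: 5          # stop routing traffic to the pod while the check fails
```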
Hi Mokit
We did some testing with the liveness probe, and it helped us identify that the issue was due to the application needing to define a MongoDB connection pool. We suspect that when the application is restarted it requires additional resources to connect to the DB, as a single connection might not be long-lived enough to connect.
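One way to define the pool (ours may differ) is via connection-string options, shown here as a hypothetical env entry in the Deployment spec; the host, database, and pool sizes are placeholders:

```yaml
# Hypothetical Deployment env entry; host, database, and pool
# sizes are placeholders for our actual values.
env:
  - name: MONGODB_URI
    value: "mongodb://mongo.example.com:27017/appdb?maxPoolSize=50&minPoolSize=5&maxIdleTimeMS=60000"
```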
Once we made the change, the application works when it is restarted. We have, however, kept the liveness and readiness probes in place to help with alerting if the application fails.
Thank you for your assistance.
I'm glad to hear that the problem has been resolved.