Hi All
We have a test Kubernetes setup using Preemptible Nodes.
We are finding that after a node is preempted, the running pod services no longer respond to Pub/Sub messages, API requests, or our WebSocket connections.
To return the backend pod services to operation, we need to restart the deployments each time.
Looking for some guidance on whether this is normal behavior: is there a way to have the pods return to operation after preemption?
Will moving to Spot nodes eliminate this issue?
Any guidance on the best way to run preemptible nodes and have workloads restore to normal operation after they are preempted would be appreciated.
Thanks
Bill
Can you provide a bit more detail on this? Are you saying that after preemption, the pods are not getting rescheduled on another node? Or that they are being scheduled on new node(s) and start up, but can't receive traffic?
Hi Garisingh, thank you for your response.
The workloads are restarting, and we are only running one node (development setup), but our backend application running in the Docker containers inside the pods no longer responds to API requests (such as logins or third-party API calls) or to the Google Pub/Sub messages that trigger backend functions.
To restart the applications and have them process requests again, we run the following commands, after which the applications start responding.
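Roughly, the restart looks like this, with backend standing in for our actual deployment name:

```sh
# Trigger a rolling restart of the deployment ("backend" is a placeholder name)
kubectl rollout restart deployment/backend

# Watch until the replacement pods report ready
kubectl rollout status deployment/backend
```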
Hi, @bills001.
Could you please tell us whether readiness and liveness probes have been added to the pods? If they have, are you able to see any logs related to them? If not, could you please try adding them?
Regards,
Mokit
Thanks Mokit
Thanks Mokit. We will set up our application with an HTTP health check and configure liveness probes to detect when the backend is not responding, so that the liveness probe can restart the pods. We will update you once we test this.
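As a starting point we are testing something like the probe configuration below; the /healthz path, port 8080, and the timings are placeholders for our actual values:

```yaml
# Hypothetical probe settings on the backend container spec;
# the path, port, and timings are placeholders.
livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 30   # give the app time to boot before the first probe
  periodSeconds: 10         # probe every 10 seconds
  failureThreshold: 3       # restart the container after 3 consecutive failures
readinessProbe:
  httpGet:
    path: /healthz
    port: 8080
  periodSeconds: 5          # stop routing traffic to the pod while the check fails
```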
Hi Mokit
We did some testing with the liveness probe, and it helped us identify that the issue was due to the application needing to define a MongoDB connection pool. We suspect that when the application is restarted it requires additional resources to connect to the DB, as a single connection might not be long-lived enough to connect.
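One way to define the pool (ours may differ) is via connection-string options, shown here as a hypothetical env entry in the Deployment spec; the host, database, and pool sizes are placeholders:

```yaml
# Hypothetical Deployment env entry; host, database, and pool
# sizes are placeholders for our actual values.
env:
  - name: MONGODB_URI
    value: "mongodb://mongo.example.com:27017/appdb?maxPoolSize=50&minPoolSize=5&maxIdleTimeMS=60000"
```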
Once we made the change, the application works when it is restarted. We have, however, kept the liveness and readiness probes in place to help with alerting if the application fails.
Thank you for your assistance.
I'm glad to hear that the problem has been resolved.