Now and then one of the nodes in my cluster throws this error: Container runtime containerd failed!
This causes all pods to be restarted at once and abruptly rescheduled onto other nodes.
I checked all the metrics and I can't correlate the failures with any of them.
Hi @surshrestha,
Welcome to the Google Cloud Community!
Can you try executing the following commands [1]?
$ sudo rm /etc/containerd/config.toml
$ sudo systemctl restart containerd
$ sudo kubeadm init
Unfortunately, many factors can cause containerd to fail, including a lack of available resources (for example, memory or disk on the node) and improper configuration. If the commands above don't help, you might want to contact Google Cloud Support so they can investigate your project further.
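Before opening a support case, it may help to check the node's logs for the two causes mentioned above (resource pressure and runtime errors). The following is only a minimal sketch: the function name `filter_runtime_events` and the keyword pattern are assumptions made for illustration, not an official tool, and the exact log messages on your nodes may differ.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read log lines on stdin and keep only those that
# look like containerd failures or kernel out-of-memory kills.
# The keyword pattern is an assumption; adjust it to match your own logs.
filter_runtime_events() {
  # "|| true" keeps the function from returning an error when nothing matches.
  grep -i -E 'containerd.*(fail|error|panic)|out of memory|oom-kill' || true
}

# Example usage on the affected node (needs SSH access and sudo):
#   sudo journalctl -u containerd --since "2 hours ago" --no-pager | filter_runtime_events
#   sudo journalctl -k --since "2 hours ago" --no-pager | filter_runtime_events
#
# From a workstation, the node's conditions and recent events can be viewed with:
#   kubectl describe node NODE_NAME
```

Replace `NODE_NAME` with the node that reported the error. If the kernel log shows out-of-memory kills around the time of the restarts, resource pressure is a likely suspect; if only containerd errors appear, the configuration or the runtime itself is worth a closer look.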
I hope this helps. Thank you.
Best,
Lawrence