
GKE CronJob metrics stopped across all clusters

I'm investigating a very strange issue on our GKE clusters: metrics for all of our CronJobs abruptly stopped around 22:05 on 20 April. As far as I have been able to determine so far:

  • the configuration of the clusters wasn't changed at this time
  • the configuration of the CronJobs also wasn't changed around this time
  • the jobs do run on the correct schedule and the logs are still visible

The CronJob deployment details screen for one job shows an abrupt end to the metrics and nothing in the four days since, even though the job continues to run on the same schedule:

Job1-metrics.png

 

It's the same for a completely unrelated job that runs in a different node pool in the same cluster:

Job2-metrics.png

 

Metrics for non-CronJob deployments on the same cluster still work as expected:

Non-cronjob-metrics.png

 

The data for alerting policies based on log counts for the CronJobs also stops at the same time, even though the containers are still writing logs:

Alerting-metrics.png

As I mentioned, the issue affects all CronJobs across multiple clusters. The only things the clusters have in common are:

  • they are in the same project
  • the Kubernetes versions are the same (control plane: 1.24.10-gke.2300, node pools: 1.23.11-gke.300)

It's possible that the issue is caused by the version difference between the control plane and the nodes. However, I can't see anything in the logs that suggests the control plane was updated at that time. Any other suggestions are welcome.

2 REPLIES

Hello george-blis,

Welcome to Google Cloud Community!

GKE guarantees that control planes are compatible with nodes up to two minor versions older than the control plane. For example, GKE 1.23 control planes are compatible with GKE 1.21 nodes. Your 1.24 control plane and 1.23 node pools are therefore within the supported skew.
See Kubernetes version and version skew support policy
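
As a rough illustration of that rule applied to the versions in the question, here is a minimal Python sketch. The helper names `minor_version` and `within_skew` are hypothetical and not part of any GKE tooling, and the two-minor-version limit is simply taken from the statement above. It only compares version strings; it says nothing about why the metrics stopped.

```python
import re
from typing import Tuple


def minor_version(gke_version: str) -> Tuple[int, int]:
    """Return (major, minor) from a GKE version string such as '1.24.10-gke.2300'."""
    match = re.match(r"^(\d+)\.(\d+)\.", gke_version)
    if match is None:
        raise ValueError(f"unrecognised version string: {gke_version!r}")
    return int(match.group(1)), int(match.group(2))


def within_skew(control_plane: str, node: str, max_skew: int = 2) -> bool:
    """Check whether the node is at most `max_skew` minor versions older
    than the control plane (assumed limit, taken from the reply above)."""
    cp_major, cp_minor = minor_version(control_plane)
    node_major, node_minor = minor_version(node)
    if cp_major != node_major:
        return False
    skew = cp_minor - node_minor
    # Nodes newer than the control plane (negative skew) are treated as unsupported.
    return 0 <= skew <= max_skew


# Versions reported in the question: control plane 1.24, node pools 1.23.
print(within_skew("1.24.10-gke.2300", "1.23.11-gke.300"))  # prints True
```

With the versions from the question the skew is one minor version, so the check passes.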

To have your project inspected further, it would be best to get in touch with Cloud Platform Support.
https://cloud.google.com/contact

Hi Willbin,
Thanks for the welcome and for confirming the Kubernetes version skew policy. Unfortunately, my organisation only has basic support, so I can only raise tickets relating to billing. If I try to raise anything else, I get directed here to the community support channel.
