GKE MultiCluster Services - inconsistent behaviour with Service Exports

Cluster versions: 1.26.9

We've had two GKE clusters set up in a fleet on a Shared VPC for some months, but in the last few weeks we have been seeing very inconsistent behaviour when exporting services. Sometimes it works as expected; other times the ServiceImport and the corresponding gke-mcs-x ClusterIP service are created in the target cluster, but the service contains zero endpoints.

In some cases the service receives endpoints more than 24 hours later; in other cases it is never updated. Nothing in the gke-mcs-importer pod logs suggests anything untoward, and restarting that pod does not help.

This issue appears to be worse when the service to be exported already has NEGs configured, i.e. when it is annotated with:

cloud.google.com/neg: '{"ingress": true}'

Is this supposed to be supported?

Does anyone have any other ideas on troubleshooting steps?
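
One quick check for the symptom described above is to list which services in the target cluster currently have no ready endpoint addresses. The sketch below is only illustrative: it assumes you have saved the output of `kubectl get endpoints -n <namespace> -o json` and loaded it as a dict, and the service names and IPs in the sample are hypothetical.

```python
def services_without_endpoints(endpoints_list):
    """Return the names of Endpoints objects that have no ready addresses.

    `endpoints_list` is expected to look like the parsed JSON from
    `kubectl get endpoints -o json` (a v1 List of Endpoints objects).
    """
    empty = []
    for item in endpoints_list.get("items", []):
        # `subsets` is omitted entirely when a service has no endpoints at all.
        subsets = item.get("subsets") or []
        ready = sum(len(s.get("addresses") or []) for s in subsets)
        if ready == 0:
            empty.append(item["metadata"]["name"])
    return empty


# Hypothetical sample data; real gke-mcs-x names and IPs will differ.
sample = {
    "items": [
        {"metadata": {"name": "gke-mcs-example1"}},
        {
            "metadata": {"name": "gke-mcs-example2"},
            "subsets": [{"addresses": [{"ip": "10.0.0.5"}]}],
        },
        {
            "metadata": {"name": "gke-mcs-example3"},
            "subsets": [{"notReadyAddresses": [{"ip": "10.0.0.6"}]}],
        },
    ]
}

print(services_without_endpoints(sample))
```

Running this periodically and comparing the result with the list of exported services would at least show whether endpoints arrive late or never arrive.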

RonEtch
Former Googler

Hi @dwilliams782 

Welcome to Google Cloud Community!

In this case, you may want to restart your gke-mcs-importer pod. This can sometimes resolve the issue, but it is not a guaranteed solution. Also, GKE services that are backed by network endpoint groups should be running in a VPC-native setup. You may also want to check this guide for the restrictions on clusters that are not VPC-native.

For the official documentation, you may want to revisit Configuring multi-cluster Services.

You can check this article for additional help or information that might be related to your issue and help you to resolve it.

I hope this information is helpful.

If you need further assistance, you can always file a ticket with our support team.

Hi, thanks for the response! All fleet clusters are VPC native, and restarting the importer pod doesn't help. 

That Stack Overflow link shows an exported: false status, whereas our services always show exported: true; they just don't get endpoints, so unfortunately it's not the same case. None of the other limitations apply to us either.

Hi,

I have hit the very same issue: the second cluster does not create the endpoints, and no errors are logged.

Any new clue on that?

Hi, yes, the issue was caused by readinessProbes with incorrect interval and timeout values. I published all the details on our tech blog, right at the bottom: https://tech.loveholidays.com/gke-multi-cluster-services-one-bad-probe-away-from-disaster-62051fafe8...
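
For anyone wanting to audit their own workloads, here is a minimal sketch of a probe check. The thread does not spell out what "incorrect" means, so this assumes the problem case is a readinessProbe whose timeoutSeconds is greater than its periodSeconds (Google Cloud health checks, as far as I know, do not accept a timeout longer than the check interval). The function name and the sample manifest are hypothetical; see the blog post for the actual details.

```python
# Kubernetes defaults used when a probe omits these fields.
DEFAULT_PERIOD_SECONDS = 10
DEFAULT_TIMEOUT_SECONDS = 1


def suspicious_readiness_probes(deployment):
    """Return (container, periodSeconds, timeoutSeconds) tuples for
    readiness probes whose timeout exceeds their period.

    `deployment` is a Deployment manifest loaded as a dict, for example
    from `kubectl get deployment <name> -o json`.
    """
    findings = []
    pod_spec = deployment["spec"]["template"]["spec"]
    for container in pod_spec.get("containers", []):
        probe = container.get("readinessProbe")
        if not probe:
            continue
        period = probe.get("periodSeconds", DEFAULT_PERIOD_SECONDS)
        timeout = probe.get("timeoutSeconds", DEFAULT_TIMEOUT_SECONDS)
        # Assumption: timeout > period is the misconfiguration to flag.
        if timeout > period:
            findings.append((container["name"], period, timeout))
    return findings


# Hypothetical manifest fragment for illustration only.
sample_deployment = {
    "spec": {
        "template": {
            "spec": {
                "containers": [
                    {
                        "name": "app",
                        "readinessProbe": {"periodSeconds": 5, "timeoutSeconds": 10},
                    },
                    {
                        "name": "sidecar",
                        "readinessProbe": {"timeoutSeconds": 3},
                    },
                    {"name": "no-probe"},
                ]
            }
        }
    }
}

print(suspicious_readiness_probes(sample_deployment))
```

Only the "app" container is flagged in the sample, since the "sidecar" probe falls back to the default 10-second period.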
