So I've been beating my head against a brick wall for a while now trying to get a cloud function to call a GKE service.
Our setup:
GKE in a VPC network (not the default one) in europe-north1 (this setup pre-dates the recent attempt to add cloud functions to the mix).
Cloud function (gen1) and VPC access connector in europe-west1. The VPC connector is in a subnet in the same VPC network as above. The cloud function uses the VPC connector for egress, and its service account has (among other things) the role "Serverless VPC Access User". Ingress is set to allow invocation from internal/private IPs only.
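For reference, the deployment looks roughly like this with gcloud (function, project, connector and service-account names plus the runtime are placeholders, not our real ones; the relevant bits are the connector, egress and ingress flags):

gcloud functions deploy my-function \
  --region=europe-west1 \
  --runtime=nodejs18 \
  --trigger-http \
  --service-account=my-function-sa@my-project.iam.gserviceaccount.com \
  --ingress-settings=internal-only \
  --vpc-connector=projects/my-project/locations/europe-west1/connectors/my-connector \
  --egress-settings=all   # or private-ranges-only, see below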
From a bash prompt on a pod in GKE we can curl a health endpoint on the cloud function, and that works.
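In case it helps anyone reproducing this, that check is nothing fancier than something like the following (pod name and function URL are placeholders):

kubectl exec -it some-pod -- \
  curl -s https://europe-west1-my-project.cloudfunctions.net/my-function/health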
We also have a /test endpoint in the cloud function which makes a GET request to a public API.
If we set the cloud function to route only traffic to internal IPs through the VPC connector, the /test endpoint works.
If we set the cloud function to route all egress through the VPC connector, the /test endpoint times out (the same as when we call the "real" CF endpoint that calls a GKE service).
I have (out of a sense of desperation) created a very permissive firewall rule with priority 999 (the lowest number in use at the time) to allow all traffic on all ports and protocols to the whole network, with no effect. I then created a deny rule with priority 900 to deny all traffic from 35.199.224.0/19 (https://cloud.google.com/vpc/docs/configure-serverless-vpc-access#allow-ranges) to target tag "vpc-connector", with logging enabled. I tailed the firewall rule log while making a call to the cloud function /test endpoint (all egress via the VPC connector).
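For completeness, the two debugging rules were created roughly like this (rule and network names are placeholders):

gcloud compute firewall-rules create debug-allow-all \
  --network=my-vpc --direction=INGRESS --priority=999 \
  --action=ALLOW --rules=all --source-ranges=0.0.0.0/0

gcloud compute firewall-rules create debug-deny-connector \
  --network=my-vpc --direction=INGRESS --priority=900 \
  --action=DENY --rules=all --source-ranges=35.199.224.0/19 \
  --target-tags=vpc-connector --enable-logging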
The cloud function logs show the timeout but what I find interesting is that there were NO entries in the firewall rule log to show that any traffic to the vpc connector has been denied.
I'm not sure if this is a "valid" test but it does appear to me that traffic from the cloud function doesn't reach the vpc connector.
As I said at the start, I've been beating my head against a wall with this for a couple of days now so any suggestions would be very much appreciated.
Ok, so I found the issue, and of course I feel somewhat silly, although I think it was an easy mistake to make and the documentation could be better. I'll post it here in case someone else has the same problem in the future.
Our setup is that most of our solution is deployed in region A (which doesn't support gen1 cloud functions), so now that we're starting to test moving part of the workload to serverless we had to deploy those functions to region B. We're using gen1 cloud functions because, at the time we started, there was an issue assigning invoke permissions to gen2 cloud functions with Terraform.
The documentation says that the VPC connector needs to be in the same region as the cloud function. What I didn't consider was that our GKE services (which are internal, type ClusterIP) that I want to communicate with from the cloud function are regional and only reachable in region A. They cannot be reached from the VPC connector in region B. So the solution was to change the service type to LoadBalancer and add one annotation to make the LB internal and another annotation to enable global access to it, as in the sketch below.
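To make this concrete, here's a minimal sketch of such a Service, assuming the standard GKE annotations for an internal load balancer with global access enabled (service/app names and ports are placeholders; double-check the annotations against the GKE docs for your cluster version):

cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: Service
metadata:
  name: my-service            # placeholder name
  annotations:
    # make the load balancer internal (no public IP)
    networking.gke.io/load-balancer-type: "Internal"
    # allow clients elsewhere in the VPC (e.g. the VPC connector's region) to reach it
    networking.gke.io/internal-load-balancer-allow-global-access: "true"
spec:
  type: LoadBalancer
  selector:
    app: my-app               # placeholder selector
  ports:
    - port: 80
      targetPort: 8080
EOF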
Hi @towe ,
Thank you for providing a detailed description of your issue. Here are my takeaways for it:
If the following areas are properly configured and checked and the issue still persists, I suggest you create a support case so Cloud Specialists can check your project accordingly.
Hi Marvin and thank you for the reply.
There is only one project and, effectively, one VPC: the default VPC exists, but the one we use is a separate VPC with two subnets, one for the GKE cluster and one for the VPC connector. So no VPC peering. I am able to call the cloud function by doing a curl from a pod in the GKE cluster.
The deny rule doesn't appear to do anything. Or rather: when making a call from a pod to the cloud function (which should then call a public API endpoint), I can see the incoming request in the cloud function logs, but following the firewall deny rule logs I cannot see any rejected traffic, even though the cloud function is configured to use the VPC connector for all egress.
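For anyone following along, I was checking the firewall rule logs with something like this (the logName filter is my assumption about how firewall rule log entries are named; adjust for your project):

gcloud logging read 'logName:"compute.googleapis.com%2Ffirewall"' \
  --limit=50 --freshness=10m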
IP ranges of subnets do not overlap.
I will need to get around to checking the GKE services at some point, but we aren't there yet; we're still trying to get traffic from the cloud function through the VPC connector to a public API endpoint.