In this article, we'll show step-by-step how to configure multi-cluster load balancing with GKE (Google Kubernetes Engine) using standalone network endpoint groups (NEGs). Let's first start by explaining key terms.
Forwarding Rule: Each rule is associated with a specific IP address and port. Since we're talking about global HTTP(S) load balancing, this will be an anycast global IP address (optionally a reserved static IP). The associated port is the port on which the load balancer accepts traffic from external clients: 80 or 8080 if the target is an HTTP proxy, or 443 in the case of an HTTPS proxy.
Target HTTP(S) Proxy: Traffic is then terminated based on the Target Proxy configuration. Each Target Proxy is linked to exactly one URL Map (N:1 relationship). In the case of an HTTPS proxy, you'll also have to attach at least one SSL certificate and configure an SSL policy.
URL Map: This is the core traffic-management component; it routes incoming traffic between different Backend Services (including Cloud Storage buckets). Basic routing is hostname- and path-based, but more advanced traffic management is possible as well: URL redirects, URL rewriting, and header- and query-parameter-based routing. Each rule directs traffic to one Backend Service.
Backend Service: This is a logical grouping of backends for the same service, together with relevant configuration options such as traffic distribution between individual Backends, protocols, session affinity, and features like Cloud CDN, Cloud Armor, or IAP. Each Backend Service is also associated with a Health Check.
Health Check: Determines how individual backend endpoints are checked for liveness; the results are used to compute the overall health state of each Backend. Protocol and port have to be specified when creating one, along with optional parameters like check interval, healthy and unhealthy thresholds, or timeout. An important bit to note is that your firewall rules must allow health-check traffic from Google's probe IP ranges (130.211.0.0/22 and 35.191.0.0/16).
Backend: Represents a group of individual endpoints in a given location. In the case of GKE, our backends will be NEGs, one per zone of our GKE cluster (GKE NEGs are zonal, though some other backend types are regional).
Backend Endpoint: A combination of an IP address and port; with GKE and container-native load balancing, this is a Pod IP and container port.
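Putting the pieces together, a request flows through these resources top to bottom. Here's a quick sketch of the chain (using the resource names we create later in this article; the describe command only works once that resource exists):

```shell
# Request path through the global HTTP LB (top to bottom):
#   client -> Forwarding Rule (anycast IP:port)
#          -> Target HTTP(S) Proxy
#          -> URL Map (host/path rules)
#          -> Backend Service (+ Health Check)
#          -> Backend (zonal NEG)
#          -> Backend Endpoint (Pod IP:port)
#
# Once created, you can inspect any link in the chain, e.g.:
gcloud compute forwarding-rules describe foobar-forwarding-rule --global
```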
Now that we've defined key terms, let's walk through the steps to configure multi-cluster load balancing with GKE using standalone NEGs.
1. Create two clusters in the two regions across which we want to distribute the load.
In our case, the load will be distributed between a cluster in the Toronto region (northamerica-northeast2) and one in the Montreal region (northamerica-northeast1).
gcloud container clusters create toronto-cluster \
--project [PROJECT_ID] \
--region northamerica-northeast2 \
--num-nodes [NUM_NODES]
gcloud container clusters create montreal-cluster \
--project [PROJECT_ID] \
--region northamerica-northeast1 \
--num-nodes [NUM_NODES]
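Optionally, verify that both clusters came up before continuing (same [PROJECT_ID] placeholder as above):

```shell
# List the two clusters and confirm STATUS is RUNNING
gcloud container clusters list \
    --project [PROJECT_ID] \
    --filter="name:(toronto-cluster montreal-cluster)" \
    --format="table(name,location,status)"
```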
2. Get the credentials for both clusters, running each command in a separate terminal tab.
gcloud container clusters get-credentials montreal-cluster \
--region northamerica-northeast1 \
--project my-project-12345
gcloud container clusters get-credentials toronto-cluster \
--region northamerica-northeast2 \
--project my-project-12345
3. Starting from this step, run the following commands in each terminal (that is, once per cluster):
kubectl create deployment hello-world --image=quay.io/stepanstipl/k8s-demo-app:latest
This command deploys a Docker image that displays the cluster name, region, and other details on a web page served on port 8080.
The app also serves two pages, /foo and /bar, which correspond to two services; these will map to the two backend services we create below.
By the end of this post, you'll understand how traffic for the foo and bar services is distributed across the clusters using RATE as the balancing mode. Session affinity can be configured as well, which I will cover in a subsequent tutorial.
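Before wiring up the load balancer, you can sanity-check the app from one of the cluster terminals. A quick sketch (assumes the deployment from step 3 is running in the cluster your terminal points at):

```shell
# Forward local port 8080 to the deployment and hit its endpoints
kubectl port-forward deployment/hello-world 8080:8080 &
sleep 2
curl -s http://localhost:8080/foo      # page for the "foo" service
curl -s http://localhost:8080/bar      # page for the "bar" service
curl -s http://localhost:8080/healthz  # health endpoint the LB will probe later
kill %1
```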
4. Create a file called service.yaml with the following content:
apiVersion: v1
kind: Service
metadata:
  name: hello-world
  annotations:
    cloud.google.com/neg: '{"ingress": true, "exposed_ports": {"8080":{}}}'
spec:
  ports:
  - protocol: TCP
    port: 8080
    targetPort: 8080
  selector:
    app: hello-world
  type: ClusterIP
Run the following command to expose the deployment. Thanks to the cloud.google.com/neg annotation on the service, the NEGs will be created automatically.
kubectl apply -f service.yaml
Now that the service is exposed, it's time to set up the HTTP L7 global load balancer.
5. Do the following steps just once (not once per cluster, as we have been doing since step 2).
gcloud compute health-checks create http health-check-foobar \
--use-serving-port \
--request-path="/healthz"
gcloud compute backend-services create backend-service-foo \
--global \
--health-checks health-check-foobar
gcloud compute backend-services create backend-service-bar \
--global \
--health-checks health-check-foobar
gcloud compute backend-services create backend-service-default \
--global
gcloud compute url-maps create foobar-url-map \
--global \
--default-service backend-service-default
gcloud compute url-maps add-path-matcher foobar-url-map \
--global \
--path-matcher-name=foo-bar-matcher \
--default-service=backend-service-default \
--backend-service-path-rules='/foo/*=backend-service-foo,/bar/*=backend-service-bar'
gcloud compute target-http-proxies create foobar-http-proxy \
--global \
--url-map foobar-url-map \
--global-url-map
gcloud compute forwarding-rules create foobar-forwarding-rule \
--global \
--target-http-proxy foobar-http-proxy \
--ports 8080
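Once the forwarding rule exists, you can grab the load balancer's external IP straight from the CLI (the same IP you'd otherwise look up in the console's frontend tab):

```shell
# Print the anycast IP assigned to the global forwarding rule
gcloud compute forwarding-rules describe foobar-forwarding-rule \
    --global \
    --format="get(IPAddress)"
```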
This sets up the baseline for the global load balancer. Now it's time to point the load balancer at the different NEGs, which have been created automatically by now.
6. Check the NEGs
kubectl get svc \
-o custom-columns='NAME:.metadata.name,NEG:.metadata.annotations.cloud\.google\.com/neg-status'
Note down the NEG name and its associated zones from each cluster's terminal tab.
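The neg-status annotation is a small JSON document. As an illustration (with a hypothetical annotation value modeled on the output you'll see), you can pull the NEG name and zones out of it in plain shell:

```shell
# Sample neg-status annotation value (hypothetical zones for illustration)
NEG_STATUS='{"network_endpoint_groups":{"8080":"k8s1-31be30d1-default-hello-world-8080-640bb695"},"zones":["northamerica-northeast1-b","northamerica-northeast1-c"]}'

# Extract the NEG name for port 8080 and the zone list
NEG_NAME=$(printf '%s' "$NEG_STATUS" | sed -n 's/.*"8080":"\([^"]*\)".*/\1/p')
ZONES=$(printf '%s' "$NEG_STATUS" | sed -n 's/.*"zones":\[\([^]]*\)\].*/\1/p' | tr -d '"' | tr ',' ' ')

echo "NEG:   $NEG_NAME"
echo "Zones: $ZONES"
```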
7. Configure the load balancer
gcloud compute backend-services add-backend backend-service-foo \
--global \
--network-endpoint-group k8s1-31be30d1-default-hello-world-8080-640bb695 \
--network-endpoint-group-zone=northamerica-northeast1-c \
--balancing-mode=RATE \
--max-rate-per-endpoint=100
gcloud compute backend-services add-backend backend-service-bar \
--global \
--network-endpoint-group k8s1-31be30d1-default-hello-world-8080-640bb695 \
--network-endpoint-group-zone=northamerica-northeast1-c \
--balancing-mode=RATE \
--max-rate-per-endpoint=100
gcloud compute backend-services add-backend backend-service-foo \
--global \
--network-endpoint-group k8s1-42ea20f2-default-hello-world-8080-693ebc13 \
--network-endpoint-group-zone=northamerica-northeast2-c \
--balancing-mode=RATE \
--max-rate-per-endpoint=100
gcloud compute backend-services add-backend backend-service-bar \
--global \
--network-endpoint-group k8s1-42ea20f2-default-hello-world-8080-693ebc13 \
--network-endpoint-group-zone=northamerica-northeast2-c \
--balancing-mode=RATE \
--max-rate-per-endpoint=100
Finally, create a firewall rule that allows traffic from Google's load balancer and health-check IP ranges (the ones mentioned in the Health Check definition above) to reach port 8080 on the nodes:
gcloud compute firewall-rules create fw-allow-gclb \
--network=default \
--action=allow \
--direction=ingress \
--source-ranges=130.211.0.0/22,35.191.0.0/16 \
--rules=tcp:8080
Note: In Google Cloud, the balancing-mode parameter determines how the load balancer distributes incoming traffic to the backends. The available balancing modes are:
- RATE: distributes based on a target maximum rate of requests per second (per endpoint, instance, or group). This is the mode we used above and the usual choice for HTTP(S) load balancing with NEGs.
- UTILIZATION: distributes based on backend utilization; available only for instance group backends.
- CONNECTION: distributes based on a target maximum number of concurrent connections; used for TCP/SSL load balancing.
When adding a backend to a backend service with gcloud, you specify the desired mode with the --balancing-mode flag, as shown above.
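To get an intuition for RATE mode, here's the back-of-the-envelope math for the configuration above. The Pod count is an assumption for illustration, and RATE is a target the load balancer uses to decide when to spill traffic to other backends, not a hard cap:

```shell
# Each backend endpoint advertises a target of 100 RPS (--max-rate-per-endpoint)
MAX_RATE_PER_ENDPOINT=100
ENDPOINTS_PER_NEG=3   # e.g. 3 hello-world Pods per cluster (assumption)
NEGS=2                # one NEG attached per cluster for each backend service

TARGET_RPS=$((MAX_RATE_PER_ENDPOINT * ENDPOINTS_PER_NEG * NEGS))
echo "Target capacity per backend service: ${TARGET_RPS} RPS"
```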
You've now successfully configured multi-cluster load balancing with GKE using standalone NEGs in Google Cloud.
To test this, use:
http://<external IP of frontend LB>:8080/foo
You can get the <external IP of frontend LB> from the GCP console: search for "Load balancing", open your load balancer, and check the Frontend tab.
Keep hitting the endpoints to generate traffic and watch the cluster name change between responses. Then bring down one cluster and test again to see traffic shift to the remaining cluster.
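A simple way to watch the distribution is to loop over the endpoint and tally which cluster answered. This sketch obviously needs the load balancer to be live, and the grep pattern is a guess at the demo app's output format, so adjust as needed:

```shell
LB_IP="203.0.113.10"   # replace with your frontend LB IP
# Send 20 requests to /foo and count the responses per cluster
for i in $(seq 1 20); do
  curl -s "http://${LB_IP}:8080/foo" | grep -io 'cluster[^<,]*'
done | sort | uniq -c
```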