
How to build a monitoring dashboard for a GKE application using Prometheus and Grafana

arjunvijay

In this article, I'll provide steps to deploy a sample Spring Boot application in Google Kubernetes Engine (GKE), followed by steps to monitor the application metrics with Managed Prometheus and Grafana. This approach can be extended to other kinds of applications, such as Flask (Python) or React apps.

First, we'll go through basic definitions of the individual components, and then look at how they fit together when working in tandem.

Key terms and definitions

Google Cloud Managed Service for Prometheus is Google Cloud’s fully managed, multi-cloud, cross-project solution for Prometheus metrics. It lets you globally monitor and alert on your workloads, using Prometheus, without having to manually manage and operate Prometheus at scale. 

Managed Service for Prometheus collects metrics from Prometheus exporters and lets you query the data globally using PromQL, meaning that you can keep using any existing Grafana dashboards, PromQL-based alerts, and workflows. It is hybrid- and multi-cloud compatible, can monitor both Kubernetes and VM workloads, retains data for 24 months, and maintains portability by staying compatible with upstream Prometheus. You can also supplement your Prometheus monitoring by querying over 1,500 free metrics in Cloud Monitoring, including free GKE system metrics, using PromQL.

Grafana is a popular open-source visualization tool. Managed Service for Prometheus works with Grafana through Grafana's built-in Prometheus data source, meaning that you can keep using any community-created or personal Grafana dashboards without any changes.

[Diagram: high-level overview of the application, the managed collectors, Managed Service for Prometheus, the standalone Prometheus UI, and Grafana]

The above diagram is a high-level overview of the entire process.

  1. Once we enable managed collection in the GKE cluster (https://cloud.google.com/stackdriver/docs/managed-prometheus/setup-managed#enable-mgdcoll-gke), the Prometheus-based collectors are set up in the gke-gmp-system namespace (an example gcloud command is shown after this list).
  2. To ingest the metric data emitted by the example application, Managed Service for Prometheus uses target scraping. Target scraping and metrics ingestion are configured using Kubernetes custom resources. The managed service uses PodMonitoring custom resources (CRs).
  3. The collectors scrape the data and call the Monitoring ingest API to send it to Managed Service for Prometheus (which is backed by Monarch) for storage. All Managed Service for Prometheus data is stored for 24 months at no additional cost. Managed Service for Prometheus supports a minimum scrape interval of 5 seconds. Data is stored at full granularity for 1 week, then downsampled to 1-minute points for the next 5 weeks, then downsampled to 10-minute points and stored for the remainder of the retention period. Managed Service for Prometheus has no limit on the number of active time series or total time series.
  4. Finally, the standalone Prometheus frontend UI and Grafana pull the data into dashboards using PromQL queries. Both can be deployed in the same namespace as our application and accessed via a load balancer or port-forwarding.
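
As mentioned in item 1, managed collection has to be enabled on the cluster before any scraping can happen. A minimal sketch of one way to do this with gcloud (replace <cluster_name> and <region> with your own values):

gcloud container clusters update <cluster_name> \
 --region <region> \
 --enable-managed-prometheus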

Now let's move on to a step-by-step guide of deploying all the components on Google Cloud.

How to deploy a Spring Boot application with metrics

1. Make sure that the Spring Boot application you're running has the following dependencies for exposing metrics. If not, add them:

<dependency>
  <groupId>org.springframework.boot</groupId>
  <artifactId>spring-boot-starter-actuator</artifactId>
  <version>3.1.2</version>
</dependency>
<dependency>
  <groupId>io.micrometer</groupId>
  <artifactId>micrometer-registry-prometheus</artifactId>
  <version>1.11.2</version>
</dependency>

Actuator exposes a set of endpoints for health checking and monitoring your app.

The Micrometer registry dependency specifically enables Prometheus support. It allows the metrics collected by Micrometer to be exposed in the Prometheus exposition format, which is required because the collectors can only scrape metrics in a format Prometheus understands.

You can also add custom metrics and logic like the example below; they will automatically be picked up and exposed.

import io.micrometer.core.instrument.Counter;
import io.micrometer.core.instrument.MeterRegistry;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class WebController {

   // Custom counter that will show up on /actuator/prometheus as visit_counter_total
   Counter visitCounter;

   public WebController(MeterRegistry registry) {
       visitCounter = Counter.builder("visit_counter")
           .description("Number of visits to the site")
           .register(registry);
   }

   @GetMapping("/")
   public String index() {
       visitCounter.increment();
       return "Hello World!";
   }
}

2. Next, we need to tell Spring Boot's Actuator which endpoints it should expose. Add the following to application.yml:

spring:
 application:
   name: spring-prometheus-demo
management:
 endpoints:
   web:
     exposure:
       include: health, metrics, prometheus
 metrics:
   tags:
     application: ${spring.application.name}

3. After this, run the application locally or on Google Cloud behind a load balancer to make sure that the Prometheus endpoint (/actuator/prometheus) is working at http://<ip>:8080/actuator/prometheus and returning results like the following:

[Screenshot: Prometheus metrics output at /actuator/prometheus]
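
For a quick check from a terminal, you can also curl the endpoint (a sketch; replace <ip> with localhost or your load balancer address):

# Fetch the first few lines of the Prometheus exposition output
curl -s http://<ip>:8080/actuator/prometheus | head
# Typical output looks something like:
# # HELP jvm_memory_used_bytes The amount of used memory
# # TYPE jvm_memory_used_bytes gauge
# jvm_memory_used_bytes{application="spring-prometheus-demo",area="heap",...} 1.2345678E7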

4. Once we've verified this locally, we can containerize the application for deployment on Google Cloud. Create a Dockerfile for the Spring Boot app as follows; we will build and push this image in step 5.

# Use a base image with Java 17 installed (required by Spring Boot 3.x)
FROM eclipse-temurin:17-jre

# Copy the JAR file and configuration into the container
COPY application.jar app.jar
COPY src/main/resources/application.yml application.yml
# Expose the port that the application will listen on
EXPOSE 8080
# Set the command to run the application when the container starts
# (JVM flags go before -jar; application arguments go after the JAR)
CMD ["java", "-Djava.net.preferIPv4Stack=true", "-jar", "app.jar", "--spring.config.location=application.yml"]

5. Now run the following shell script, either in Cloud Shell or in a terminal where you've fetched GKE credentials (you can use this method to authenticate). This will build the image and push it to Google Artifact Registry, so we can reference it in our Kubernetes deployment.

#!/bin/sh
mvn clean install
mv ./target/demo-0.0.1-SNAPSHOT.jar ./application.jar
gcloud init
gcloud auth login
gcloud builds submit -t "us-docker.pkg.dev/arjun-demo-123/sample/prometheus-sample" ./
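
Note that gcloud builds submit can only push to an Artifact Registry repository that already exists. A minimal sketch for creating one, assuming the repository name sample and the us multi-region used in the image path above:

gcloud artifacts repositories create sample \
 --repository-format=docker \
 --location=us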

6. Once this is done, we can create the deployment along with the PodMonitoring resource. Below is the consolidated manifest (save it as deployment.yaml) that you can use for your deployment. Notice the gmp-test namespace being used here; create it first if it doesn't already exist (kubectl create namespace gmp-test).

apiVersion: apps/v1
kind: Deployment
metadata:
 namespace: gmp-test
 labels:
   app.kubernetes.io/name: prometheus-example-app
 name: prometheus-example-app
spec:
 selector:
   matchLabels:
     app.kubernetes.io/name: prometheus-example-app

 replicas: 2
 template:
   metadata:
     labels:
       app.kubernetes.io/name: prometheus-example-app
   spec:
     nodeSelector:
       kubernetes.io/os: linux
       kubernetes.io/arch: amd64
     containers:
       - image: us-docker.pkg.dev/arjun-demo-123/sample/prometheus-sample:latest
         name: prom-example
         ports:
           - name: metrics
             containerPort: 8080
---
apiVersion: v1
kind: Service
metadata:
 name: prometheus-example-app
 namespace: gmp-test
spec:
 selector:
   app.kubernetes.io/name: prometheus-example-app
 ports:
   - protocol: TCP
     port: 80
     targetPort: metrics
 type: LoadBalancer
---
apiVersion: monitoring.googleapis.com/v1
kind: PodMonitoring
metadata:
 name: prometheus-example-app
 namespace: gmp-test
 labels:
   app.kubernetes.io/name: prometheus-example-app
spec:
 selector:
   matchLabels:
     app.kubernetes.io/name: prometheus-example-app
 endpoints:
   - port: metrics
     path: /actuator/prometheus
     interval: 30s

Now apply the manifest using the following command, run from the same environment as in step 5 (Cloud Shell or a terminal with GKE credentials):

kubectl apply -f deployment.yaml

This will deploy the application together with the PodMonitoring resource, and the managed collectors will begin scraping the Prometheus endpoints.

Also, check the application's Prometheus endpoint at the external IP of the Google Cloud load balancer frontend on port 80 (http://<external-ip>/actuator/prometheus).
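
For example, you can confirm the rollout and find the external IP with standard kubectl commands (a quick sketch):

# Check that both replicas are running
kubectl -n gmp-test get pods
# The EXTERNAL-IP column shows the load balancer address
kubectl -n gmp-test get svc prometheus-example-app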

7. Additionally, you can enable target status to check the status of the targets in your PodMonitoring or ClusterPodMonitoring resources by setting the features.targetStatus.enabled value within the OperatorConfig resource to true. A sketch of one way to do this is shown below.
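
This is a minimal sketch, assuming the default OperatorConfig (named config in the gmp-public namespace) that managed collection creates:

# Enable the target status feature on the managed collection operator
kubectl -n gmp-public patch operatorconfig config \
 --type merge \
 --patch '{"features":{"targetStatus":{"enabled":true}}}'
# After a short delay, the scrape target status appears in the status field of the PodMonitoring resource
kubectl -n gmp-test describe podmonitoring prometheus-example-app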

8. After the metric data starts flowing into Google Cloud Managed Service for Prometheus, we can query it and display the results in charts and dashboards. For this blog, we'll use the standalone Prometheus UI and Grafana for visualization.

When running on GKE, Managed Service for Prometheus automatically retrieves credentials from the environment based on the Compute Engine default service account. That service account has the necessary roles, monitoring.metricWriter and monitoring.viewer, by default. If you do not use Workload Identity, and you have previously removed either of those roles from the default node service account, you will have to re-add the missing permissions before continuing.

Note: Replace <project_id>, <cluster_name>, and <region> with your project ID, cluster name, and cluster location, respectively.

#!/bin/sh
gcloud init
gcloud auth login
gcloud config set project <project_id>
# Fetch credentials so that kubectl points at your GKE cluster
gcloud container clusters get-credentials <cluster_name> --region <region>
# Create a service account to be used for querying metrics
gcloud iam service-accounts create gmp-test-sa
# Allow the Kubernetes service account gmp-test/default to impersonate it via Workload Identity
gcloud iam service-accounts add-iam-policy-binding \
 --role roles/iam.workloadIdentityUser \
 --member "serviceAccount:<project_id>.svc.id.goog[gmp-test/default]" \
 gmp-test-sa@<project_id>.iam.gserviceaccount.com
kubectl annotate serviceaccount \
 --namespace gmp-test \
 default \
 iam.gke.io/gcp-service-account=gmp-test-sa@<project_id>.iam.gserviceaccount.com
# Grant the service account read access to monitoring data
gcloud projects add-iam-policy-binding <project_id> \
 --member=serviceAccount:gmp-test-sa@<project_id>.iam.gserviceaccount.com \
 --role=roles/monitoring.viewer

You can verify this by following the information at this link.
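
As a quick sketch, you can also list the roles granted to the new service account by filtering the project's IAM policy:

gcloud projects get-iam-policy <project_id> \
 --flatten="bindings[].members" \
 --format="table(bindings.role)" \
 --filter="bindings.members:gmp-test-sa@<project_id>.iam.gserviceaccount.com"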

Since all the permissions are now set correctly, it's time to set up the Prometheus UI.

To deploy the standalone Prometheus frontend UI for Managed Service for Prometheus, run the following commands:

1. Deploy the frontend service and configure it to query the scoping project of the metrics scope of your choice.

curl https://raw.githubusercontent.com/GoogleCloudPlatform/prometheus-engine/v0.7.0/examples/frontend.yaml |
sed 's/\$PROJECT_ID/<project_id>/' |
kubectl apply -n gmp-test -f -
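
Before port-forwarding, you can wait for the frontend to finish rolling out (a quick sketch, assuming the deployment created by frontend.yaml is named frontend, matching its service):

kubectl -n gmp-test rollout status deployment/frontend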

2. Port-forward the frontend service to your local machine. The following example forwards the service to port 9090:

kubectl -n gmp-test port-forward svc/frontend 9090

This command does not return, and while it's running, it reports access to the URL.

If you want to continue using a Grafana deployment installed by kube-prometheus, then deploy the standalone Prometheus frontend UI in the monitoring namespace instead.

You can access the standalone Prometheus frontend UI in your browser at the URL: http://localhost:9090. If you're using Cloud Shell for this step, you can get access by using the Web Preview button.
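
Because the frontend exposes the standard Prometheus HTTP query API (which is also how Grafana will talk to it), you can run a quick PromQL query from another terminal while the port-forward is active. A sketch; note that Micrometer exposes the counter defined earlier as visit_counter_total:

# Query the up metric for all scraped targets
curl 'http://localhost:9090/api/v1/query?query=up'
# Query the custom counter from the Spring Boot application
curl 'http://localhost:9090/api/v1/query?query=visit_counter_total'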

The following screenshot shows a table in the standalone Prometheus frontend UI that displays the up metric:

[Screenshot: the standalone Prometheus frontend UI displaying the up metric]

10. You can deploy and configure Grafana by following the steps in this link. Once you do that, you'll get a beautiful Grafana dashboard like the following:

[Screenshot: Grafana dashboard visualizing the application metrics]
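
When configuring Grafana's Prometheus data source for Managed Service for Prometheus, point it at the frontend service deployed above rather than at a regular Prometheus server. A minimal sketch, assuming Grafana runs in the same cluster and was installed as a service named grafana in the gmp-test namespace (a hypothetical name; adjust it to match your install):

# Reach the Grafana UI locally
kubectl -n gmp-test port-forward svc/grafana 3000
# In the Grafana UI (http://localhost:3000), add a data source of type Prometheus and set its URL to
# http://frontend.gmp-test.svc:9090 so that dashboards query Managed Service for Prometheus.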