Apigee Hybrid Infrastructure Monitoring Metrics

dknezic · 04-12-2022

Apigee is a platform for developing and managing API proxies that features a hybrid deployment model. The hybrid model includes a management plane hosted by Apigee in Google Cloud and a runtime plane that you install and manage on supported Kubernetes platforms.

Because you manage the runtime plane yourself, monitoring is essential to confirm that the runtime is operating as expected. Cloud Monitoring can be used for this, and the guidelines below are intended to help you get started from an infrastructure point of view.

Metrics

Several metrics of the Apigee hybrid runtime can be monitored. They generally fall into two groups: node monitoring and pod monitoring.

Node monitoring metrics:

Node metrics give insight into the status and condition of the nodes and can be used to monitor resource utilization. Some useful metrics for measuring node resource utilization include:

CPU utilization: The fraction of allocatable CPU currently in use on the instance, as well as request and limit utilization.
Memory utilization: The fraction of the allocatable memory that is currently in use on the instance.
Storage: Local ephemeral storage bytes used by the node.
Network: Bytes received/transmitted by the node.

Pod monitoring metrics:

Metrics for monitoring pods can be separated into three categories:

Kubernetes metrics

Pod count: Actual/desired number of pods
Pod volume utilization: The fraction of the volume that is currently being used by the instance
Pod request latency

Container metrics

CPU utilization: The fraction of the CPU request and of the CPU limit that is currently in use
Memory limit utilization: The fraction of the memory limit that is currently in use on the instance
Restart count: Number of times the container has restarted
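A restart count is most useful when you look at how much it changed over a period rather than at its absolute value. The following is a minimal sketch of that idea, assuming the metric is reported as a running total; the function name and the sample values are hypothetical and only illustrate the arithmetic.

```python
from typing import Sequence


def restarts_in_window(samples: Sequence[int]) -> int:
    """Sum the increases between consecutive cumulative restart counts.

    A drop in the value is treated as a counter reset (an assumption of
    this sketch), in which case the new value counts as fresh restarts.
    """
    total = 0
    for previous, current in zip(samples, samples[1:]):
        if current >= previous:
            total += current - previous
        else:
            total += current  # counter started again from zero
    return total


# Hypothetical samples, oldest first.
print(restarts_in_window([3, 3, 5, 5, 6]))  # 3
```

A value above zero for a short window is usually a better alerting signal than the lifetime total, which only ever grows.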

Application metrics

Apigee hybrid generates many metrics that can be used to monitor the runtime components.

Monitoring

Metrics generated and collected by the hybrid runtime are sent to Cloud Monitoring, where you can visualize them and monitor the health of the system.

Use Monitoring Dashboards, Alerts and Notifications to:

View and analyze metric data using predefined dashboards for the resources and services that you use.
Create custom dashboards to analyze Apigee hybrid metrics by creating charts for these metrics.
Create alerts using policies with hybrid runtime metrics based on threshold conditions.
Create notifications based on alerts to take action when they are triggered.
Create Service Level Objective (SLO) charts.
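To make the idea of a threshold-based alert more concrete, here is a rough sketch that builds a policy body shaped like the Cloud Monitoring AlertPolicy resource. The helper name, display names, the 0.9 threshold, the 300-second duration and the 60-second alignment period are all illustrative assumptions, not recommended values; the sketch only prints JSON and does not call any API.

```python
import json


def threshold_policy(display_name: str, metric_type: str,
                     resource_type: str, threshold: float,
                     duration_s: int = 300) -> dict:
    """Build an AlertPolicy-shaped body with a single threshold condition."""
    metric_filter = (
        f'metric.type = "{metric_type}" AND '
        f'resource.type = "{resource_type}"'
    )
    return {
        "displayName": display_name,
        "combiner": "OR",
        "conditions": [{
            "displayName": f"{metric_type} above {threshold}",
            "conditionThreshold": {
                "filter": metric_filter,
                "comparison": "COMPARISON_GT",
                "thresholdValue": threshold,
                # How long the value must stay above the threshold.
                "duration": f"{duration_s}s",
                "aggregations": [{
                    "alignmentPeriod": "60s",
                    "perSeriesAligner": "ALIGN_MEAN",
                }],
            },
        }],
    }


# Hypothetical example: container memory approaching its limit.
policy = threshold_policy(
    "Container memory near limit (example)",
    "kubernetes.io/container/memory/limit_utilization",
    "k8s_container",
    threshold=0.9,
)
print(json.dumps(policy, indent=2))
```

A notification channel would still need to be attached to such a policy before anyone is actually informed when it fires.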

Basic Metrics for Apigee hybrid Infrastructure Monitoring:

Resource type	Example relevant containers	Metric	Description
k8s_container	istio-ingressgateway, apigee-runtime, apigee-cassandra, apigee-redis, apigee-redis-envoy	kubernetes.io/container/cpu/request_utilization	The fraction of the requested CPU that is currently in use on the instance. This value can be greater than 1, as usage can exceed the request. Note: the Apigee overrides for the runtime component have a default CPU request of 500m.
k8s_container	apigee-redis, apigee-redis-envoy, apigee-runtime, istio-ingressgateway	kubernetes.io/container/memory/limit_utilization	The fraction of the memory limit that is currently in use on the instance. This value cannot exceed 1 as usage cannot exceed the limit.
k8s_container		kubernetes.io/container/restart_count	Number of times the container has restarted.
k8s_pod	istio-ingressgateway, apigee-runtime	kubernetes.io/pod/network/received_bytes_count	Cumulative number of bytes received by the pod over the network.
k8s_pod	istio-ingressgateway, apigee-runtime	kubernetes.io/pod/network/sent_bytes_count	Cumulative number of bytes transmitted by the pod over the network.
k8s_pod		istio.io/service/client/request_count	Number of requests handled by an Istio proxy (ingress gateway).
k8s_pod		istio.io/service/client/roundtrip_latencies	Distribution of outgoing requests round trip latency from the service.
k8s_node		kubernetes.io/node/memory/allocatable_utilization	The fraction of the allocatable memory that is currently in use on the instance. This value cannot exceed 1 as usage cannot exceed allocatable memory bytes.
k8s_node		kubernetes.io/node/cpu/allocatable_utilization	The fraction of the allocatable CPU that is currently in use on the instance.
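The note that request utilization can be greater than 1 is easier to see with numbers. The sketch below works through the arithmetic using the 500m default CPU request mentioned in the table; the usage figures and the function name are hypothetical.

```python
def utilization(usage_millicores: float, reference_millicores: float) -> float:
    """Return usage as a fraction of a request or limit (both in millicores)."""
    if reference_millicores <= 0:
        raise ValueError("reference must be positive")
    return usage_millicores / reference_millicores


cpu_request = 500  # millicores, the default runtime request noted in the table

# Hypothetical usage figures, for illustration only.
print(utilization(250, cpu_request))  # 0.5: half of the request is in use
print(utilization(750, cpu_request))  # 1.5: usage exceeds the request
```

A sustained value above 1 suggests the request is set lower than what the container actually needs, which is worth reviewing when sizing nodes.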

Apigee hybrid runtime architecture

Note the components above that sit on the critical path for API processing: if any component on this path is in an unhealthy state, the processing of API requests will be affected.

A preconfigured sample Apigee Cluster dashboard is also available within the Google Cloud Console's Cloud Monitoring Sample dashboards.

Cloud Monitoring Apigee Sample Dashboards

Apigee Cluster Monitoring Sample Dashboard

Sample Metrics configuration with “Filters” and “Group by” Options:
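As a rough equivalent of such a configuration outside the console, the sketch below assembles a Cloud Monitoring filter string and a group-by setting for one of the container metrics. The `build_filter` helper, the `apigee-runtime` container filter, the aligner and reducer choices, and the exact label path used for grouping are assumptions for illustration; the dictionary mirrors the shape of a time series query but is not sent anywhere.

```python
def build_filter(metric_type: str, resource_type: str,
                 **resource_labels: str) -> str:
    """Join a metric type, a resource type and label matches with AND."""
    clauses = [
        f'metric.type = "{metric_type}"',
        f'resource.type = "{resource_type}"',
    ]
    for label, value in sorted(resource_labels.items()):
        clauses.append(f'resource.labels.{label} = "{value}"')
    return " AND ".join(clauses)


query = {
    # "Filters": only CPU request utilization of the runtime container.
    "filter": build_filter(
        "kubernetes.io/container/cpu/request_utilization",
        "k8s_container",
        container_name="apigee-runtime",
    ),
    # "Group by": one averaged series per pod.
    "aggregation": {
        "alignmentPeriod": "60s",
        "perSeriesAligner": "ALIGN_MEAN",
        "crossSeriesReducer": "REDUCE_MEAN",
        "groupByFields": ["resource.labels.pod_name"],
    },
}
print(query["filter"])
```

Grouping by pod keeps one line per runtime pod on the chart; grouping by namespace or cluster instead would collapse them into fewer, coarser series.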

Further resources

If you're also interested in monitoring based on Apigee API proxies, this documentation covers an alerting and monitoring configuration approach based on Apigee API proxy metrics.

For Cassandra, this article covers suggestions specific to Cassandra monitoring and alerting.

The complete list of Kubernetes metrics and their definitions can be found at https://cloud.google.com/monitoring/api/metrics_kubernetes

Thanks to Abirami Balasubramanian, Kamaljit Singh, Andy Trickett and Omid Tahouri for input, collaboration and review.

aakashsharmaa5 · 09-26-2023

Hi, please clarify:

"Several metrics of the Apigee hybrid runtime can be monitored" - can these metrics be passed to third-party tools (New Relic, Datadog, etc.), or do these tools need to be set up with their own configuration from scratch to collect this kind of metric? How would these tools be configured to capture Apigee-specific metrics such as proxyv2request_count, UDCA-specific metrics, etc.? Please share some thoughts. Thanks.