Apigee Hybrid Infrastructure Monitoring Metrics

Apigee is a platform for developing and managing API proxies that features a hybrid deployment model. The hybrid model includes a management plane hosted by Apigee in Google Cloud and a runtime plane that you install and manage on supported Kubernetes platforms.

Monitoring is an important part of managing the runtime plane, because it confirms that the runtime is operating as expected. Cloud Monitoring can be used for this, and the guidelines below are a starting point for monitoring from an infrastructure point of view.

Metrics

Several metrics of the Apigee hybrid runtime can be monitored. They generally fall into two groups: node monitoring and pod monitoring.

 

Node monitoring metrics:

Node metrics give insight into the status and condition of the nodes and can be used to monitor resource utilization. Useful metrics for measuring node resource utilization include:

  • CPU utilization: The fraction of allocatable CPU currently in use on the instance, as well as request and limit utilization. 
  • Memory utilization: The fraction of the allocatable memory that is currently in use on the instance. 
  • Storage: Local ephemeral storage bytes used by the node. 
  • Network: Bytes received and transmitted by the node.

 

Pod monitoring metrics:

 Metrics for monitoring pods can be separated into three categories: 

  • Kubernetes metrics 
    • Pod count: Actual/desired number of pods 
    • Pod volume utilization: The fraction of the volume that is currently being used by the instance 
    • Pod request latency 
  • Container metrics 
    • CPU utilization: The fraction of the CPU request and of the CPU limit that is currently in use 
    • Memory limit utilization: The fraction of the memory limit that is currently in use on the instance 
    • Restart count: Number of times the container has restarted 
  • Application metrics 
    • Apigee hybrid generates many metrics that can be used to monitor the runtime components.
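
To chart or alert on any of the container metrics listed above, you typically select them in Cloud Monitoring by metric type and resource labels. The following is a minimal sketch of composing such a filter string. The helper name `build_filter` is hypothetical, and the `container_name` label is assumed to be the `k8s_container` resource label you want to match on; check the label names against your own cluster's time series.

```python
def build_filter(metric_type: str, resource_type: str, **resource_labels: str) -> str:
    """Compose a Cloud Monitoring filter string from a metric type and resource labels."""
    clauses = [
        f'metric.type = "{metric_type}"',
        f'resource.type = "{resource_type}"',
    ]
    # Sort the labels so the resulting string is deterministic.
    for key, value in sorted(resource_labels.items()):
        clauses.append(f'resource.labels.{key} = "{value}"')
    return " AND ".join(clauses)


# Example: CPU request utilization of the apigee-runtime container.
runtime_cpu_filter = build_filter(
    "kubernetes.io/container/cpu/request_utilization",
    "k8s_container",
    container_name="apigee-runtime",
)
print(runtime_cpu_filter)
```

The same helper could be reused for the memory and restart-count metrics by swapping the metric type.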

 

Monitoring

Metrics generated and collected by the hybrid runtime are sent to Cloud Monitoring, where you can visualize them and monitor the health of the system. 

 

Use Monitoring Dashboards, Alerts and Notifications to: 

  • View and analyze metric data using predefined dashboards for the resources and services that you use. 
  • Create custom dashboards to analyze Apigee hybrid metrics by creating charts for these metrics. 
  • Create alerts using policies with hybrid runtime metrics based on threshold conditions. 
  • Create notifications based on alerts to take action when they are triggered.
  • Create Service Level Objective (SLO) charts.
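
As an illustration of a threshold-based alert, the sketch below builds the JSON body of an alerting policy as a plain Python dictionary. It is not taken from the original article: the function name `cpu_alert_policy`, the 0.8 threshold and the 300-second duration are assumptions chosen for the example, the field names follow the Cloud Monitoring AlertPolicy JSON representation as I understand it, and notification channels are left out. Verify the structure against the API reference before using it.

```python
import json


def cpu_alert_policy(container_name: str, threshold: float, duration_s: int) -> dict:
    """Build an alert policy body for sustained high CPU request utilization."""
    metric_filter = (
        'metric.type = "kubernetes.io/container/cpu/request_utilization" '
        'AND resource.type = "k8s_container" '
        f'AND resource.labels.container_name = "{container_name}"'
    )
    return {
        "displayName": f"{container_name} CPU request utilization",
        "combiner": "OR",
        "conditions": [
            {
                "displayName": f"CPU request utilization above {threshold}",
                "conditionThreshold": {
                    "filter": metric_filter,
                    "comparison": "COMPARISON_GT",
                    "thresholdValue": threshold,
                    # The condition must hold for this long before it fires.
                    "duration": f"{duration_s}s",
                    "aggregations": [
                        {"alignmentPeriod": "60s", "perSeriesAligner": "ALIGN_MEAN"}
                    ],
                },
            }
        ],
    }


# Illustrative values only: alert when usage stays above 80% of the request for 5 minutes.
policy = cpu_alert_policy("apigee-runtime", threshold=0.8, duration_s=300)
print(json.dumps(policy, indent=2))
```

Keeping the policy as data makes it easy to review the threshold and duration before the policy is created through the console or the API.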

 

Basic Metrics for Apigee hybrid Infrastructure Monitoring:

| Metrics Resource Type | Example Relevant Containers | Metrics | Metrics Description |
|---|---|---|---|
| k8s_container | istio-ingressgateway, apigee-runtime, apigee-cassandra, apigee-redis, apigee-redis-envoy | kubernetes.io/container/cpu/request_utilization | The fraction of the requested CPU that is currently in use on the instance. This value can be greater than 1 because usage can exceed the request. Note: The Apigee overrides for the runtime component have a default CPU request of 500m. |
| k8s_container | apigee-redis, apigee-redis-envoy, apigee-runtime, istio-ingressgateway | kubernetes.io/container/memory/limit_utilization | The fraction of the memory limit that is currently in use on the instance. This value cannot exceed 1 as usage cannot exceed the limit. |
| k8s_container | | kubernetes.io/container/restart_count | Number of times the container has restarted. |
| k8s_pod | istio-ingressgateway, apigee-runtime | kubernetes.io/pod/network/received_bytes_count | Cumulative number of bytes received by the pod over the network. |
| k8s_pod | istio-ingressgateway, apigee-runtime | kubernetes.io/pod/network/sent_bytes_count | Cumulative number of bytes transmitted by the pod over the network. |
| k8s_pod | | istio.io/service/client/request_count | Number of requests handled by an Istio proxy (Ingress gateway). |
| k8s_pod | | istio.io/service/client/roundtrip_latencies | Distribution of outgoing requests round trip latency from the service. |
| k8s_node | | kubernetes.io/node/memory/allocatable_utilization | The fraction of the allocatable memory that is currently in use on the instance. This value cannot exceed 1 as usage cannot exceed allocatable memory bytes. |
| k8s_node | | kubernetes.io/node/cpu/allocatable_utilization | The fraction of the allocatable CPU that is currently in use on the instance. |
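
Two of the descriptions above are easier to reason about as arithmetic: why request utilization can exceed 1, and how a cumulative byte counter becomes a rate. The sketch below is a local illustration only. The function names and sample numbers are hypothetical, and the counter-reset handling (counting only the bytes seen since the reset) is one common convention, not something the original article specifies.

```python
def request_utilization(used_cores: float, requested_cores: float) -> float:
    """Fraction of the requested CPU in use; exceeds 1 when usage is above the request."""
    if requested_cores <= 0:
        raise ValueError("requested_cores must be positive")
    return used_cores / requested_cores


def bytes_per_second(earlier: tuple, later: tuple) -> float:
    """Average rate between two (timestamp_seconds, cumulative_bytes) samples."""
    (t0, b0), (t1, b1) = earlier, later
    if t1 <= t0:
        raise ValueError("samples must be in increasing time order")
    # If the counter went down, assume it was reset and count only the new bytes.
    delta = b1 - b0 if b1 >= b0 else b1
    return delta / (t1 - t0)


# A container with the default 500m (0.5 core) request that is using 0.75 cores.
print(request_utilization(0.75, 0.5))              # 1.5
# 6,000 bytes received over 60 seconds.
print(bytes_per_second((0, 1_000), (60, 7_000)))   # 100.0
```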



[Figure: Apigee hybrid runtime architecture]

 

Note the components above that sit on the critical path for API processing. If any component on this path is in an unhealthy state, the processing of API requests will be affected.

A preconfigured sample Apigee Cluster dashboard is also available within the Google Cloud Console's Cloud Monitoring Sample dashboards.

 

[Figure: Cloud Monitoring Apigee Sample Dashboards]

 
[Figure: Apigee Cluster Monitoring Sample Dashboard]

 

[Figure: Sample metrics configuration with “Filters” and “Group by” options]
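
To make the effect of the “Filters” and “Group by” options concrete, here is a small local simulation in Python. It does not call Cloud Monitoring: the sample values, pod names and the `filter_and_group` helper are all hypothetical, and the mean is just one of the reducers you might pick in a chart.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical samples: (container_name, pod_name, memory limit utilization).
samples = [
    ("apigee-runtime", "runtime-a", 0.5),
    ("apigee-runtime", "runtime-b", 0.75),
    ("apigee-redis", "redis-0", 0.25),
    ("istio-ingressgateway", "ingress-0", 0.125),
]


def filter_and_group(samples, wanted_containers):
    """Keep only the wanted containers, then average the values per container."""
    grouped = defaultdict(list)
    for container, _pod, value in samples:
        if container in wanted_containers:       # the "Filters" step
            grouped[container].append(value)     # the "Group by" step
    return {container: mean(values) for container, values in grouped.items()}


result = filter_and_group(samples, {"apigee-runtime", "apigee-redis"})
print(result)
```

Filtering narrows the chart to the containers you care about, and grouping collapses the per-pod series into one line per container.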

 

Further resources

If you're also interested in monitoring based on Apigee API proxies, this documentation covers an approach to configuring alerting and monitoring based on Apigee API proxy metrics.

 

For Cassandra, this article covers suggestions specific to Cassandra monitoring and alerting.

 

The complete list of Kubernetes metrics and definitions can be found at https://cloud.google.com/monitoring/api/metrics_kubernetes

 

Thanks to Abirami Balasubramanian, Kamaljit Singh, Andy Trickett and Omid Tahouri for input, collaboration and review.

Comments
aakashsharmaa5

Hi, please clarify:

"Several metrics of the Apigee hybrid runtime can be monitored": can these metrics be passed to third-party tools (New Relic, Datadog, etc.), or do those tools need to be set up with their own configuration from scratch to collect these kinds of metrics? How would these tools be configured to capture Apigee-specific metrics such as proxyv2request_count, UDCA-specific metrics, etc.? Please share some thoughts. Thanks.

Last update: 04-12-2022 08:09 PM