I am trying to use an MQL query to create a monitoring dashboard for a list of VMs. I was able to get the info I needed for CPU, but for memory I only see:
compute.googleapis.com/instance/memory/balloon/ram_size
ram_used
swap_in_bytes_count
swap_out_bytes_count
Any idea how I can get the actual memory utilization on the VM?
Hi
This is because the metrics provided out of the box are what the hypervisor sees. If you want in-guest metrics from within the virtual machine, you'll need to deploy the Google Cloud Ops Agent:
https://cloud.google.com/stackdriver/docs/solutions/agents/ops-agent
https://cloud.google.com/stackdriver/docs/solutions/agents/ops-agent/installation
This will then provide a wealth of additional metrics from within the operating system, including memory:
https://cloud.google.com/monitoring/api/metrics_opsagent#agent-memory
As you'll also see on that page, the agent supports a range of application integrations for even more insights.
Hope that helps,
Alex
Hi, I tried, but the agent says:
Pending: Ops Agent is installing.
google-cloud-ops-agent.service - Google Cloud Ops Agent
Loaded: loaded (/usr/lib/systemd/system/google-cloud-ops-agent.service; enabled; vendor preset: enabled)
Active: active (exited) since Mon 2024-07-15 21:51:03 UTC; 9s ago
Process: 1772 ExecStart=/bin/true (code=exited, status=0/SUCCESS)
Process: 1765 ExecStartPre=/opt/google-cloud-ops-agent/libexec/google_cloud_ops_agent_engine -in /etc/google-cloud-ops-agent/config.yaml (code=exited, status=0/SUCCESS)
Main PID: 1772 (code=exited, status=0/SUCCESS)
Tasks: 0 (limit: 21983)
Memory: 0B
CGroup: /system.slice/google-cloud-ops-agent.service
Jul 15 21:50:57 test google_cloud_ops_agent_engine[1765]: pipelines:
Jul 15 21:50:57 test google_cloud_ops_agent_engine[1765]: default_pipeline:
Jul 15 21:50:57 test google_cloud_ops_agent_engine[1765]: receivers: [hostmetrics]
Jul 15 21:50:57 test google_cloud_ops_agent_engine[1765]: processors: [metrics_filter]
Jul 15 21:51:03 test google_cloud_ops_agent_engine[1765]: 2024/07/15 21:51:03 [Ports Check] Result: PASS
Jul 15 21:51:03 test google_cloud_ops_agent_engine[1765]: 2024/07/15 21:51:03 [Network Check] Result: PASS
Jul 15 21:51:03 test google_cloud_ops_agent_engine[1765]: 2024/07/15 21:51:03 [API Check] Result: FAIL, Error code: MonApiUnauthenticatedErr, Failure: The current VM couldn't >
Jul 15 21:51:03 test google_cloud_ops_agent_engine[1765]: 2024/07/15 21:51:03 [API Check] Result: FAIL, Error code: LogApiUnauthenticatedErr, Failure: The current VM couldn't >
Jul 15 21:51:03 test google_cloud_ops_agent_engine[1765]: 2024/07/15 21:51:03 Startup checks finished
Jul 15 21:51:03 test systemd[1]: Started Google Cloud Ops Agent.
● google-cloud-ops-agent-opentelemetry-collector.service - Google Cloud Ops Agent - Metrics Agent
Loaded: loaded (/usr/lib/systemd/system/google-cloud-ops-agent-opentelemetry-collector.service; static; vendor preset: disabled)
Drop-In: /usr/lib/systemd/system/google-cloud-ops-agent-opentelemetry-collector.service.d
└─directories.conf
Active: active (running) since Mon 2024-07-15 21:51:03 UTC; 9s ago
Process: 1783 ExecStartPre=/bin/mkdir -p ${RUNTIME_DIRECTORY} ${STATE_DIRECTORY} ${LOGS_DIRECTORY} (code=exited, status=0/SUCCESS)
Process: 1774 ExecStartPre=/opt/google-cloud-ops-agent/libexec/google_cloud_ops_agent_engine -service=otel -in /etc/google-cloud-ops-agent/config.yaml -logs ${LOGS_DIRECTORY>
Main PID: 1784 (otelopscol)
Tasks: 6 (limit: 21983)
Memory: 32.6M
CGroup: /system.slice/google-cloud-ops-agent-opentelemetry-collector.service
└─1784 /opt/google-cloud-ops-agent/subagents/opentelemetry-collector/otelopscol --config=/run/google-cloud-ops-agent-opentelemetry-collector/otel.yaml
Jul 15 21:51:04 test otelopscol[1784]: 2024-07-15T21:51:04.129Z info prometheusreceiver@v0.102.0/metrics_receiver.go:257 Scrape job added {"jobName>
Jul 15 21:51:04 test otelopscol[1784]: 2024-07-15T21:51:04.129Z info service@v0.102.0/service.go:206 Everything is ready. Begin running and processing dat>
Jul 15 21:51:04 test otelopscol[1784]: 2024-07-15T21:51:04.130Z info prometheusreceiver@v0.102.0/metrics_receiver.go:344 Starting scrape manager
Jul 15 21:51:05 test otelopscol[1784]: 2024-07-15T21:51:05.181Z error exporterhelper/queue_sender.go:101 Exporting failed. Dropping data. {"error":>
Jul 15 21:51:05 test otelopscol[1784]: go.opentelemetry.io/collector/exporter/exporterhelper.newQueueSender.func1
Jul 15 21:51:05 test otelopscol[1784]: /root/go/pkg/mod/go.opentelemetry.io/collector/exporter@v0.102.0/exporterhelper/queue_sender.go:101
Jul 15 21:51:05 test otelopscol[1784]: go.opentelemetry.io/collector/exporter/internal/queue.(*boundedMemoryQueue[...]).Consume
Jul 15 21:51:05 test otelopscol[1784]: /root/go/pkg/mod/go.opentelemetry.io/collector/exporter@v0.102.0/internal/queue/bounded_memory_queue.go:52
Jul 15 21:51:05 test otelopscol[1784]: go.opentelemetry.io/collector/exporter/internal/queue.(*Consumers[...]).Start.func1
Jul 15 21:51:05 test otelopscol[1784]: /root/go/pkg/mod/go.opentelemetry.io/collector/exporter@v0.102.0/internal/queue/consumers.go:43
● google-cloud-ops-agent-fluent-bit.service - Google Cloud Ops Agent - Logging Agent
Loaded: loaded (/usr/lib/systemd/system/google-cloud-ops-agent-fluent-bit.service; static; vendor preset: disabled)
Drop-In: /usr/lib/systemd/system/google-cloud-ops-agent-fluent-bit.service.d
└─directories.conf
Active: active (running) since Mon 2024-07-15 21:51:04 UTC; 9s ago
Process: 1792 ExecStartPre=/bin/mkdir -p ${RUNTIME_DIRECTORY} ${STATE_DIRECTORY} ${LOGS_DIRECTORY}/subagents (code=exited, status=0/SUCCESS)
Process: 1773 ExecStartPre=/opt/google-cloud-ops-agent/libexec/google_cloud_ops_agent_engine -service=fluentbit -in /etc/google-cloud-ops-agent/config.yaml -logs ${LOGS_DIRE>
Main PID: 1793 (google_cloud_op)
Tasks: 29 (limit: 21983)
Memory: 31.3M
CGroup: /system.slice/google-cloud-ops-agent-fluent-bit.service
├─1793 /opt/google-cloud-ops-agent/libexec/google_cloud_ops_agent_wrapper -config_path /etc/google-cloud-ops-agent/config.yaml -log_path /var/log/google-cloud-ops-a>
└─1798 /opt/google-cloud-ops-agent/subagents/fluent-bit/bin/fluent-bit --config /run/google-cloud-ops-agent-fluent-bit/fluent_bit_main.conf --parser /run/google-clo>
Jul 15 21:51:03 test google_cloud_ops_agent_engine[1773]: processors:
Jul 15 21:51:03 test google_cloud_ops_agent_engine[1773]: metrics_filter:
Jul 15 21:51:03 test google_cloud_ops_agent_engine[1773]: type: exclude_metrics
Jul 15 21:51:03 test google_cloud_ops_agent_engine[1773]: metrics_pattern: []
Jul 15 21:51:03 test google_cloud_ops_agent_engine[1773]: service:
Jul 15 21:51:03 test google_cloud_ops_agent_engine[1773]: pipelines:
Jul 15 21:51:03 test google_cloud_ops_agent_engine[1773]: default_pipeline:
Jul 15 21:51:03 test google_cloud_ops_agent_engine[1773]: receivers: [hostmetrics]
Jul 15 21:51:03 test google_cloud_ops_agent_engine[1773]: processors: [metrics_filter]
Jul 15 21:51:04 test systemd[1]: Started Google Cloud Ops Agent - Logging Agent.
● google-cloud-ops-agent-diagnostics.service - Google Cloud Ops Agent - Diagnostics
Loaded: loaded (/usr/lib/systemd/system/google-cloud-ops-agent-diagnostics.service; disabled; vendor preset: disabled)
Active: active (running) since Mon 2024-07-15 21:50:53 UTC; 20s ago
Main PID: 1757 (google_cloud_op)
Tasks: 6 (limit: 21983)
Memory: 21.8M
CGroup: /system.slice/google-cloud-ops-agent-diagnostics.service
└─1757 /opt/google-cloud-ops-agent/libexec/google_cloud_ops_agent_diagnostics -config /etc/google-cloud-ops-agent/config.yaml
Jul 15 21:50:53 test systemd[1]: google-cloud-ops-agent-diagnostics.service: Succeeded.
Jul 15 21:50:53 test systemd[1]: Stopped Google Cloud Ops Agent - Diagnostics.
Jul 15 21:50:53 test systemd[1]: Started Google Cloud Ops Agent - Diagnostics.
Jul 15 21:51:03 test google_cloud_ops_agent_diagnostics[1757]: 2024/07/15 21:51:03 rpc error: code = Unauthenticated desc = transport: per-RPC creds failed due to error: metad>
Jul 15 21:51:03 test google_cloud_ops_agent_diagnostics[1757]: 2024/07/15 21:51:03 rpc error: code = Unauthenticated desc = transport: per-RPC cr
My assumption from the log is that the VM doesn't have permission to write telemetry data (note the MonApiUnauthenticatedErr and LogApiUnauthenticatedErr failures in the API check). Have a read through this page for more details on this point:
https://cloud.google.com/monitoring/agent/ops-agent/authorization
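If it helps when checking several VMs, here is a minimal Python sketch that pulls the failed startup checks and their error codes out of pasted status output. It assumes the lines keep the "[... Check] Result: ..., Error code: ..." shape seen in the log above. The names failed_checks, CHECK_RE and sample are made up for this illustration and are not part of the Ops Agent.

```python
import re

# Assumed line shape, taken from the status output in this thread:
#   "... [API Check] Result: FAIL, Error code: MonApiUnauthenticatedErr, Failure: ..."
CHECK_RE = re.compile(
    r"\[(?P<check>[^\]]+ Check)\] Result: (?P<result>PASS|FAIL)"
    r"(?:, Error code: (?P<code>\w+))?"
)


def failed_checks(status_text):
    """Return (check name, error code) pairs for every failed startup check."""
    failures = []
    for line in status_text.splitlines():
        match = CHECK_RE.search(line)
        if match and match.group("result") == "FAIL":
            failures.append((match.group("check"), match.group("code")))
    return failures


# Sample lines copied from the log above (truncated exactly as pasted).
sample = """\
Jul 15 21:51:03 test google_cloud_ops_agent_engine[1765]: 2024/07/15 21:51:03 [Ports Check] Result: PASS
Jul 15 21:51:03 test google_cloud_ops_agent_engine[1765]: 2024/07/15 21:51:03 [Network Check] Result: PASS
Jul 15 21:51:03 test google_cloud_ops_agent_engine[1765]: 2024/07/15 21:51:03 [API Check] Result: FAIL, Error code: MonApiUnauthenticatedErr, Failure: The current VM couldn't >
Jul 15 21:51:03 test google_cloud_ops_agent_engine[1765]: 2024/07/15 21:51:03 [API Check] Result: FAIL, Error code: LogApiUnauthenticatedErr, Failure: The current VM couldn't >
"""

for check, code in failed_checks(sample):
    print(check, code)
```

Both failures here are "Unauthenticated" errors on the API check, which is consistent with a permissions problem rather than a network or port problem.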
I have the Ops Agent installed, but when I run an MQL query on the instance from my monitoring project I am still seeing the same stats:
compute.googleapis.com/instance/memory/balloon/ram_size
ram_used
swap_in_bytes_count
swap_out_bytes_count
Any idea how to get memory used?
Have a look under the agent metrics, check out:
https://cloud.google.com/monitoring/api/metrics_opsagent#agent-memory
I tried an MQL query, but I don't see a filter for resource.instance_name, only resource.instance_id, which makes it hard to tell which server I am working with.
You can do something like:
fetch gce_instance
| metric 'agent.googleapis.com/memory/percent_used'
| filter (resource.project_id == 'my_project_name')
| filter (metadata.system_labels.name == 'my_instance_name')
| filter (resource.zone == 'my_zone')
| filter (metric.state == 'used')
| group_by 2m, [value_percent_used_mean: mean(value.percent_used)]
| every 2m
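Since the original question was about a list of VMs, here is a rough Python sketch that fills in the query above once per instance. The query text is the one from this reply. The helper name build_memory_query, the MQL_TEMPLATE constant, and the project, instance and zone values are placeholders invented for the example.

```python
# Template copied from the MQL query above; only the quoted values and the
# alignment window are parameterised.
MQL_TEMPLATE = """\
fetch gce_instance
| metric 'agent.googleapis.com/memory/percent_used'
| filter (resource.project_id == '{project}')
| filter (metadata.system_labels.name == '{instance}')
| filter (resource.zone == '{zone}')
| filter (metric.state == 'used')
| group_by {window}, [value_percent_used_mean: mean(value.percent_used)]
| every {window}"""


def build_memory_query(project, instance, zone, window="2m"):
    """Return the memory-utilization MQL query for one named instance."""
    return MQL_TEMPLATE.format(
        project=project, instance=instance, zone=zone, window=window
    )


# Hypothetical (instance name, zone) pairs; replace with your own VMs.
vms = [("web-1", "us-central1-a"), ("db-1", "us-central1-b")]

for name, zone in vms:
    print(build_memory_query("my_project_name", name, zone))
    print()
```

Each printed query can be pasted into the MQL editor for one chart per VM. This only builds the text and does not check that the agent is actually reporting data for that instance.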
I tried this, but it says no data is available for the selected time frame.
What have you tried? Have you confirmed there was data?