I have used the Ops Agent for non-GPU workloads and it was fairly straightforward, but with GPUs it is trickier. I am following this documentation and have installed the NVIDIA GPU drivers, but I am unsure whether the VM is reporting GPU metrics and whether the data is being ingested (I am only interested in metrics, not logs). My assumption is that the green check mark indicates that the Ops Agent is up, running, and reporting metrics. Below is what I see in the VM's Observability tab. I have not yet run my AI workload on the VM, and I understand that metrics won't appear retroactively, but is it possible to confirm from the screenshot below that the Ops Agent setup is fine? One interesting thing to note: I don't see any activity on ports 20201 and 20202, which are the monitoring and logging ports for the Ops Agent (netstat output).
Below are the logs from the VM. Note that I have since added the missing IAM permission, so please ignore those errors.
Hi @dheerajpanyam,
Yes, from your screenshot, the green checks and GPU charts confirm that metrics are being collected correctly.
No need to worry about ports 20201/20202: they’re internal and might not always show in netstat. Once your AI workload runs, GPU usage metrics should appear as expected.
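If you still want to double-check the ports from the VM itself, here is a minimal Python sketch that tries a local TCP connection to each one. The port numbers come from your question; the helper name `is_port_open` and the choice of `127.0.0.1` as the host are assumptions for illustration, not anything the Ops Agent provides. A "not reachable" result does not by itself mean the agent is broken, so treat it as a hint only.

```python
import socket

# Ports mentioned in the question (assumed to be the local Ops Agent ports).
OPS_AGENT_PORTS = (20201, 20202)


def is_port_open(port, host="127.0.0.1", timeout=1.0):
    """Return True if a TCP connection to host:port succeeds.

    Hypothetical helper for illustration; not part of the Ops Agent.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an errno value otherwise.
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    for port in OPS_AGENT_PORTS:
        state = "listening" if is_port_open(port) else "not reachable"
        print(f"port {port}: {state}")
```

Run it on the VM with `python3`. The charts on the Observability tab remain the more direct confirmation that metrics are being ingested.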
Thanks @a_aleinikov