GPU access for monitoring?

njw

On a GKE cluster whose nodes have attached GPUs, what are the ways to access the GPUs outside of the resource request mechanism? I want to run DaemonSets that monitor the GPUs even while a workload is running on them, so the monitors shouldn't actually consume a GPU (similar to the DCGM monitoring tool). It seems I can do this by setting `securityContext.privileged: true` on the container, but is there a narrower permission grant that will provide access?
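A minimal sketch of the privileged DaemonSet described above, written as a Python function that emits a JSON manifest (kubectl accepts JSON as well as YAML). The image name is hypothetical. The `cloud.google.com/gke-accelerator` node label and the `nvidia.com/gpu` taint toleration are assumptions about how GKE marks GPU nodes. The relevant part is that the container sets `securityContext.privileged: true` and makes no `nvidia.com/gpu` resource request, so it does not take a GPU away from workloads.

```python
import json


def privileged_gpu_monitor_daemonset(name: str, image: str) -> dict:
    """Build a DaemonSet manifest whose single container runs privileged.

    No nvidia.com/gpu resource request is made, so the scheduler does not
    count the pod as consuming a GPU.
    """
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "DaemonSet",
        "metadata": {"name": name},
        "spec": {
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    # Assumption: GKE labels GPU nodes with cloud.google.com/gke-accelerator.
                    "affinity": {
                        "nodeAffinity": {
                            "requiredDuringSchedulingIgnoredDuringExecution": {
                                "nodeSelectorTerms": [
                                    {
                                        "matchExpressions": [
                                            {
                                                "key": "cloud.google.com/gke-accelerator",
                                                "operator": "Exists",
                                            }
                                        ]
                                    }
                                ]
                            }
                        }
                    },
                    # Assumption: GPU nodes carry an nvidia.com/gpu NoSchedule taint.
                    "tolerations": [
                        {"key": "nvidia.com/gpu", "operator": "Exists", "effect": "NoSchedule"}
                    ],
                    "containers": [
                        {
                            "name": name,
                            "image": image,
                            # The broad grant the question is trying to avoid.
                            "securityContext": {"privileged": True},
                        }
                    ],
                },
            },
        },
    }


if __name__ == "__main__":
    # Hypothetical image name.
    manifest = privileged_gpu_monitor_daemonset("gpu-monitor", "example.com/gpu-monitor:latest")
    print(json.dumps(manifest, indent=2))
```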

On a cluster with the NVIDIA GPU Operator, I can do this by setting the `NVIDIA_VISIBLE_DEVICES` environment variable, or by creating a `hostPath` mount to `/var/run/nvidia-container-devices/all`. So I'm looking for the GKE/Container-Optimized OS equivalent of that.
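For comparison, a sketch of the two GPU Operator techniques mentioned above, again as a hypothetical Python manifest builder; the `via_mount` parameter is an illustrative name, not part of any API. Neither variant sets `privileged` or requests `nvidia.com/gpu`. Mounting the host's `/dev/null` at the container path is an assumption based on how the NVIDIA device plugin's volume-mounts strategy signals devices, so check it against your Operator's configuration.

```python
import json


def operator_gpu_monitor_daemonset(name: str, image: str, via_mount: bool = False) -> dict:
    """Build an unprivileged GPU-monitor DaemonSet for a GPU Operator cluster.

    via_mount=False: set NVIDIA_VISIBLE_DEVICES=all on the container.
    via_mount=True:  mount a hostPath at /var/run/nvidia-container-devices/all.
    """
    labels = {"app": name}
    container: dict = {"name": name, "image": image}
    pod_spec: dict = {"containers": [container]}

    if via_mount:
        # Assumption: host /dev/null as the source, mirroring the device plugin's
        # volume-mounts strategy, where the runtime reads the container-side path.
        pod_spec["volumes"] = [{"name": "all-gpus", "hostPath": {"path": "/dev/null"}}]
        container["volumeMounts"] = [
            {"name": "all-gpus", "mountPath": "/var/run/nvidia-container-devices/all"}
        ]
    else:
        # Ask the NVIDIA container runtime to inject every GPU into the container.
        container["env"] = [{"name": "NVIDIA_VISIBLE_DEVICES", "value": "all"}]

    return {
        "apiVersion": "apps/v1",
        "kind": "DaemonSet",
        "metadata": {"name": name},
        "spec": {
            "selector": {"matchLabels": labels},
            "template": {"metadata": {"labels": labels}, "spec": pod_spec},
        },
    }


if __name__ == "__main__":
    # Hypothetical image name.
    manifest = operator_gpu_monitor_daemonset("gpu-monitor", "example.com/gpu-monitor:latest")
    print(json.dumps(manifest, indent=2))
```

Both variants presumably depend on the NVIDIA container runtime that the Operator installs, which would explain why they don't carry over directly to a stock GKE node.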

garisingh

Have you tried installing the NVIDIA GPU Operator on GKE?