
Insufficient nvidia.com/gpu with Autopilot

Hi,

I am trying to run a pod with GPU support, but I am getting "Insufficient nvidia.com/gpu". Can you help me understand what I am doing wrong?

This is the pod definition:

```
apiVersion: v1
kind: Pod
metadata:
  name: cuda-vector-add
spec:
  nodeSelector:
    cloud.google.com/gke-accelerator: nvidia-tesla-t4
  restartPolicy: OnFailure
  containers:
  - name: cuda-vector-add
    # https://github.com/kubernetes/kubernetes/blob/v1.7.11/test/images/nvidia-cuda/Dockerfile
    image: "registry.k8s.io/cuda-vector-add:v0.1"
    resources:
      limits:
        nvidia.com/gpu: 1
```

And this is the error I get when I run `kubectl describe pods`:

```
Warning FailedScheduling 56s (x3 over 11m) gke.io/optimize-utilization-scheduler 0/2 nodes are available: 2 Insufficient cpu, 2 Insufficient memory, 2 Insufficient nvidia.com/gpu, 2 node(s) didn't match Pod's node affinity/selector. preemption: 0/2 nodes are available: 2 Preemption is not helpful for scheduling.
```
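
A quick way to see whether any node currently exposes the GPU resource at all is something like this (a rough sketch; the custom-columns path just reads each node's allocatable resources, nothing Autopilot-specific):

```
# Show each node and how many nvidia.com/gpu it currently advertises as allocatable
# (nodes without GPUs show <none>)
kubectl get nodes -o custom-columns='NAME:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu'
```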

Can someone give me a hand?

Thanks

Solved

2 REPLIES

I believe it should fail at first, since there are no GPU nodes deployed or available yet. After a little while, you should see a message about autoscaling being triggered. Did that not happen?
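
If you want to watch for that, something along these lines should list the pod's recent events, including the FailedScheduling warnings and any scale-up messages (pod name taken from your manifest):

```
# List recent events for the pod, oldest first
kubectl get events --field-selector involvedObject.name=cuda-vector-add --sort-by=.lastTimestamp
```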

Hi,

Yes, I saw the message about autoscaling, and that gave me the clue 🙂. I had not understood that, because of the autoscaling behaviour on Autopilot, I need a quota of 2 GPUs, while I only had a quota of 1.

I was able to fix it by requesting a quota increase.
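
In case someone else hits the same thing, the regional GPU quota can be checked with something like this (us-central1 is only an example region, substitute the one your cluster runs in):

```
# List the region's quotas and keep only the GPU-related ones
# (for nvidia-tesla-t4 nodes the relevant metric is NVIDIA_T4_GPUS)
gcloud compute regions describe us-central1 \
  --flatten="quotas[]" \
  --format="table(quotas.metric,quotas.usage,quotas.limit)" | grep -i gpu
```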

Thanks for your help