
Insufficient nvidia.com/gpu with Autopilot

Hi,

I am trying to run a pod with GPU support, but I am getting "Insufficient nvidia.com/gpu". Can you help me understand what I am doing wrong?

This is the pod definition:

```
apiVersion: v1
kind: Pod
metadata:
  name: cuda-vector-add
spec:
  nodeSelector:
    cloud.google.com/gke-accelerator: nvidia-tesla-t4
  restartPolicy: OnFailure
  containers:
  - name: cuda-vector-add
    # https://github.com/kubernetes/kubernetes/blob/v1.7.11/test/images/nvidia-cuda/Dockerfile
    image: "registry.k8s.io/cuda-vector-add:v0.1"
    resources:
      limits:
        nvidia.com/gpu: 1
```

And this is the error I get when I run `kubectl describe pods`:

```
Warning FailedScheduling 56s (x3 over 11m) gke.io/optimize-utilization-scheduler 0/2 nodes are available: 2 Insufficient cpu, 2 Insufficient memory, 2 Insufficient nvidia.com/gpu, 2 node(s) didn't match Pod's node affinity/selector. preemption: 0/2 nodes are available: 2 Preemption is not helpful for scheduling.
```
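
A quick way to see whether any node currently exposes the GPU resource at all is something like this (a rough sketch; the custom-columns path just reads each node's allocatable resources, nothing Autopilot-specific):

```
# Show each node and how many nvidia.com/gpu it currently advertises as allocatable
# (nodes without GPUs show <none>)
kubectl get nodes -o custom-columns='NAME:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu'
```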

Can someone give me a hand?

Thanks

Solved

2 REPLIES

I believe it should fail at first, since there are no GPU nodes deployed or available yet. After a little while, you should see a message about autoscaling being triggered. Did that not happen?
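
If you want to watch for that, something along these lines should list the pod's recent events, including the FailedScheduling warnings and any scale-up messages (pod name taken from your manifest):

```
# List recent events for the pod, oldest first
kubectl get events --field-selector involvedObject.name=cuda-vector-add --sort-by=.lastTimestamp
```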

Hi,

Yes, I saw the message about autoscaling, and that gave me the clue 🙂. I had not understood that, because of the autoscaling behaviour on Autopilot, I need a quota of 2 GPUs, while I only had a quota of 1.

I was able to fix it by requesting a quota increase.
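
In case someone else hits the same thing, the regional GPU quota can be checked with something like this (us-central1 is only an example region, substitute the one your cluster runs in):

```
# List the region's quotas and keep only the GPU-related ones
# (for nvidia-tesla-t4 nodes the relevant metric is NVIDIA_T4_GPUS)
gcloud compute regions describe us-central1 \
  --flatten="quotas[]" \
  --format="table(quotas.metric,quotas.usage,quotas.limit)" | grep -i gpu
```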

Thanks for your help