In GKE, I have created multiple clusters with 1 node each, with each node having 1 GPU. These clusters are in different regions. I have deployed my GPU application with KServe on each node (with Knative providing scale-to-zero capabilities for the nodes). I am using MultiClusterIngress for handling the traffic between the clusters.
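For context, the scale-to-zero part of the setup looks roughly like the sketch below. The names, image, and GPU type are placeholders, not my actual config; the relevant bit is `minReplicas: 0`, which KServe uses to let Knative scale the predictor down to zero pods:

```yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: gpu-model                      # placeholder name
spec:
  predictor:
    minReplicas: 0                     # allow Knative to scale the predictor to zero
    containers:
      - name: kserve-container
        image: gcr.io/my-project/gpu-model:latest   # placeholder image
        resources:
          limits:
            nvidia.com/gpu: "1"        # one GPU per pod, matching one GPU per node
```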
In MultiClusterIngress, I use a BackendConfig to specify the health check parameters of my application, like the readiness endpoint and probe frequency. The issue is that this health check always keeps the node up, even when no actual traffic is coming in. This means the GPUs are always provisioned (they never get autoscaled to 0), so I keep incurring their cost, which defeats the purpose of scale to zero.
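To illustrate, the BackendConfig is roughly like the following (the path, port, and timings here are illustrative assumptions, not my exact values). Because the load balancer probes the readiness endpoint on a fixed interval, every probe counts as a request and keeps the pod, and therefore the GPU node, alive:

```yaml
apiVersion: cloud.google.com/v1
kind: BackendConfig
metadata:
  name: gpu-app-backendconfig   # placeholder name
spec:
  healthCheck:
    type: HTTP
    requestPath: /v1/models/my-model   # placeholder readiness endpoint
    port: 8080                         # placeholder container port
    checkIntervalSec: 15               # probe frequency; each probe wakes the service
    timeoutSec: 5
    healthyThreshold: 1
    unhealthyThreshold: 2
```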
I need the health check to ensure that client requests are sent to a cluster only if a GPU is actually available in its node. Also, it has sometimes happened that GPUs are unavailable in a region.
So how can I go about solving this issue of minimising GPU cost? Is there some way for me to cheaply do the health check of the GPUs? Or is there some different approach to solving my use case?
Which GPU(s) are you using?
And are you using multiple clusters to solve for GPU availability or are you just trying to route traffic based on geo?
I am using T4s and A100s. Neither of these GPUs is reliably available in the regions I use, so I have to spread across multiple regions. I am primarily using multiple clusters to solve the GPU availability problem.
My main problem is that currently I have to keep multiple GPUs provisioned, which is wasteful unless I am receiving heavy traffic. I want a way to readily spawn nodes (with GPUs) in clusters/regions where a GPU is available, instead of waiting for a timeout saying no GPU was found.
Hello,
Thank you for the information. I'm not familiar with the GPU availability checker you mentioned; could you provide me with the URL or further details about it?
Additionally, just to confirm: does this mean that Google does not guarantee retention of a GPU during maintenance once it has been allocated, even if it was assigned to my VMs?
Thank you