
Is it possible to run AI models on serverless infrastructure? If so, how?

I have a local text-to-speech model that I would like to deploy on GCP. An NVIDIA GPU instance is very expensive, so I'd like something on demand: my usage times are very unpredictable, and I only need it for a few minutes at a time to generate sentences.

I tried using Cloud Run and, to my big surprise, it works with 16 GB of memory and 4 cores, but it still takes too much time to generate my audio.
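For reference, a CPU-only Cloud Run deployment like the one described above can be sketched with the `gcloud` CLI. The service name, image path, and region here are hypothetical placeholders, not values from the original post:

```shell
# Deploy a containerized TTS model to Cloud Run with 16 GiB RAM and 4 vCPUs.
# Cloud Run requires at least 4 vCPUs when requesting 16Gi of memory.
gcloud run deploy tts-service \
  --image=us-central1-docker.pkg.dev/my-project/my-repo/tts:latest \
  --memory=16Gi \
  --cpu=4 \
  --region=us-central1 \
  --no-allow-unauthenticated \
  --timeout=300
```

Since billing is per-request on Cloud Run, this matches the "pay only for a few minutes" requirement, just without GPU acceleration.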

So what I'm actually looking for is a serverless GPU solution. I'm not sure if that exists, or whether there is any hack to do this with Vertex AI?

Solved
1 ACCEPTED SOLUTION

Hi @adamofig

Welcome back and thank you for reaching out to our community.

Running your use case on Cloud Run is one way to go about it, but since you need faster audio generation, you are considering GPU compute power.

As you may already know, GPUs are not yet supported in Cloud Run. I also don't think there is a serverless GPU solution for your use case at this time. You can consider using GKE together with Cloud Run, which lets you create node pools equipped with GPUs.
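The GKE approach above can be sketched as creating an autoscaling GPU node pool that scales to zero when idle, which approximates on-demand GPU usage. The cluster name, zone, and GPU type below are illustrative assumptions, not values from this thread:

```shell
# Add a GPU node pool to an existing GKE cluster (names are placeholders).
# min-nodes=0 lets the pool scale to zero so you only pay while jobs run.
gcloud container node-pools create gpu-pool \
  --cluster=my-cluster \
  --zone=us-central1-a \
  --machine-type=n1-standard-4 \
  --accelerator=type=nvidia-tesla-t4,count=1 \
  --num-nodes=0 \
  --enable-autoscaling \
  --min-nodes=0 \
  --max-nodes=1
```

Workloads then request the GPU via a `nvidia.com/gpu` resource limit in their pod spec, and the autoscaler provisions a node only while such a pod is pending.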

Here are some resources you can refer to:

I hope I was able to provide you with useful insights.
