
Is it possible to run AI models on serverless infrastructure? If so, how?

I have a local text-to-speech model that I would like to deploy on GCP. An NVIDIA GPU instance is very expensive, so I'd like something on demand: my usage times are very unpredictable, and I only need it for a few minutes at a time to generate sentences.

I tried using Cloud Run and, to my big surprise, it works with 16 GB of memory and 4 cores, but it still takes too much time to generate my audio.
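For reference, a CPU-only Cloud Run deployment like the one described above can be sketched with the `gcloud` CLI. The service name, image path, and region here are hypothetical placeholders, not values from the original post:

```shell
# Deploy a containerized TTS model to Cloud Run with 16 GiB RAM and 4 vCPUs.
# Cloud Run requires at least 4 vCPUs when requesting 16Gi of memory.
gcloud run deploy tts-service \
  --image=us-central1-docker.pkg.dev/my-project/my-repo/tts:latest \
  --memory=16Gi \
  --cpu=4 \
  --region=us-central1 \
  --no-allow-unauthenticated \
  --timeout=300
```

Since billing is per-request on Cloud Run, this matches the "pay only for a few minutes" requirement, just without GPU acceleration.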

So what I'm actually looking for is a serverless GPU solution. I'm not sure if that exists, or whether there is any hack to do this with Vertex AI?

Solved
1 ACCEPTED SOLUTION

Hi @adamofig

Welcome back and thank you for reaching out to our community.

Running your use case on Cloud Run is one way to go about it, but since you need faster audio generation, you are considering GPU compute power.

As you may already know, GPUs are not yet supported in Cloud Run. I also don't think there is a serverless GPU solution for your use case at this time. You can consider using GKE together with Cloud Run, which lets you create node pools equipped with GPUs.
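The GKE approach above can be sketched as creating an autoscaling GPU node pool that scales to zero when idle, which approximates on-demand GPU usage. The cluster name, zone, and GPU type below are illustrative assumptions, not values from this thread:

```shell
# Add a GPU node pool to an existing GKE cluster (names are placeholders).
# min-nodes=0 lets the pool scale to zero so you only pay while jobs run.
gcloud container node-pools create gpu-pool \
  --cluster=my-cluster \
  --zone=us-central1-a \
  --machine-type=n1-standard-4 \
  --accelerator=type=nvidia-tesla-t4,count=1 \
  --num-nodes=0 \
  --enable-autoscaling \
  --min-nodes=0 \
  --max-nodes=1
```

Workloads then request the GPU via a `nvidia.com/gpu` resource limit in their pod spec, and the autoscaler provisions a node only while such a pod is pending.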

Here are some resources you can refer to:

I hope I was able to provide you with useful insights.
