Looking for an easy way to deploy my Python web server; it uses an LLM model and I need CUDA to accelerate response time. Since it used to be so easy to deploy any service to Google App Engine, I wonder whether its flexible environment already has an option to add GPU support...
Hi @masterresultonl,
Welcome to Google Cloud Community!
To answer your question, App Engine Flexible Environment currently does not support GPU acceleration, so it's not suitable for hosting LLM-based services requiring CUDA. I would recommend using Compute Engine or Google Kubernetes Engine (GKE), both of which allow you to provision NVIDIA GPUs and install all necessary CUDA libraries and LLM dependencies.
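As a starting point on the Compute Engine route, here is a minimal sketch of provisioning a GPU-backed VM with `gcloud`. The instance name, zone, machine type, accelerator type, and Deep Learning VM image family are assumptions — pick whatever matches your quota and region:

```shell
# Create a VM with a single NVIDIA T4 attached (example values; adjust
# zone, machine type, and accelerator to your needs and available quota).
# The Deep Learning VM image family ships with CUDA drivers preinstalled.
gcloud compute instances create my-llm-vm \
  --zone=us-central1-a \
  --machine-type=n1-standard-4 \
  --accelerator=type=nvidia-tesla-t4,count=1 \
  --maintenance-policy=TERMINATE \
  --image-family=pytorch-latest-gpu \
  --image-project=deeplearning-platform-release \
  --boot-disk-size=100GB
```

Note that GPU instances require `--maintenance-policy=TERMINATE` because instances with accelerators attached cannot be live-migrated.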
You may also want to consider filing a product feature request. Note that I won't be able to provide a date for when this might be implemented.
Was this helpful? If so, please accept this answer as “Solution”. If you need additional assistance, reply here within 2 business days and I’ll be happy to help.
Hey, thanks for the answer! In general I need to test how fast it would work with different GPU capabilities, so I'm looking for a fast and cheap test solution. I'll probably try a VM, although it doesn't sound as fast and easy for deployment as GAE.
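For comparing response times across different GPU configurations, a small timing harness can help keep the measurements consistent between VMs. This is just a generic sketch — the `benchmark` helper and the stand-in workload are hypothetical; you'd replace the lambda with your actual model-inference call:

```python
import time
from statistics import mean

def benchmark(fn, *, warmup=2, runs=10):
    """Call fn a few times untimed (warmup), then time `runs` calls
    and return the average seconds per call."""
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return mean(samples)

if __name__ == "__main__":
    # Stand-in workload; swap in your LLM inference call here.
    avg = benchmark(lambda: sum(range(10_000)))
    print(f"avg latency: {avg * 1000:.3f} ms")
```

Running the same script on each candidate GPU VM gives you directly comparable averages, and the warmup runs keep one-time costs (model load, CUDA kernel compilation) out of the measurement.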