Hi everyone,
I have Cloud Run with GPU access enabled in my Google Cloud project. Which of the newer AI models fit within Cloud Run's resource constraints, and are there any reasoning models that will work?
You can check here
This article is great. Thanks for sharing, @dhirajpatra. I would add that there is now a set of Gemma 3 models that will fit quite nicely in a single L4's 24 GB of memory.
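For anyone who wants to try that, here is a minimal sketch of calling a Gemma 3 model from a client, assuming it is served by an Ollama container on Cloud Run. The service URL and model tag below are placeholders, and a service that does not allow unauthenticated invocations would also need an identity token:

```python
import requests

# Placeholder URL of a Cloud Run service running an Ollama container;
# replace with your own service URL. If the service requires authenticated
# invocations, add an "Authorization: Bearer <id-token>" header as well.
SERVICE_URL = "https://ollama-gemma-xxxxx-uc.a.run.app"

resp = requests.post(
    f"{SERVICE_URL}/api/generate",
    json={
        "model": "gemma3:4b",  # assumed tag; any Gemma 3 size that fits in 24 GB
        "prompt": "Give me one sentence on what Cloud Run GPUs are useful for.",
        "stream": False,       # return one JSON object instead of a stream
    },
    timeout=300,
)
resp.raise_for_status()
print(resp.json()["response"])
```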
Kindly share with us what your steps were and how it is running, @donmccasland.
Thank you
Gemma 2B
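If it helps, here is a minimal sketch of loading Gemma 2B on a single 24 GB GPU with Hugging Face Transformers; this is just an illustration, not a full Cloud Run deployment, and the checkpoint ID is the public instruction-tuned variant (which is gated, so a Hugging Face token is needed):

```python
# Minimal sketch: loading Gemma 2B in bfloat16 with Hugging Face Transformers.
# The weights are roughly 5 GB in bf16, so they fit comfortably in an L4's 24 GB.
# Requires `pip install torch transformers accelerate` and an HF token for the gated repo.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/gemma-2b-it"  # instruction-tuned 2B checkpoint

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",  # place the model on the available GPU
)

inputs = tokenizer("What workloads suit Cloud Run GPUs?", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```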
You can check a few of my repos to run locally; they can also run in the cloud. See the sketch after the links for pointing them at a remote model.
https://github.com/dhirajpatra/langgraph-multi-agent
https://github.com/dhirajpatra/private_kb_rag_graph_ollama
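For example, to point an Ollama-based setup like these at a remotely hosted model instead of a local instance, a rough sketch with the Ollama Python client could look like this (the host URL and model tag are placeholders, and an authenticated Cloud Run service would additionally need an identity token header):

```python
# Rough sketch: calling a remote Ollama server instead of localhost.
# Requires `pip install ollama`. The host below is a placeholder.
from ollama import Client

client = Client(host="https://your-ollama-service-xxxxx-uc.a.run.app")

response = client.chat(
    model="gemma2:2b",  # assumed tag; use whatever model the server has pulled
    messages=[{"role": "user", "content": "Summarize this knowledge base entry."}],
)
print(response["message"]["content"])
```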