Hi all,
I’m building a system on Google Cloud Platform and would love architectural input from someone experienced in designing high-concurrency, low-latency pipelines with Cloud Run + task queues.
I have an API running on Cloud Run (Service) that receives user requests and generates tasks.
Each task takes 1–2 minutes on average, sometimes up to 30 minutes.
My goal is that when 100–200 tasks are submitted at once, they are picked up and processed almost instantly (within ~10 seconds delay at most).
In other words: high parallelism with minimal latency and operational simplicity.
1. Pub/Sub Push → Cloud Run Service
Tasks are published to a Pub/Sub topic with a push subscription to a Cloud Run Service.
Problem: Push delivery doesn’t scale up fast enough. It uses a slow-start algorithm that gradually increases load.
Another issue: Cloud Run Service in push mode is limited to 10 min processing (ack deadline), but I need up to 30 mins.
Bottom line: latency is too high and burst handling is weak.
2. Custom Dispatcher (Pub/Sub Pull → Cloud Run)
I created a dispatcher that pulls messages from Pub/Sub and dispatches them to Cloud Run Services (via HTTP).
Added counters and concurrency management (semaphores, thread pools).
Problem: Complex to manage state/concurrency across tasks, plus Cloud Run Services still don’t scale fast enough for a true burst.
Switched dispatcher to launch Cloud Run Jobs instead of Services.
Result: even more latency (~2 minutes cold start per task) and way more complexity to orchestrate.
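For concreteness, the dispatcher's concurrency handling boils down to something like this. It's a simplified, runnable sketch: the real HTTP POST to a Cloud Run URL is stubbed out, and the limits and task IDs are purely illustrative.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

MAX_IN_FLIGHT = 8  # illustrative cap on concurrent downstream requests

in_flight = threading.Semaphore(MAX_IN_FLIGHT)
lock = threading.Lock()
dispatched = []

def post_to_cloud_run(task_id):
    # Stub: the real dispatcher would POST the Pub/Sub message to a
    # Cloud Run service URL here and wait for the response.
    with lock:
        dispatched.append(task_id)

def handle_message(task_id):
    # The semaphore blocks once MAX_IN_FLIGHT tasks are already running;
    # this is the "semaphores + thread pools" state that gets hard to manage.
    with in_flight:
        post_to_cloud_run(task_id)

with ThreadPoolExecutor(max_workers=32) as pool:
    for task_id in range(200):  # simulate a burst of 200 tasks
        pool.submit(handle_message, task_id)

print(len(dispatched))  # 200 once the pool drains
```

Even in this toy form you can see the problem: the dispatcher itself becomes a stateful service whose capacity and failure handling you now own.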
3. Cloud Tasks → Cloud Run Service
Used Cloud Tasks with aggressive settings (max_dispatches_per_second, max_concurrent_dispatches, etc.).
Despite tweaking all limits, Cloud Tasks dispatches very slowly in practice.
Again, Cloud Run doesn’t burst fast enough to handle 100+ requests in parallel without serious delay.
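For reference, "aggressive settings" here means pushing the queue to its documented ceilings, roughly like this (the queue name is a placeholder):

```shell
# Placeholder queue name. 500/s and 5000 are the service-side maximums,
# so dispatch can't be tuned any higher than this.
gcloud tasks queues update burst-queue \
  --max-dispatches-per-second=500 \
  --max-concurrent-dispatches=5000
```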
🎯 What I’m looking for:
A simple, scalable design that allows:
Accepting user requests via API
Enqueuing tasks quickly
Processing tasks at scale (100–500 concurrent) with minimal latency (few seconds)
Keeping task duration support up to 30 minutes
Ideally using Cloud Run, Pub/Sub, or Cloud Tasks, but I’m open to creative use of GKE, Workflows, Eventarc, or even hybrid models if needed — as long as the complexity is kept low.
❓Questions:
Has anyone built something similar with Cloud Run and succeeded with near real-time scaling?
Is Cloud Run Job ever a viable option for 100+ concurrent executions with fast startup?
Should I abandon Cloud Run for something else if low latency at high scale is essential?
Any creative use of GKE Autopilot, Workflows, or Batch that can act as “burstable” workers?
Would appreciate any architectural suggestions, war stories, or even referrals to someone who’s built something similar.
Thanks so much 🙏
That's an interesting design problem.
My first thought is: Why is it critical to start the task within 10s if the actual task runs for 2 to 30 minutes? If there's some initial action that needs to happen quickly when the task first starts up, I wonder if you can do that action in the dispatcher service, to give yourself a bit more flexibility on the task startup time.
Looking through your options:
- I agree Cloud Tasks is not a fit here; that product is mostly designed to smooth out spikes, whereas you need the opposite.
- Thank you for the feedback on Pub/Sub push's slow ramp-up.
- I would have thought Cloud Run jobs would be the natural answer here. We intend for the first task of each job to start in ~20s; in some regions, it takes a bit longer. I'm surprised you're seeing 2min startup times.
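For reference, a job can be created up front with enough parallelism that a single execution fans out to 100 concurrent tasks (job name and image path are placeholders):

```shell
# Placeholder names. --parallelism lets all 100 tasks run at once;
# --task-timeout covers the 30-minute worst case.
gcloud run jobs create burst-worker \
  --image=us-docker.pkg.dev/PROJECT/repo/worker:latest \
  --tasks=100 \
  --parallelism=100 \
  --task-timeout=30m

gcloud run jobs execute burst-worker
```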
- Have you tried using a Cloud Run service that you call directly, without a queue in the middle? Cloud Run services now support a request timeout of up to 60 minutes, so this should work; it starts fast and is a simple architecture. This is what I'd recommend trying next.
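If you try that, note the timeout has to be raised explicitly, since the default is 5 minutes (service name is a placeholder):

```shell
# Placeholder service name. 3600s (60 min) is the current maximum
# request timeout for Cloud Run services.
gcloud run services update task-worker --timeout=3600
```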
- Another idea is to switch from Pub/Sub push to Pub/Sub pull and process the messages with a Cloud Run service with min instances > 0 and instance-based billing (or with a Cloud Run Worker Pool once those launch; this was announced at Next).
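As a sketch of the service settings for that pull-based setup, assuming a hypothetical service called pull-worker:

```shell
# Placeholder service name. Warm instances plus CPU always allocated
# (--no-cpu-throttling, i.e. instance-based billing) let a background
# pull loop keep processing messages between requests.
gcloud run services update pull-worker \
  --min-instances=5 \
  --no-cpu-throttling
```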