We want to deploy an application in Google Cloud that processes messages from Google Pub/Sub. We like Cloud Run and would like to use it to deploy our application. Instead of a push model where Pub/Sub pushes messages to our application, we want to have a pull model where the application will pull messages. This way there will be no danger of overwhelming the application with messages. This means we will need to deploy a Cloud Run job instead of a service. The docs say
By default, each task runs for a maximum of 10 minutes: you can change this to a shorter time or a longer time up to 1 hour, by changing the task timeout setting as described in this page
How can we have a job that runs indefinitely?
Hi,
I would actually recommend using Cloud Run services for this. If you set CPU allocation to "CPU always allocated", as described here: https://cloud.google.com/run/docs/configuring/cpu-allocation, and set min-instances and max-instances to the same number, you effectively get that many instances always running; these instances can be pulling data from PubSub.
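To make the "fixed number of always-on instances pulling from Pub/Sub" idea concrete, here is a minimal sketch of the loop each instance might run. The `client.pull_batch()` / `client.ack()` interface is hypothetical, just to show the shape of the loop; the real google-cloud-pubsub client API differs, so treat this as pseudocode with a stand-in client.

```python
import time

def run_worker(client, subscription, handler, batch_size=10,
               idle_sleep=1.0, max_batches=None):
    """Pull-and-ack loop each always-on instance would run.

    `client` is assumed to expose pull_batch() and ack() (hypothetical
    names; the real Pub/Sub client API differs). `max_batches` bounds
    the number of pull attempts, which is handy for testing; pass None
    to run indefinitely, as a real worker would.
    """
    pulled = 0
    while max_batches is None or pulled < max_batches:
        messages = client.pull_batch(subscription, max_messages=batch_size)
        pulled += 1
        if not messages:
            time.sleep(idle_sleep)  # back off when the subscription is quiet
            continue
        for msg in messages:
            handler(msg)                 # your business logic
            client.ack(subscription, msg)
```

Because the instances never scale with backlog in this setup, the throughput ceiling is simply (instance count) × (per-instance processing rate).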
Because Cloud Run services scale based on requests, using them for pull-based workloads doesn't interact well with our autoscaling at the moment. You can experiment with sending inbound requests to get the service to scale up/down, or just stick with a fixed number of instances.
If you do go in this direction, I'd love to hear how it goes for you!
The reason I don't recommend jobs for this is that the fundamental difference between jobs and services is that jobs eventually exit; we don't intend to support indefinite jobs.
@knet Thanks for your answer. We are also doing a pull-based solution and are having issues with autoscaling, as you mention.
We could possibly switch to a Pub/Sub push model, but we are wondering whether autoscaling has been improved for these types of loads since your reply, or whether we should write our own autoscaler (unfortunate).
Please advise.
You ask two programmers for a recommendation, you'll get three answers 🙂
In this story, we are talking about compute (processing) that uses Pub/Sub pull (as opposed to push). For Pub/Sub push, I think Cloud Run is a GREAT solution, but if we want to use Pub/Sub pull, I'm afraid I'm going to have to say (opinion) that Cloud Run is a very poor solution. Let me walk through my thinking.

The notion behind Cloud Run is specifically to auto-scale processing as a function of load/demand. Let's make up some numbers (purely examples). Say a single compute instance can process 10 messages concurrently. If a topic has 5 messages dumped on it, Cloud Run would instantiate one instance and we would be good. If, however, the arrival rate of new messages increased to 15, that would overload a single instance, so Cloud Run would add a second one and we are good. If the number of messages kept increasing, Cloud Run would continue to add enough instances to accommodate the load; if the number of messages arriving per second dropped, Cloud Run could start terminating old instances. Since you are charged per compute instance, your bill is directly proportional to your value (i.e. JUST enough instances to accommodate your load).
Great ... so far so good ... now here's the BUT. Cloud Run's autoscaler is based on external requests for processing, meaning that something OUTSIDE Cloud Run is saying "Hey ... process this for me". Usually this is a REST request. Cloud Run tracks the number of concurrent requests that arrive and complete. In our example, if an instance is processing 10 requests and an 11th arrives before any of the previous ones have completed, and we say a single instance can only accommodate 10 at a time, Cloud Run will scale up a new instance. All of this works BECAUSE there is an external event being PUSHED to Cloud Run.
If we want to use Pub/Sub pull ... from everything I see, Cloud Run simply isn't going to be a good fit. With pull, there is no external event that can drive the scaling. You may argue that the arrival of a new message on the topic is the event ... but it is the "insides" of YOUR app that determines whether and when to pull, not a request arriving from outside Cloud Run.
So ... what WOULD I suggest if you want to go down the Pub/Sub pull route? I'd likely suggest a Compute Engine "Managed Instance Group" (MIG). Just like Cloud Run, you can create a Docker image and associate it with a Compute Engine machine of your preferred size. However, instead of creating just ONE instance, you create a template that describes the "nature" of the Compute Engine and then ask Google to create as many instances as needed to process your load.

How does Google know how many instances to create? The answer is Managed Instance Group autoscaling. Unlike Cloud Run, which is triggered by the arrival of incoming requests, MIG autoscaling can be governed by a wide variety of factors. The one that draws my eye is "Monitoring Metrics": the number of instances created (scaled in or scaled out) becomes a function of some metric tracked by Google Cloud. What could an example of such a metric be? ... and now comes the aha moment ... the metric we could use might be the number of unacknowledged messages on the subscription. Ooooh!!! With some thought, we could come up with a scheme where the number of Compute Engine instances becomes a function of the backlog ... (with an upper bound, of course) ...
As a loose example:
Compute Engine count = min(ceil(messages in backlog / 10), 5)
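The loose formula above can be written as a small sizing function. This is a sketch of the scaling rule only, not of how a MIG autoscaler is configured; the floor of 1 instance is an added assumption (so the backlog always keeps draining), which you'd drop if you want scale-to-zero.

```python
import math

def target_instances(backlog, per_instance=10, max_instances=5):
    """Map the Pub/Sub backlog (unacked message count) to a desired
    instance-group size: one instance per `per_instance` messages,
    capped at `max_instances`, with a floor of 1 (an assumption added
    here so the pool never drops to zero while messages remain)."""
    return max(1, min(math.ceil(backlog / per_instance), max_instances))
```

For example, a backlog of 15 messages asks for 2 instances, while a backlog of 1,000 is capped at 5.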
This is the hybrid point: not the "it's just there" experience you get with Cloud Run, but not the "I have to do everything myself" of writing your own autoscaler either.
@kolban thanks very much indeed for your comprehensive reply.
Reading through tons of documents, it is not easy to figure out that platforms like Google App Engine and Google Cloud Run (services) have been designed to serve mostly web applications. As you've clearly explained, they only work well when external requests keep coming; otherwise, after a few minutes, they simply go to sleep, even though there is lots of work to do (the resources are busy, yet all of a sudden every DB connection gets dropped!). But our legacy applications, which don't expect any requests from outside, need to keep running indefinitely (or for as long as we want them to).
On one occasion, due to time constraints, we had to change a legacy app to make it a web app! The core remained the same, but we added a few simple endpoints as entry points for tasks like initialising, shutting down, and starting concurrent worker threads. We also had a 'keep_alive' endpoint that we hit with regular HTTP requests (via Cloud Scheduler jobs) to keep the app up and running (deny it sleep)!
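For illustration, here is a minimal sketch of the kind of 'keep_alive' endpoint described above, using only the Python standard library. The endpoint path and response body are assumptions mirroring the post, not the actual implementation; Cloud Scheduler would be configured separately to ping the URL on a schedule.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

class KeepAliveHandler(BaseHTTPRequestHandler):
    """Tiny web front added onto a non-web app: a keep_alive endpoint
    for Cloud Scheduler to ping so the service never sits idle long
    enough to be scaled down."""

    def do_GET(self):
        if self.path == "/keep_alive":
            body, status = b"alive", 200
        else:
            body, status = b"not found", 404
        self.send_response(status)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # keep demo output quiet
        pass

def serve(port=8080):
    """Blocking entry point the container would run."""
    HTTPServer(("", port), KeepAliveHandler).serve_forever()
```

A real service would add the other entry points (init, shutdown, start workers) as further paths alongside `/keep_alive`.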
Clearly not ideal, but we did all of the above to keep the project on track and take advantage of these fully managed services. However, we are now convinced that GCE is the right platform for our next legacy application, even though we will have to manage almost everything on the VMs and maintain them fully ourselves.
That said, we have had some use cases (still non-web apps with no external requests) where Google Cloud Run jobs (not services) have been beneficial, since they are allowed to run for much longer periods and to completion (max 168 hours per task; values above 24 hours are in preview).
Hope this helps.
@kolban Thanks for your comprehensive reply (and the joke)!
Another factor I failed to mention is that we are using the streaming pull client, which from all appearances tries to settle into a constant load; that would defeat autoscaling based on CPU utilization.
Therefore, we are going to experiment with the push model, which, from what I can tell, works as advertised (beautifully). The original reason for going pull was to avoid having to communicate the Cloud Run service URL to the upstream Pub/Sub subscription, but I think we have that solved.
Please confirm that will work.
Thank you!
Howdy ... can you elaborate on the "Streaming Model" client. I am not familiar with that concept. If you have a link where we can read more, that would be great.
I have a similar scenario where I am trying to figure out where to migrate our Spring Boot app, which currently runs on Pivotal Cloud Foundry. This app decodes Kafka messages and puts them back on an outgoing Kafka topic. The volume of messages received in a day is close to 800M. Should I consider using Cloud Run for this scenario?
Without knowing anything more than that you need to receive messages from Kafka, transform the content, and put the output back on Kafka ... that screams "transformation pipeline" to me. Ideally you want it to be horizontally scalable: consuming virtually nothing when traffic is light, but scaling up to as much resource as needed should the volume increase. Given those requirements ... I'd say use Dataflow to create an Apache Beam pipeline.
I admit I haven't read the whole thread, but I wanted to answer this question: "we are wondering if autoscaling has been improved since your reply for these type of loads". There has been one major difference since then:
Cloud Run now uses CPU utilization to autoscale services that aren't receiving any requests. (When they are receiving requests, it uses request count as well as CPU utilization.)
This means that if your processing logic is CPU-bound (as in, CPU utilization crosses ~70% when processing and dips down when not processing messages), Cloud Run services will work for this use case in "CPU always allocated" mode (be sure to set min instances greater than 0; it won't scale back up from 0).
We're still working on further improving the experience here so that it also works for non-CPU-bound workloads.