Cloud Run routes traffic to container before HTTP ...

ContinueAsGuest · 05-07-2024 02:35 PM

Hi,

We have a JavaScript application in Cloud Run, with HTTP startup/readiness checks enabled. We're seeing Cloud Run routes traffic to containers before the startup probe succeeds.

Looking at an individual container, the first request comes in at 18:32:01 and eventually times out. According to the logs, the startup health check succeeds 9 seconds after, at 18:32:00.

The Cloud Run Health Check documentation says:
"Startup probes determine whether the container has started and is ready to accept traffic."
So we're wondering why we see traffic coming into containers before the startup check succeeds.

Here are the two log entries showing the sequence (I've removed sensitive information from the logs, so they're missing some fields):

[
    {
    "textPayload": "The request has been terminated because it has reached the maximum request timeout. To change this limit, see https://cloud.google.com/run/docs/configuring/request-timeout",

    "httpRequest": {
      "requestMethod": "GET",
      "requestUrl": "https://foo/bar",
      "status": 504,
      "latency": "9.998326453s",
      "protocol": "HTTP/1.1"
    },
    "resource": {
      "type": "cloud_run_revision",
      "labels": {
        "service_name": "webapp",
        "configuration_name": "webapp",
        "revision_name": "webapp-123456"
      }
    },
    "timestamp": "2024-05-07T18:31:51.166461Z",
    "severity": "ERROR",
    "labels": {
      "instanceId": "00f46b92853c27bd9df5f9f8a8a080371a7f6a79326a8420449ef4a2cdda8b9e2b97740a7ce77f4f0ceb9e550476358c91668b4b67d6ee232ea6b109b519d8c3"
    },

    "receiveTimestamp": "2024-05-07T18:32:01.377377472Z",
    "spanId": "13942351057703038538",
    "traceSampled": true
  },

  {
    "textPayload": "STARTUP HTTP probe succeeded after 1 attempt for container \"webapp-1\" on path \"/api/startup\".",

    "resource": {
      "type": "cloud_run_revision",
      "labels": {
        "service_name": "webapp",
        "configuration_name": "webapp",
        "revision_name": "webapp-123456"
      }
    },
    "timestamp": "2024-05-07T18:32:00.096356Z",
    "severity": "INFO",
    "labels": {
      "instanceId": "00f46b92853c27bd9df5f9f8a8a080371a7f6a79326a8420449ef4a2cdda8b9e2b97740a7ce77f4f0ceb9e550476358c91668b4b67d6ee232ea6b109b519d8c3"
    },
    "receiveTimestamp": "2024-05-07T18:32:00.104944032Z"
  }
]

pholden

I'm also seeing the same behaviour here.

Rough timeline for one example:

2024-06-02 08:48:01.755 - cloud run sends request to the instance (this is the first log line). httpRequest.latency is 30.876 s
2024-06-02 08:48:29.648 - uvicorn/fastapi logs that start is complete
2024-06-02 08:48:32.631 - request from cloud run completes (based on timestamps +httpRequest.latency)
2024-06-02 08:48:33.444 - STARTUP HTTP probe succeeded
2024-06-02 08:48:33.446 - LIVENESS HTTP probe succeeded

kata_dev

I'm facing same issue.
The Known issues in Cloud Run says:
"A request sent to the service endpoint might be used to start a Cloud Run instance, and that request can be assigned to the instance before the startup probe results are known. If the probe passes, then the request will begin to be processed by that instance at the receiveTimestamp listed in the Cloud Run request log. If the probe fails, then failure will be logged without ever entering the service's code."

So it may be a Cloud Run's bug.
Does anyone know how to avoid it??

rames_jusso

We are also noticing this with our cloud run service, would be interested to hear if anyone has solved for this

ContinueAsGuest

Another thing we noticed: a container that's starting up can be assigned multiple requests before the startup probe succeeds.

We asked Google Cloud support for help with this problem. They got back to us (this was back in May, sorry for my late update here):
Our investigation identified the 504 logs error with the message “The request has been terminated because it has reached the maximum request timeout.” are requests in hold until the startup request probe succeeds. The request was not sent to the container.

This is just my interpretation so far: requests are assigned to containers regardless of if they're ready or not. The startup probe is used to buffer requests that have been assigned to a container that's still starting up. It is not used to decide which containers to assign requests to. This is unintuitive to me because the last cloud platform I worked with (and every load balancer I've used) uses health checks to decide where to route requests. I can't imagine why you wouldn't it to work this way.

Google's general advice is to write and tune your applications to startup quickly. So I'm thinking this behaviour is by design. We asked Google why it works this way, hoping to understand how to most effectively use this behaviour and they haven't responded.

ContinueAsGuest

One thing you might be able to do to mitigate the impact is increase your max request duration for the Cloud Run service to be long enough to allow: the container to startup, then the startup probe to succeed, then the request to be handled by your app.

rames_jusso

Thanks for the response @ContinueAsGuest for the additional info! That seems not ideal as to your point other cloud platforms generally use the startup probe per instance to determine when to route traffic to it

The only other thing that comes to mind is potentially having more minimum instances around so that it doesn't need to ever autoscale and automatically add instances. This assumes though that the HTTP startup probe is taken into account for new revision deployments

Cloud Run routes traffic to container before HTTP startup probe succeeds