
Cloud Run: Connection reset by peer

Kbb
Bronze 1

We are running a service with the following structure:
Client (Web, App) -> AWS API Gateway -> GCP Cloud Run (called by public URL).


The following error occurs when API Gateway proxies a request to GCP Cloud Run. It is recorded in the AWS logs:

'Execution failed due to a network error communicating with endpoint: Connection reset by peer'

Shortly after API Gateway starts proxying the request (about 500 ms), this message is logged and API Gateway returns an HTTP 504 response to the client app.

I don't think this is a timeout problem, because the error is generated within 0.5 seconds. It also occurs in bursts at a very specific time.

As far as I know, the 'Connection reset by peer' error is related to the socket queue size of the server. What can I do to solve the problem in Cloud Run?

Or if the cause of the problem is different from what I thought, I would appreciate it if you could let me know the cause.


Hi @Kbb,

Welcome to Google Cloud Community!

"Connection reset by peer" in Cloud Run typically indicates that the server closed the connection unexpectedly, for example because of an application crash, a timeout, or a network issue. The error arises when an application has an active TCP connection to a peer on the network and that peer abruptly terminates the connection.

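To make the mechanism concrete, here is a minimal local sketch of how a reset looks from the client side. It is not specific to Cloud Run or API Gateway: it only reproduces an abrupt close on the loopback interface. The function names are hypothetical, and the `SO_LINGER` packing (`"ii"`) assumes Linux or a similar POSIX system.

```python
import socket
import struct
import threading


def reset_on_accept(server: socket.socket) -> None:
    """Accept one connection and abort it instead of closing it gracefully."""
    conn, _ = server.accept()
    # SO_LINGER enabled with a zero timeout makes close() send a TCP RST
    # rather than the normal FIN handshake (POSIX behaviour, assumed here).
    conn.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, struct.pack("ii", 1, 0))
    conn.close()


def demo_reset() -> str:
    """Connect to a local server that aborts the connection, report what the client sees."""
    server = socket.create_server(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    port = server.getsockname()[1]
    worker = threading.Thread(target=reset_on_accept, args=(server,))
    worker.start()
    client = socket.create_connection(("127.0.0.1", port))
    try:
        worker.join()      # wait until the server side has aborted the connection
        client.recv(1024)  # reading from an aborted connection raises ECONNRESET
        return "closed cleanly"
    except ConnectionResetError:
        return "connection reset by peer"
    finally:
        client.close()
        server.close()
```

On Linux, calling `demo_reset()` is expected to return `"connection reset by peer"`, which is the same condition API Gateway reports when the remote end drops an established connection.
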
Here are several things you can try to address this issue:

  • If you're performing background tasks with CPU throttling, consider using the "CPU is always allocated" setting for CPU allocation.
  • Make sure your outbound requests stay within the timeout limits. If your application keeps a connection idle beyond these limits, the gateway will terminate the connection.
  • By default, the TCP socket keepalive option is disabled in Cloud Run. While you can't directly configure keepalive settings at the service level, you can enable it for each socket connection by specifying the appropriate socket options based on the client library you're using in your application.
  • Occasionally, outbound connections may reset due to infrastructure updates. If your application utilizes long-lived connections, it’s advisable to configure it to reconnect, preventing the reuse of terminated connections.
  • If you're using an HTTP proxy to manage egress traffic from your Cloud Run services or jobs, and the proxy imposes a maximum connection duration, it may silently drop long-running TCP connections established through connection pooling. This can lead to failures when HTTP clients attempt to reuse closed connections. If you plan to route egress traffic via an HTTP proxy, be sure to implement connection validation, retries, and exponential backoff. Additionally, set maximum values for connection age, idle connections, and idle connection timeouts in your connection pools.
  • Set up monitoring to track metrics such as request rates, error rates, and latency. Google Cloud Monitoring can give you better visibility into traffic patterns and help identify when the errors spike.
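
As a rough illustration of the keepalive and reconnect suggestions above, here is a minimal Python sketch for outbound connections made from inside a Cloud Run container. The helper names (`enable_keepalive`, `call_with_retries`) and all numeric defaults are assumptions chosen for illustration, not recommended values, and the `TCP_KEEP*` options are only applied where the platform exposes them (Linux does).

```python
import random
import socket
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def enable_keepalive(sock: socket.socket, idle_s: int = 60,
                     interval_s: int = 10, probes: int = 5) -> None:
    """Turn on TCP keepalive for one socket (values are illustrative only)."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Fine-grained tuning options are platform-specific, so guard each one.
    if hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle_s)
    if hasattr(socket, "TCP_KEEPINTVL"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval_s)
    if hasattr(socket, "TCP_KEEPCNT"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probes)


def call_with_retries(operation: Callable[[], T], attempts: int = 4,
                      base_delay_s: float = 0.2, max_delay_s: float = 5.0,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Retry an operation on connection resets, with exponential backoff and jitter."""
    for attempt in range(attempts):
        try:
            return operation()
        except (ConnectionResetError, BrokenPipeError):
            if attempt == attempts - 1:
                raise  # out of attempts: surface the error to the caller
            # The delay ceiling doubles on each attempt, capped at max_delay_s;
            # random jitter avoids many clients retrying at the same instant.
            ceiling = min(max_delay_s, base_delay_s * (2 ** attempt))
            sleep(random.uniform(0, ceiling))
    raise ValueError("attempts must be at least 1")
```

`operation` should open a fresh connection (or take a validated one from the pool) on each call, so that a terminated connection is not reused. Retrying is only safe for idempotent requests.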

I hope the above information is helpful.