
Cloud Run 502 Bad Gateways Starting Jan 2025

Sometime between Jan 17th and Jan 24th 2025, all of our Cloud Run services across multiple GCP projects at my company started sporadically returning 502 Bad Gateway errors. I've been investigating this for a while, and it's very hard to debug because it appears to be caused by an internal change on Google's side.

Here's what I've found so far:

1. It is related to HTTP/2. Our Cloud Run services are deployed with HTTP/2 enabled via the `--use-http2` flag. We are using Go's `golang.org/x/net/http2/h2c` package in the container.

We are also using a GCP load balancer in front of Cloud Run, and we set the backend protocol to `H2C`, though it doesn't seem to matter if it is `H2C`, `HTTP2` or `HTTPS` -- no matter which we choose, we still get sporadic 502s.

Turning off HTTP/2 by removing the `--use-http2` flag makes the 502 errors go away.

2. It seems correlated with request body size.

Uploading a file has a much higher chance of producing a 502 than a GET request with no body.

3. HTTP requests originating from Cloud Build have an extremely high failure rate.

Whereas I might get a 502 once every 20 uploads from my local PC, more than 50% of all uploads from a Cloud Build environment fail with a 502. That's how we pinpointed that something changed between Jan 17th and Jan 24th 2025, because that's when the very first 502 occurred for us:

[Image: radbryanpg_0-1744401225686.png]

After that it's just been non-stop 502 failures. We've tried retry/backoff and lots of other things with limited success.

It doesn't even seem like the requests reach our Go code when a 502 happens. Has anyone heard of this before? Is there an open issue for it somewhere? I'm not sure how to proceed with debugging or working around this issue.


 

2 REPLIES

Hi @radbryanpg,

Welcome to Google Cloud Community!

It is possible that this issue is related to the use of HTTP/2 in your Cloud Run services (deployed with the `--use-http2` flag, with your containers using the `h2c` package). As a temporary workaround, I would recommend removing the `--use-http2` flag when deploying the Cloud Run service, since that already worked for you. You may also want to use the HTTPS or HTTP backend protocol on the load balancer instead of H2C.

I also see that a public issue tracker entry has already been created; you can follow up there so that the engineering team can investigate further. However, given the severity, how widespread the issue is across your projects, the clear signs that it may stem from an internal infrastructure change, and the fact that it is tied to a specific timeframe, I would recommend opening a support case with Google Cloud Support. Be sure to include all relevant details you've uncovered, such as the timeframe, the link to HTTP/2 usage and request body size, and the significantly higher failure rate when requests come from Cloud Build.

Was this helpful? If so, please accept this answer as “Solution”. If you need additional assistance, reply here within 2 business days and I’ll be happy to help.

Thanks, yes, the issue definitely appears related to HTTP/2. However, if I turn off HTTP/2, then I can't upload files bigger than 32MB without using signed URLs or chunked uploads. Signed URLs are problematic for customers who only want to whitelist my company's domain and not Google's domain(s). I can't easily switch to chunked uploads because there is already medical device software in the field uploading files >32MB via HTTP/2.

I gotta say though, this has been extremely frustrating. I opened an issue with golang, and they immediately closed it and said it's a problem with GCP load balancers. I opened an issue with GCP, and they immediately closed the ticket without any effort to reproduce, saying it's an issue with my setup. I understand it's probably difficult for Google to keep up with the volume of public issues, but dang, it's a very painful process to report a bug with GCP, and it's a privilege you have to pay for!