Cloud Run: Flask Apps suddenly take a LONG time to start

EDIT 3: The issue has been resolved. I'm rather disappointed by Google's lack of awareness of their own service degradation, and the time it took them to resolve the issue. As a paying customer I had to hand-hold them through investigating their own site reliability failure and received no acknowledgement of the time and money I spent debugging the issue that was affecting many customers using Cloud Run in us-central1. 

EDIT 2: Others have confirmed having the same issue in us-central1 

EDIT 1: I have re-created the service in the us-west1 region (previously us-central1) and that seems to resolve the issue. But I can't keep the service there: my database is still in us-central1, so that would suck. I don't know what to do about this. (I had tried recreating the whole service fresh in us-central1 and it didn't help, so I am fairly certain that the region is the important factor here.) What can I do?? I can't log a support query without paying $29 😡

I have 2 web apps running in Google Cloud Run. One is a finished app and the other is in development. They are both small Python Flask apps running with Gunicorn. Since Saturday 6 April, they have been taking around 1 minute to start up.

App 1 has had no code or config changes, and up until now, it was starting up in about 3 seconds with 0 min instances and startup CPU enabled.

I have tried reducing the size of the docker images, and adjusting resources and threads, but nothing has helped. It seems that the delay keeps getting longer as well. Today it takes almost 2 minutes for the apps to start.

I have narrowed down the slowdown to Python imports. The container and Gunicorn both start in the normal time, but then, while importing Python modules, the process stalls for 10, 30, or 40 seconds between imports. The pause is in a different place every time, so I can't pin it on a particular Python module. Once all the modules have been imported, the app runs without any random pauses.
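For anyone who wants to check whether they are seeing the same thing, here is a minimal sketch of how per-import pauses can be measured. It is not the exact code I used; the function name `time_imports` and the module list are placeholders I made up for illustration, so substitute the modules your own app imports.

```python
import importlib
import time


def time_imports(module_names):
    """Import each module in order and return a list of (name, seconds) pairs."""
    timings = []
    for name in module_names:
        start = time.perf_counter()
        importlib.import_module(name)
        timings.append((name, time.perf_counter() - start))
    return timings


if __name__ == "__main__":
    # Placeholder list (assumption): replace with your app's real imports.
    for name, seconds in time_imports(["json", "decimal", "email"]):
        # flush=True so each line reaches the container logs immediately
        print(f"{name:<20} {seconds:8.3f}s", flush=True)
```

A module that is already imported returns almost instantly, so this only gives meaningful numbers in a fresh process. Python 3.7+ also has a built-in `-X importtime` option (or the `PYTHONPROFILEIMPORTTIME=1` environment variable) that prints a per-module breakdown to stderr, which may be easier to enable when Gunicorn is the entry point.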

I was able to run these services with 0 min instances and a startup time of 3 seconds for the past 3 months. I had an active free trial, which expired on Saturday. I had upgraded my account to a paid version the day before. I don't understand why this would be related, but the timing coincides. I made sure my finished app was configured to run at low cost before the trial ended, and as soon as the trial ended I was forced to run it with 1 min instance because of the 1-2 minute cold start time. With 1 min instance, the initial startup still takes a long time, but at least subsequent requests are served quickly.

I have tried testing one of the services with the smallest Flask app possible, one which only imports Flask and returns an HTML file in response to a GET request. On Cloud Run this still takes 15 seconds just to import Flask. When I run these services locally, or run the Docker container locally, startup takes less than 1 second.
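A test app of that kind looks roughly like the sketch below. This is an approximation rather than my exact code: it returns an inline HTML string instead of a file, and the `create_app` name, the log message, and the deferred import (done only so the import can be timed) are choices made for this illustration.

```python
import os
import time


def create_app():
    """Build a bare-bones Flask app and log how long importing Flask took."""
    start = time.perf_counter()
    from flask import Flask  # deferred on purpose so the import can be timed
    elapsed = time.perf_counter() - start
    print(f"import flask took {elapsed:.2f}s", flush=True)

    app = Flask(__name__)

    @app.route("/")
    def index():
        # Inline HTML stands in for the HTML file (assumption)
        return "<h1>Hello from Cloud Run</h1>"

    return app


if __name__ == "__main__":
    # Cloud Run passes the listening port in the PORT environment variable
    create_app().run(host="0.0.0.0", port=int(os.environ.get("PORT", "8080")))
```

With Gunicorn, something like `gunicorn "main:create_app()"` should serve it, assuming the file is saved as `main.py`. Comparing the logged import time locally and on Cloud Run shows whether the delay is in the import itself.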

The strangest part is that while testing fixes on both of these apps, I accidentally deployed the docker image of app 2 to app 1, and the startup was super fast. But then I tried deploying app 1 docker image to app 2 service and had no luck. I was unable to replicate this again.

I have tried creating a whole new service and deploying the image there, with no luck. I also tried running the Docker image locally with no internet connection to test whether any dependencies are making network calls. They are not: the container starts up fine with no network.

I have done so much googling and I cannot fathom why a Flask app would randomly pause during imports and take over a minute to start up in Google Cloud Run. Please give me your suggestions. I'm quite irritated that now that my free trial is over, I suddenly have to pay way more than I was expecting, and if I want Google Support I have to pay extra. A lot of search results will tell me that Python is a slow language, but 1 minute is not just Python being Python.

 
