I've never found it easy to figure out when and why App Engine adds and removes instances, but today I noticed something particularly strange that I would like to diagnose.
I am using basic scaling and gunicorn/Flask with 1 worker and 2 threads per instance. A k8s pod sends requests to these instances, scaling its number of request threads based on the backlog, up to a maximum of 18. max_instances is set to 25, so it is not a limiting factor.
Despite the client being at its maximum of 18 threads and getting a significant number of timeouts in the form of 502 errors, App Engine scales DOWN from two B8 instances to one. I would expect the instance count to increase steadily to around 8-9. Instead, it keeps bouncing between 1 and 2 instances.
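
For reference, the 8-9 figure comes from a back-of-the-envelope calculation, roughly as sketched below. It assumes that each of the 18 client threads keeps one request in flight and that each gunicorn thread serves one request at a time. The function name is only for illustration, and this reflects my expectation rather than anything App Engine documents about how basic scaling decides.

```python
import math


def instances_needed(client_threads: int,
                     threads_per_worker: int,
                     workers_per_instance: int = 1) -> int:
    """Naive estimate: instances required so every in-flight request has a thread.

    Assumes one in-flight request per client thread and one request per
    gunicorn thread at a time (an assumption, not documented App Engine behavior).
    """
    capacity_per_instance = threads_per_worker * workers_per_instance
    return math.ceil(client_threads / capacity_per_instance)


# 18 client threads against instances running 1 worker x 2 threads each
print(instances_needed(client_threads=18, threads_per_worker=2))  # 9
```

So with 2 request-handling threads per instance, I would expect about 9 instances at full load, not 1-2.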
How can I diagnose what is causing this failure to scale? I have previously found App Engine basic scaling very hard to understand, but I have not been able to find logs or diagnostic tools that explain its scaling decisions.