I have deployed a service on Google Cloud's App Engine, and I'm currently stress-testing the system by sending 500+ concurrent requests to the endpoint. However, I've encountered an issue where a significant portion of the requests (approximately 300+) are resulting in a '500: Request was aborted after waiting too long' response. I'm investigating the root cause of this problem.
I suspect that the App Engine might be struggling to handle this level of concurrent requests. If this is indeed the case, I'm seeking guidance on optimizing my app.yaml configuration to enhance App Engine's performance while keeping costs in check. Alternatively, I'm open to exploring other solutions that can effectively address this issue without necessitating changes to the App Engine configuration.
Here's a snippet of my current app.yaml configuration for reference:
I am really confused about how to solve this efficiently. Any sort of help would be appreciated.
Thanks
Hello @shawavisek35!
Welcome to the Google Cloud Community!
With automatic scaling, you get this error when a request times out in the pending queue while waiting for an idle instance.
Check out Scaling Elements in the app.yaml reference. min_idle_instances is the number of instances that are kept running and are always ready to serve traffic. You can set min_idle_instances to a low or a high minimum:
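As a rough illustration of where these settings live, here is a minimal app.yaml sketch for the standard environment with automatic scaling. Every number below is an assumed placeholder, not a recommendation and not taken from your configuration; tune them against your own load test and budget. The runtime line and handlers are left out because they depend on your app.

```yaml
# Sketch only: all values are illustrative assumptions.
automatic_scaling:
  # Instances kept warm and ready for traffic spikes.
  # Higher values cut pending-queue waits but are billed even when idle.
  min_idle_instances: 2
  # Upper bound on idle instances, which caps the idle cost.
  max_idle_instances: 5
  # Longest a request may wait in the pending queue before
  # App Engine starts a new instance to serve it.
  max_pending_latency: 1s
  # Requests one instance handles in parallel before the scheduler
  # prefers another instance. Only raise this if your code is safe
  # under concurrent requests.
  max_concurrent_requests: 40

# min_idle_instances relies on warmup requests being enabled.
inbound_services:
  - warmup
```

The trade-off is cost against latency: more idle instances and a shorter pending latency mean fewer aborted requests under a sudden burst, but a higher baseline bill.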
Also take a look at these Stack Overflow posts, as they might provide some insight into your issue:
If the above options don't work, you can contact Google Cloud Support to look into your case further. Let me know if this helped, thanks!