
App Engine Python 3 Standard entrypoint best practices

Hello,

I am in the process of migrating a legacy Python 2 App Engine application to Python 3, sticking to the standard environment for now. On several services I have been using an entrypoint setting in my app.yaml like:

 

entrypoint: gunicorn -b :$PORT -w 1 main:app

 

with one worker process. 
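For reference, a stripped-down app.yaml along these lines might look like the following (the runtime version and service name here are just placeholders, not the real ones from my project):

runtime: python312        # whichever Python 3 runtime the service targets
service: my-service       # placeholder service name
instance_class: F2
entrypoint: gunicorn -b :$PORT -w 1 main:app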

I forgot to add this on a couple of services, though, and noticed that they were immediately being killed with:

While handling this request, the process that handled this request was found to be using too much memory and was terminated. This is likely to cause a new process to be used for the next request to your application. If you see this message frequently, you may have a memory leak in your application or may be using an instance with insufficient memory. Consider setting a larger instance class in app.yaml.

Further investigation into the logs showed that it was actually starting up 8 worker processes for a single instance, with the default entrypoint being:

[serve] Running /bin/sh -c exec gunicorn main:app --workers 8 -c /config/gunicorn.py

This leads to two questions:

1. Why isn't Google following its own "best practices" documented at https://cloud.google.com/appengine/docs/standard/python3/runtime#application_startup for the default number of workers to start based on the instance class (which in this case is F2)?

2. What is in the default /config/gunicorn.py? None of the rest of the documentation for specifying an entrypoint says anything about this, so hopefully it's nothing important. I was just curious when I saw it.

ACCEPTED SOLUTION

1) I checked one of our apps, and for an F1 instance without an entrypoint specified, Google runs gunicorn with 4 workers. Based on that, 8 would make sense for an F2 instance. But you're right that this seems to go against the documentation you've referenced.

2) It's possible that the documentation wasn't updated (they increased the default memory for the different instance classes because Python 3 required more memory and a larger footprint to run). You can click the 'send feedback' button at the bottom of the page and tell Google about it.

3) Unless Google is using a custom gunicorn.py, gunicorn's default for workers is 1, and the comment in gunicorn's example config file recommends a value of ``2-4 x $(NUM_CORES)``. You will also find the same recommendation in the gunicorn documentation.
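For illustration only (this is not Google's actual /config/gunicorn.py, whose contents I don't know), a gunicorn.py following that recommendation could look roughly like:

# purely illustrative gunicorn.py -- not the file App Engine ships in /config/
import multiprocessing
import os

# App Engine tells the server which port to listen on via the PORT env var (8080 by default)
bind = "0.0.0.0:" + os.environ.get("PORT", "8080")

# lower end of the "2-4 x $(NUM_CORES)" recommendation from gunicorn's example config
workers = multiprocessing.cpu_count() * 2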

 

..... NoCommandLine ......
 https://nocommandline.com
A GUI for Google App Engine
    & Datastore Emulator

 


Thanks for the reply. Unless someone from Google wants to chime in, I think what you write seems logical; you're probably right that the documentation is just out of sync.

In my case, 8 workers is far too many even for an F2 instance; but it's also a large application. Obviously these are just guidelines, and the right number is going to depend entirely on the memory footprint of the app.

I highly recommend sticking with one worker per instance and using threads (2 or 4 x $(NUM_CORES)); a sketch of that entrypoint is below. Let App Engine instances do the rest of the scaling.

I have posted an article with our migration experience wrt billing at https://gae123.com/article/gae-py3-billing

As I mention there, we were able to lower our bill below Python 2 levels by using one worker and multiple threads, with no performance degradation.
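For anyone who wants the concrete form, the single-worker, multi-threaded setup described above boils down to an entrypoint along these lines (the thread count is only an example and should be tuned to your app's CPU and memory profile):

entrypoint: gunicorn -b :$PORT --workers 1 --threads 8 main:app

Setting --threads above 1 makes gunicorn use its gthread worker class, so the single worker process can serve several requests concurrently while App Engine handles scaling by adding or removing instances.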

Hey, thanks for the reply, and I will check out your article. For what it's worth, we ended up doing 1 worker per instance as you wrote, and are using the gevent worker. Getting gevent working with the legacy App Engine runtime is pretty non-trivial, but we've done it successfully; I am still planning to write an article on how to achieve this.
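Until that article exists, the entrypoint side of this setup is the easy part; with gunicorn it generally looks something like the following (gevent has to be added to requirements.txt as well, and the real work is getting the rest of the app to cooperate with gevent, not this line):

entrypoint: gunicorn -b :$PORT --workers 1 -k gevent main:app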

Looking forward to the write-up. I am sure that gevent would provide performance improvements; I saw a 20-30% improvement with our end-to-end tests. I think I mention in the article that we hit runtime issues with grpc and did not investigate further.