I have a long-running app on App Engine using Python 2.7. In the past I've used the smallest instance class; generally my app spins up about 10 instances to handle load, maxing out around 40 requests per second.
Generally this cost me about $11 a day to run.
I've updated my code to Python 3, using Cloud NDB and a Redis memcache. I noticed the Python 3 app consumes a lot more memory, so I had to bump the instance class up to F2.
On Python 2.7 my app size was about 107 MB; on Python 3 it's 475 MB (the F2 instance class maxes out at 768 MB).
Here's what my 10 instances look like.
Each instance starts out already consuming a little over 600 MB. Over time memory grows, and I don't know exactly why, until it hits the memory limit and the instance is killed. I need to track that down, but as far as I know I'm not doing anything to hold onto memory, so I'm wondering if there's a memory leak somewhere that I'm not responsible for (the Cloud NDB framework or something else).
Additionally, these instances are only serving 4-5 requests per second. That seems really low, and I'd hope I don't need this many instances to serve 40 requests per second. My hope was that with a higher instance class and Python 3 threading, a single instance could handle much more; obviously it probably depends on how expensive each request is. So yes, the instances cost more, which probably accounts for my increased costs, but they're also faster with more memory, so I'd hoped fewer of them would be needed.
I'm curious: since I'm so close to the memory limit from the start, is App Engine leaning on that metric and spinning up more instances than it really needs? I'm bummed that a forced update to Python 3 has made things so much more expensive. I'm an indie developer, so the change directly takes a few thousand dollars out of my pocket to run my server.
Now that I've done the Python 3 conversion and no longer rely on the standard environment, I'm considering moving the server over to something like Linode to bring costs down (maybe). Obviously that's potentially a lot of work, and I'd love to avoid it if I can.
Any advice would be appreciated; let me know if you need additional info.
Thanks!
Yes, Python 3 apps are larger than Python 2 apps and use more memory. Because of this, Google announced an increase in the memory allocated to all of the instance classes.
Make sure you have a .gcloudignore file in the root of your application directory. The file should include an entry for your virtual env folder. If you don't, and you tested your app locally (ran it without dev_appserver.py) before deploying, then deploying would also upload the virtual environment created when you ran your app, which increases the size of the deployed app.
If you used the datastore emulator or stored your test data in your application directory, make sure to include that folder in .gcloudignore as well; otherwise its contents will also get deployed, increasing the size of your deployed app.
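As a concrete sketch, a minimal .gcloudignore along these lines would cover both points (folder names below are placeholders; match them to whatever your project actually uses):

```
# .gcloudignore -- example entries; adjust names to your project
.gcloudignore
.git
.gitignore
# virtual env folder (whatever yours is called)
venv/
env/
# local datastore emulator data / test fixtures
local_datastore/
__pycache__/
```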
Without seeing your code, one can only guess at what is happening. Can you describe what your app does at a very high level? That might help with tips on how to possibly reduce your memory consumption. Also, could you use Google's memcache instead of Redis?
If you're not already doing this, look into writing/reading data to/from Datastore in batches; that can help cut down on cost. Also look at using ndb.tasklets.
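The batching idea can be sketched like this in plain Python. The `chunked` helper below is my own illustration, not part of any library; with Cloud NDB, each chunk would go to `ndb.get_multi` (or `ndb.put_multi`) so you pay for one RPC per batch instead of one per key:

```python
def chunked(items, size):
    """Yield successive fixed-size slices of a list."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

# With Cloud NDB the loop would look something like:
#   for batch in chunked(keys, 500):
#       entities = ndb.get_multi(batch)  # one RPC per batch
keys = list(range(1, 12))
batches = list(chunked(keys, 5))
```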
Thanks so much for the reply. I did not have a .gcloudignore and have added one; the size of the app went from 475 MB down to 240 MB, thanks for that. That will make deployments faster, which is great, but it doesn't affect the amount of memory the app uses once it's running on an instance (I wouldn't expect it to).
My app is called Upwords:
https://itunes.apple.com/us/app/upwords-free/id588252565?mt=8
https://play.google.com/store/apps/details?id=com.lonelystarsoftware.upwords
All data is stored in Datastore: user account models, etc. With each request the REST API sends the user account info so the user record can be retrieved. The client can ask for a list of challenges for the user and that gets sent back; when they open a challenge in the app, that challenge is sent, and when the user plays a move, the challenge record is updated.
I use memcache a lot to store the challenge list and challenge models. When a challenge is updated, I delete the caches for that user and the opponent, so the next time around Datastore is accessed to refresh the challenge list and cache. By far my most used endpoint is /getChallenges, followed by /getChallenge and /updateChallenge.
At this moment in time /getChallenges sees about 275 requests a minute, /getChallenge sees 121 requests a minute, and /updateChallenge sees 100 requests a minute.
There are lots of other endpoints for chatting, updating your profile in the challenge, challenging other players etc.
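For anyone following along, the cache pattern described above (read-through on fetch, invalidate both players on update) can be sketched like this. A plain dict stands in for Redis/memcache, and all function names here are made up for illustration:

```python
cache = {}  # stands in for Redis or App Engine memcache

def get_challenges(user_id, load_from_datastore):
    """Read-through cache: serve from cache, fall back to datastore."""
    key = f"challenges:{user_id}"
    if key not in cache:
        cache[key] = load_from_datastore(user_id)
    return cache[key]

def update_challenge(user_id, opponent_id):
    """On a move, drop both players' cached lists so they get refreshed."""
    cache.pop(f"challenges:{user_id}", None)
    cache.pop(f"challenges:{opponent_id}", None)

# Tiny demonstration with a fake datastore loader that records its calls.
calls = []
def fake_load(uid):
    calls.append(uid)
    return [f"challenge-for-{uid}"]

get_challenges("alice", fake_load)   # miss: hits the "datastore"
get_challenges("alice", fake_load)   # hit: served from cache
update_challenge("alice", "bob")     # invalidate both players
get_challenges("alice", fake_load)   # miss again after invalidation
```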
I decided to try to get off the legacy services completely, and that's why I spun up a Redis instance. At the moment I'm off all those legacy App Engine APIs, and I did a test running my server on a Linode box that talks to Datastore, Task Queue, etc., and it appears to work well. So that's a plan B if I can't get App Engine cheap enough; I'd rather not forklift everything off if I don't have to. At the moment it's costing 3x what my Python 2.7 version cost to run.
I did hook up Cloud NDB to Redis, but I'm not sure if that's working. I do it this way:
def ndb_wsgi_middleware(wsgi_app):
    def middleware(environ, start_response):
        with ndbClient.context(global_cache=global_cache):
            return wsgi_app(environ, start_response)
    return middleware
So rather than doing the "with context" throughout the code, I saw you can do it once here and all requests get that context. So far it appears to work; I'm not getting any errors that I'm aware of. The global_cache is defined like so:
global_cache = ndb.RedisCache.from_environment()
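If I understand the Cloud NDB docs correctly, `ndb.RedisCache.from_environment()` reads a `REDIS_CACHE_URL` environment variable, so something like this would need to be in app.yaml for the global cache to actually connect (host and port below are placeholders):

```yaml
env_variables:
  REDIS_CACHE_URL: "redis://10.0.0.3:6379"
```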
I'm curious if this is less effective than the legacy memcache with NDB, or if it works the same. I could easily move back to the App Engine memcache, and if I move to Linode or some other external server, use Redis then. Maybe I'll let it run today and see what resources it uses, then switch back to the legacy memcache tomorrow and see if it makes a difference.
thanks for taking a look!
Daniel
The reason I suggested Google's memcache was that I assumed it would be cheaper than a third-party product (I have no confirmation of this).
From your description, it looks like your large memory consumption triggers more instances, and it has already pushed you to a more expensive instance class. The focus can thus be on reducing your memory consumption.
- Check your code for places where you don't have to read all data into memory. For example, when you retrieve data or run a query, is it possible to use an iterator to access the data as needed instead of converting the entire result into a list that's then held in memory?
- Can you use projection queries? These return only the fields of an entity you need instead of the entire entity, which is advantageous if your entities are large.
- Check your logs to see if you can figure out which call or process is consuming a lot of memory. You can pick a log entry and view the trace details.
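On the iterator point, the difference is roughly the one below, in plain Python for illustration; with Cloud NDB the analogue is iterating the query object directly instead of calling `.fetch()` into a list:

```python
def all_rows():
    # Stands in for a datastore query; imagine each value is a large entity.
    for i in range(1_000_000):
        yield i

# Materializing holds every row in memory at once:
#   total = sum(list(all_rows()))
# Streaming keeps only one row plus the running total in memory:
total = sum(all_rows())
```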
Hi there, thanks for sticking with this. I'm sure there are areas where I could use an iterator. I guess I assumed that if I load all the models into a list, iterate over it, etc., that memory would get freed when the request is done anyway?
I'll look into projection queries. My largest model has 55 properties, but the data in each is pretty small, so I don't know if that's considered large.
I "think" there's extra cost associated with talking to Datastore (more reads/writes), so maybe in my conversion I'm leaning less on memcache, but I don't think so. I'm wondering if the legacy NDB/memcache automatically cached more than hooking Redis up to Cloud NDB does. All my custom caching is exactly the same as before; that's the only thing that's changed there. I'll have to do some tests rolling back to my old Python 2.7 server to see what the costs were. I may also turn off all my caching, clear the Redis cache, and see if anything is actually being written there automatically; I haven't confirmed that yet.
Thanks Karolina, I'll take a look at all these options
Keep in mind that App Engine Flex requires that at least one instance is always up, unlike App Engine Standard, which, when set to automatic scaling, can scale down to zero when there's no traffic. That in turn saves money (you're not charged when an instance isn't running).
Thanks. Right, and that's fine; there's always someone playing the app, so it's pretty much never idle, with at least 20 requests per second even at lighter times of day.
Hi dank, I also migrated several apps from Python 2.7 to Python 3. Unlike you, I'm still using the bundled services, so I'm not using Redis for memcache or a newer API to access Datastore. Keep that in mind when you read the following.
I can see the footprint of my apps (when deployed) is many times bigger now (e.g. from 2 MB in Python 2.7 to 80 MB in Python 3), but that didn't seem to affect much the memory the apps consume when running. In fact, my costs now are very similar to what they were before. I would suggest a few things:
1- Check the cost breakdown so you can pinpoint what's causing the expense. I don't see that in your question; it will look something like this:
2- Sometimes weird things happen (e.g. on Nov 19 I had a big spike in Frontend Instances in that app; I think it was because of a bot crawling my site, but I'm not 100% sure). If you're not sure, you could send Google feedback right there, attaching screenshots, or try to get support; it's the question mark (?) in the top-right corner. Earlier this year, my expenses went up in several apps, and I could see it was because of Frontend Instances, although I couldn't explain it because traffic was the same and I hadn't changed anything to warrant it. I did the feedback thing (plus another thing I'll explain below), and a few days later the cost came back to normal; a few weeks later I got a refund (a credit to my billing account) that I suppose was related to that.
3- If you see that the cost is related to the number of instances you have running (as seems to be the case), you can control that in your app's app.yaml file. Experiment with the scaling settings (instead of having the automatic defaults or zero for max_instances) and see what happens:
automatic_scaling:
  min_idle_instances: automatic
  max_idle_instances: automatic
  min_pending_latency: automatic
  max_pending_latency: automatic
  max_instances: 0
I was able to control things using that when I couldn't explain the increase in costs, but I'm not sure those settings are available in the environment you're using now.
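For reference, in the second-generation standard runtimes the equivalent knobs look something like this; the values below are examples to experiment with, not recommendations, so check the current app.yaml reference for your runtime:

```yaml
automatic_scaling:
  min_instances: 1
  max_instances: 8
  target_cpu_utilization: 0.75
  max_concurrent_requests: 20
```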
Please let us know how things go, as we might have to start using those APIs at some point. All the best.
Thanks everyone for taking an interest and offering suggestions. I've figured out a few more things, including some errors on my end. A recap of what I've found:
That's where I'm at; I appreciate all the pointers. I'll look at some of the other strategies (projection queries, tasklets, using iterators rather than loading all the models into a list, and seeing where else I can memcache) after this test. I want to keep things the same for now so I have apples-to-apples testing and can just experiment with memcache.
Thanks again for the help! Glad the biggest issue was a mistake on my end. I had the Redis env variables set for using it generally, but was surprised I needed an additional one for NDB.
Well, to wrap this up: after a lot of testing and optimization I went from about $18 a day using the F2 class on Python 3 down to $10! That includes paying for a 1 GB Redis memcache. My app's cache hit rate is way up, around 95%, as I added caching, and more efficient caching, wherever I could. This cost is at or a little below what I was seeing on Python 2.7 with the lowest F1 instance class.
I did add a bool in my code to switch back to the legacy memcache to see how it performs that way; that would likely bring the cost down to around $8 if it's as good as the 1 GB Redis. But for some reason I'm seeing errors when I switch back to the legacy APIs. I have a top-level variable USE_LEGACY_MEMCACHE, and the app switches imports and function calls between legacy and Redis based on it.
if USE_LEGACY_MEMCACHE:
    from google.appengine.api import memcache
    from google.appengine.ext import ndb
    from google.appengine.api import wrap_wsgi_app
    REDIS_ACTIVE = False
else:
    from google.cloud import ndb
    import redis
    # Memcache
    REDIS = redis.Redis(host=REDIS_HOST, port=REDIS_PORT)
    # Datastore
    ndbClient = ndb.Client()
    global_cache = ndb.RedisCache.from_environment()
I then wrapped my memcache get and set calls to respect that variable and switch between the two types, but I'm getting errors that aren't clear to me. It errors out when trying to .get() a key from memcache. When I first set this up it worked great; then I iterated on my app for a while, and now I can't switch back to legacy. Going to set it down for the moment; I'm happy with the progress so far and will look at it another time.
Thanks
D
When you used the legacy memcache, did you remember to wrap your WSGI object and also set app_engine_apis: true in your app.yaml file? See the documentation here.
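For anyone following along, my understanding of the bundled-services setup being referred to is the `app_engine_apis` flag in app.yaml, plus wrapping the WSGI object in code, which for Flask is something like `app.wsgi_app = wrap_wsgi_app(app.wsgi_app)`:

```yaml
runtime: python310
app_engine_apis: true
```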
I did remember to do both of those things. I haven't gotten around to debugging further though. My app has been running at $9-10 a day with the Redis cost included, which is a little cheaper than pre-Python 3, so for the moment it's not an issue. At some point I'll look again, as I may be able to shave a couple of dollars a day if I can get legacy working again. I do like that the app is more portable now and easier to test locally without the legacy APIs.
I have run into a similar issue.
We had our application deployed on the App Engine standard environment, Gen 1, developed on the Python 2.7 runtime with the webapp2 framework.
Currently, we're in the process of migrating our framework from webapp2 to Flask, and we're updating our Python runtime as well, from Python 2.7 to Python 3.10.
We've migrated the code base; however, we're facing some unexpected issues:
With webapp2, the App Engine build size was about 140-160 MB, but after the migration to Flask on Python 3.10, the size rose to 480-500 MB.
Upon debugging, we found it's the same size the project occupies locally on the machine. Did webapp2 apply compression when deploying to App Engine, and Flask isn't applying such compression?
Now, for all the F1 instance classes I've declared, the build deploys successfully, but my front end does not load. It's because my build size is about 480 MB, greater than the F1 memory limit of 384 MB in the second-gen runtime for the App Engine standard environment.
We did a few experiments to evaluate this rise in build size: adding requirements.txt to .gcloudignore reduced the size to 200 MB, and additionally ignoring the static folder reduced it by a further 136 MB. But we cannot add these to .gcloudignore, as they are required.
Also, for example, if I go live with such a build size and have, let's suppose, 20 instances on F2, when will automatic scaling spin up another instance, and what effect will that have on billing?
A few things I learned: I've actually optimized enough that it's now cheaper on Python 3 than on Python 2.7, though in part that may be due to the nature of my app.
entrypoint: gunicorn -b :$PORT -w 2 main:app
This dramatically reduced the memory required to run my app, from about 700 MB down to 220 MB, which allowed me to go back to F1 and save a lot of money. The -w 2 flag specifies how many workers; I guess the more you have, the more copies of the app are in memory. I'm not noticing any difference in my app's performance since I added this. Again, I'm sure it's all highly dependent on your app's needs.
As an additional optimization, you can use the `--preload` flag, which loads the app first before forking the worker processes. This can save a bit of additional instance memory, but it depends on the app. For us this saves about ~40 MB per instance:
entrypoint: gunicorn --bind=:$PORT --workers=2 --preload main:app
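Putting the pieces from this thread together, an app.yaml along these lines is the shape of the final setup; the instance class and worker count are just the values discussed above, so tune them for your own app:

```yaml
runtime: python310
instance_class: F1
entrypoint: gunicorn --bind=:$PORT --workers=2 --preload main:app
```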
The migration from Python 2 to Python 3 and the changes in cost on App Engine might be influenced by various factors. It's essential to carefully examine your application, usage patterns, and the pricing model changes between Python 2 and Python 3 on App Engine.
Here are some aspects to consider:
App Engine Pricing Changes: App Engine pricing can be affected by various factors such as instance class, request/response sizes, and data storage. Google Cloud Platform occasionally updates its pricing models, and these changes might impact your overall costs.
Instance Class and Scaling: Different instance classes have different pricing. Review the instance classes you're using and consider if adjustments can be made based on your application's requirements. Additionally, examine how your application scales, as this can impact the number and type of instances running.
Resource Utilization: Python 3 might have different resource utilization patterns compared to Python 2. Assess whether your application's resource usage (CPU, memory) has changed significantly after the migration.
App Engine Flex vs. Standard: Depending on your application's requirements, you might be using either the standard environment or the flexible environment on App Engine. These environments have different pricing models, and your choice might impact costs.
Third-Party Libraries and Dependencies: Ensure that all your third-party libraries and dependencies are compatible with Python 3. Sometimes, the changes in libraries or the need to use alternative libraries in Python 3 can impact the performance or resource usage of your application.
Monitoring and Optimization: Regularly monitor your application's performance and resource usage using Google Cloud Monitoring or other tools. This can help identify areas for optimization and potential cost savings.
Google Cloud Credits and Billing Support: If you have Google Cloud credits, make sure to check your billing support options. Google Cloud offers billing support to help customers understand and manage their costs.
Consult Google Cloud Documentation and Support: Review the official Google Cloud documentation for the specific services you're using. Additionally, consider reaching out to Google Cloud Support for assistance. They can provide insights into your specific situation and offer guidance on optimizing costs.
Before making significant changes, it's crucial to thoroughly analyze the factors mentioned above and potentially consult with your development and operations teams to ensure that you are optimizing both the performance and cost aspects of your application on App Engine.
ChatGPT much?