
Help triaging out-of-memory issues in a Python bokeh Cloud Run deployment

In my Python/bokeh Cloud Run app, I hit an apparent memory leak that I cannot reproduce in my local environment. Any suggestions on what to look for next would be appreciated. Thanks!

Symptom:

Even with a single user, container memory utilization steadily grows until the container runs out of memory. I tried to track the app's own memory usage (using psutil) but couldn't find any noticeable leak in my process, and I could not reproduce the issue in my local environment either.

Details:

  • Container memory utilization (per the console metric) grew steadily (by over 500 MB) until the container ran out of memory.
  • I tracked the app's memory usage with psutil, and it was largely stable over the same timespan: RSS/USS 370–470 MB, VMS 2100–2245 MB.
    • Other resource usage was also largely stable: number of threads and number of file descriptors.
  • When the out-of-memory event was detected, RSS was not at its peak (~405 MB), but VMS was (~2245 MB).
  • The app is deployed from a standard Python Docker image (python:3.11-slim).
  • I could not reproduce the issue locally either. I am not set up to run Docker locally, so I approximated by running the app in Windows WSL.
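For reference, the in-process tracking I describe above looks roughly like this (a minimal sketch; it assumes psutil is installed, and the function name is my own):

```python
import os

import psutil  # third-party library used for the numbers quoted above


def memory_snapshot():
    """Return the per-process stats I logged: memory in MB, plus thread count."""
    proc = psutil.Process(os.getpid())
    # memory_full_info() adds USS on top of the RSS/VMS from memory_info()
    mem = proc.memory_full_info()
    mb = 1024 * 1024
    return {
        "rss_mb": mem.rss / mb,
        "uss_mb": mem.uss / mb,
        "vms_mb": mem.vms / mb,
        "threads": proc.num_threads(),
    }
```

Logging this periodically is what showed that the process itself stayed flat while the container metric kept climbing.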
 

 


Update: I've found the root cause. My app downloads data and caches it on the local filesystem, which in Cloud Run is an in-memory filesystem, so every cached file counts against the container's memory limit. The container memory increase tracks the accumulated downloads over time.
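In case it helps others: since the local filesystem is backed by memory, a cache written there has to be bounded explicitly. A minimal sketch (the function and the byte budget are my own, not anything Cloud Run provides) that evicts the oldest cached files once the cache exceeds a cap:

```python
import os


def evict_oldest(cache_dir, max_bytes):
    """Delete oldest files in cache_dir until the total size fits in max_bytes."""
    files = [
        os.path.join(cache_dir, name)
        for name in os.listdir(cache_dir)
        if os.path.isfile(os.path.join(cache_dir, name))
    ]
    files.sort(key=os.path.getmtime)  # oldest first
    total = sum(os.path.getsize(f) for f in files)
    for f in files:
        if total <= max_bytes:
            break
        total -= os.path.getsize(f)
        os.remove(f)
```

Calling something like this after each download keeps the in-memory filesystem (and therefore container memory) from growing without bound.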

 
