Re: Help Diagnose App Engine Flex Abrupt Shutdown ...

jolo007 · 11-20-2024 11:48 PM

I encountered an issue where my App Engine Flexible Environment application shut down abruptly, and I’m trying to identify the cause. Below are the details:

Logs

DEFAULT 2024-11-21T06:37:10.753871679Z Triggering app shutdown handlers.
DEFAULT 2024-11-21T06:37:10.753871680Z Sending SIGTERM to app.
DEFAULT 2024-11-21T06:37:10.753871681Z 6cf0a00268baa65457243c5b20d4a3fa3629906a8cfa2b2987bcd1998d53ea70
DEFAULT 2024-11-21T06:37:10.753871682Z Sending SIGKILL to app.

App.yaml Configuration

service: X
env: flex
env_variables:
  TZ: "X" 
resources:
  cpu: 2
  memory_gb: 2
  disk_size_gb: 10
vpc_access_connector:
  name: 'X'
manual_scaling:
  instances: 1
network:
  session_affinity: true

Context

The app was running with a single instance (manual scaling set to 1).
Logs indicate the app received a SIGTERM, followed by a SIGKILL after triggering the app shutdown handlers.
There were no explicit errors or unusual resource spikes before the shutdown.

Questions

What could cause the SIGTERM and SIGKILL signals in App Engine Flex?
Are there specific logs or metrics I should review to pinpoint the root cause?
Could this be related to resource constraints (e.g., CPU, memory, disk)? If so, how do I verify this?
Is it possible that this was caused by a lifecycle event from GCP (e.g., VM migration, health check failure, or deployment scaling events)?

Any guidance or suggestions would be greatly appreciated! Let me know if additional context is needed.

Thanks in advance!

KyleMari

Hi @jolo007,

Welcome to the Google Cloud community!

App Engine can send SIGTERM and SIGKILL signals during certain planned or unplanned events. Common scenarios are: (1) the instance undergoes a routine update, which is usually scheduled weekly and tends to affect instances that have been running for a long time; (2) load on the app increases, pushing resource usage beyond what is specified in the config; or (3) the instance is manually restarted or stopped. You can check our documentation on how instances are shut down.

When one of the events listed in that documentation occurs, App Engine sends a SIGTERM (STOP) signal to the app, waits up to about 30 seconds under normal conditions, and then force-kills it with a SIGKILL (KILL). You can review the logs related to these signals in several ways available for App Engine.
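To make use of the window between SIGTERM and SIGKILL, the app can register a handler and finish its cleanup before the kill arrives. The following is a minimal Python sketch, not something taken from your app: `run`, `cleanup`, and `poll_interval` are hypothetical names, and the loop body is a stand-in for real work.

```python
import signal
import threading

# Set when SIGTERM arrives; the main loop polls it to exit cleanly.
shutdown_requested = threading.Event()


def handle_sigterm(signum, frame):
    """Only record the request; do the real cleanup outside the handler."""
    shutdown_requested.set()


def run(poll_interval=0.5, cleanup=lambda: None):
    """Hypothetical main loop: work until SIGTERM, then clean up."""
    # signal.signal must be called from the main thread.
    signal.signal(signal.SIGTERM, handle_sigterm)
    while not shutdown_requested.is_set():
        # Stand-in for real work; wakes up early once the event is set.
        shutdown_requested.wait(poll_interval)
    print("SIGTERM received, running cleanup", flush=True)
    cleanup()  # e.g. flush buffers, close connections (assumed)
    return "stopped"
```

Logging a line when the handler fires also gives you a timestamp from inside the app to compare against the platform's "Sending SIGTERM to app." entry.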

Shutdowns like this can also be linked to over-utilization of CPU, memory, or disk. Check the App Engine dashboard and review the graphs for memory, traffic, utilization, instances, and others around the time of the shutdown to see if something peaks. If none of the graphs shows a spike, resource exhaustion is probably not the cause.
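If the dashboard graphs are too coarse, one option is to have the app log its own usage periodically so there is a record right up to the shutdown. This is a rough sketch using only the Python standard library on a Unix-like system; `usage_snapshot` and its field names are made up for illustration, and the figures describe the current process and its filesystem, not the whole VM.

```python
import json
import os
import resource
import shutil
import sys


def usage_snapshot(path="/"):
    """Return coarse memory, CPU-load, and disk figures as a dict."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    try:
        load_1m = round(os.getloadavg()[0], 2)
    except OSError:
        load_1m = None  # load average is not available on this system
    disk = shutil.disk_usage(path)
    return {
        "peak_rss_mb": round(peak / divisor, 1),
        "load_1m": load_1m,
        "disk_used_pct": round(100 * disk.used / disk.total, 1),
    }


def log_usage(path="/"):
    """Print one JSON line; stdout is assumed to be collected as app logs."""
    line = json.dumps(usage_snapshot(path))
    print(line, flush=True)
    return line
```

Calling `log_usage()` from a timer or at the start of the SIGTERM handling path would show whether memory or disk was close to the limits in your app.yaml (2 GB memory, 10 GB disk) just before the shutdown.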

Lastly, the lifecycle of an instance can also be a contributing factor. Because your config uses manual scaling, App Engine does not scale up or down automatically based on demand, so that single instance usually goes down only when a user restarts or stops it, when it overloads its allocated resources, when it undergoes a system update, or when it fails a health check and is restarted. Even with a config that scales up or down, an instance may undergo machine migration (for example, when workload demand increases or the app goes idle), which can also shut the app down using these signals.

I hope the above information is helpful.
