I have been running a Debian instance on Compute Engine for over 3 years.
It hosts a VPN server and a website (based on WordPress plus a database). There have been two periods when the instance kept freezing.
1 year ago - I discovered that shutting down apache2 + SQL helps to keep the instance alive.
NOW - it keeps freezing.
What the freeze means for me:
1/ On Google Cloud the instance is up and running
2/ Services on the instance are not available:
- SSH not responding
- website not responding
- VPN server not responding
- I can SSH to the machine even from the Google Cloud console.
Logs on https://console.cloud.google.com/logs stop right after the freeze. The last entries in the logs were:
DEFAULT 2025-02-26T08:39:02.429336545Z [resource.labels.instanceId: debian] 2025-02-26T09:39:02.428739+01:00 debian systemd[1]: Starting phpsessionclean.service - Clean php session files...
DEFAULT 2025-02-26T08:39:05.610585894Z [resource.labels.instanceId: debian] 2025-02-26T09:39:05.610197+01:00 debian systemd[1]: phpsessionclean.service: Deactivated successfully.
DEFAULT 2025-02-26T08:39:05.614568969Z [resource.labels.instanceId: debian] 2025-02-26T09:39:05.614427+01:00 debian systemd[1]: Finished phpsessionclean.service - Clean php session files.
DEFAULT 2025-02-26T08:39:37.944231449Z [resource.labels.instanceId: debian] 2025-02-26T09:39:37.943538+01:00 debian systemd[1]: Starting gce-workload-cert-refresh.service - GCE Workload Certificate refresh...
DEFAULT 2025-02-26T08:39:38.318313855Z [resource.labels.instanceId: debian] 2025-02-26T09:39:38.317641+01:00 debian gce_workload_cert_refresh[2734]: 2025/02/26 09:39:38: Done
DEFAULT 2025-02-26T08:39:38.377331425Z [resource.labels.instanceId: debian] 2025-02-26T09:39:38.376944+01:00 debian systemd[1]: gce-workload-cert-refresh.service: Deactivated successfully.
DEFAULT 2025-02-26T08:39:38.377335732Z [resource.labels.instanceId: debian] 2025-02-26T09:39:38.377010+01:00 debian systemd[1]: Finished gce-workload-cert-refresh.service - GCE Workload Certificate refresh.
I can stop/start the instance and it responds again. I'm not sure where to find the root cause of this issue; any help will be appreciated.
The freeze coincides with a memory peak; now, after the peak, the instance is stable and working without a restart.
Hi @JacekKac,
Welcome to Google Cloud Community!
Based on what you describe, it seems the system runs out of memory and can no longer stay operational. While the instance is technically up on GCP as infrastructure, your OS can't allocate memory for your services to run correctly, use their threads, and store new entries.
The easy fix would be to move to a machine type with more memory, but that would add cost for something that doesn't happen all the time, so it may not be justified. Would you consider implementing an OOM manager like OOM-Killer? There are several options, but this is one of the most recommended; it should keep memory and swap usage manageable and keep the system operational.
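Before changing anything, it may be worth confirming that memory exhaustion really is what happens. A minimal sketch, assuming the kernel log from the boot that froze is still available (for example through a persistent systemd journal): the helper below just filters log text for lines that usually accompany OOM-killer activity. The function name oom_events is made up for illustration.

```shell
#!/bin/sh
# Print only the lines read from stdin that look like OOM-killer activity.
# "oom_events" is a hypothetical helper name, not a standard tool.
oom_events() {
  # "|| true" keeps the exit status at 0 when nothing matches.
  grep -i -E 'out of memory|oom-kill|killed process' || true
}

# Typical use after a freeze and a stop/start (needs root; the first form
# assumes the journal is stored persistently):
#   sudo journalctl -k -b -1 | oom_events   # kernel log of the previous boot
#   sudo dmesg -T | oom_events              # kernel log of the current boot
```

If this prints nothing for the boot that froze, the machine may have stalled before the kernel could log anything, so an empty result does not rule out memory pressure.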
Hi @JacekKac,
Welcome to Google Cloud Community!
This means the instance isn't completely frozen. The OS is still somewhat responsive and processes SSH requests from the GCP console. It's more likely that something is heavily overloading the system or causing a critical service to become unresponsive, but not crashing the entire OS.
Since you are able to SSH to the machine via the Google Cloud console, here are some troubleshooting steps you can take.
1. Check Memory Usage
Use the free -m command to check memory and swap usage.
free -m
This command shows the total, used, free, shared, buff/cache, and available memory in megabytes. Also check the swap usage. If swap is heavily used, it indicates that the system is running out of memory.
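To go one step further than reading the free -m table by eye, here is a rough sketch that reduces it to a single number and shows which processes hold the most memory. It assumes a Linux /proc/meminfo with a MemAvailable field and the procps version of ps; both function names are hypothetical.

```shell
#!/bin/sh
# Print available memory as a whole-number percentage of total memory.
# Reads /proc/meminfo by default; another file can be passed for testing.
mem_available_pct() {
  awk '/^MemTotal:/     {total = $2}
       /^MemAvailable:/ {avail = $2}
       END {if (total > 0) printf "%d\n", avail * 100 / total}' "${1:-/proc/meminfo}"
}

# Print the header plus the five processes with the largest resident memory.
top_mem_processes() {
  ps -eo pid,comm,rss,%mem --sort=-rss | head -n 6
}

# Usage:
#   mem_available_pct     # a low value means the system is close to exhaustion
#   top_mem_processes     # apache2 or the database near the top would be a hint
```

Running both during one of the memory peaks should show whether Apache workers or the database are the ones growing.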
2. Check Disk Space
df -h
If the root partition (/) is full or near full, it can cause system issues.
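As a small illustration of the disk check, the sketch below lists only the filesystems at or above a usage threshold. It assumes the POSIX output format of df -P and mount points without spaces; full_filesystems is a hypothetical helper name and 90 is an arbitrary default.

```shell
#!/bin/sh
# Read `df -P` output on stdin and print "mountpoint usage%" for every
# filesystem whose usage is at or above the given limit (default 90).
full_filesystems() {
  limit="${1:-90}"
  awk -v limit="$limit" 'NR > 1 {
    use = $5
    sub(/%/, "", use)                 # "95%" -> "95"
    if (use + 0 >= limit + 0) print $6, $5
  }'
}

# Usage:
#   df -P | full_filesystems 90
# To see which top-level directories take the space on the root filesystem:
#   sudo du -xh --max-depth=1 / | sort -h
```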
3. Check Disk I/O
Use the iotop command to monitor disk I/O usage. If it's not installed you can install it with:
sudo apt update
sudo apt install iotop
Then run
sudo iotop
4. Restart Services
Restarting the affected services might temporarily resolve the issue.
sudo service apache2 reload
sudo service apache2 restart
You may consider using reservations to guarantee capacity for your Compute Engine VMs. This helps ensure you'll have resources available even when demand increases.
Troubleshooting Documents
If you need further assistance or have any questions, please reach out to our Google Cloud Support team.
Was this helpful? If so, please accept this answer as “Solution”. If you need additional assistance, reply here within 2 business days and I’ll be happy to help.