
MIG does not scale based on CPU utilization

Over the past few days, our MIGs have stopped scaling, leaving instances overloaded on CPU at peak and underutilized during low-traffic periods

The issue only occurs when `Instances as predicted` goes to 0

We are seeing the same issue in both our staging and production environments

Is there an ongoing issue on the GCP side?

Production:

KhooHaoYit_0-1745919167577.png

Staging:

KhooHaoYit_0-1745921143832.png

 


Hi @KhooHaoYit,

Welcome to Google Cloud Community!

Here are some basic troubleshooting steps you can follow:

  1. Adjust Predictive Autoscaling Settings:

The quickest and simplest solution is to disable Predictive Autoscaling in the instance group settings for both staging and production environments via the Google Cloud Console. When the predicted instance count drops to zero, the autoscaler may scale the group in too far, leaving it unable to absorb sudden traffic spikes and potentially overloading the remaining instances. If you keep Predictive Autoscaling enabled, review and adjust its settings to better align with your application's actual traffic patterns.

  2. Check the Network and Firewall Rules:

Ensure your firewall rules permit traffic to reach your instances and that health checks aren't being blocked, as misconfigurations can disrupt proper functionality. Additionally, verify that your instances are correctly set up to accept traffic from your load balancer or other ingress sources.

  3. Monitor and Adjust Initialization Period:

Ensure that your initialization period accounts for your application's startup time, as an incorrect setting can trigger premature scaling. After disabling Predictive Autoscaling, monitor CPU utilization to confirm that CPU-based autoscaling is functioning as expected. If scaling isn’t responsive enough, consider adjusting parameters like target CPU utilization or cooldown period. You can also revisit Predictive Autoscaling later if needed, but ensure it’s properly configured and closely monitored.
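As a rough illustration of what CPU-based autoscaling is expected to do when you monitor it, the sketch below models the target-utilization idea: the group is sized so that average CPU lands at or below the target, within the configured minimum and maximum. This is a simplified model and not Google's implementation; it ignores the initialization period, stabilization, and scale-in controls, and the function name and parameters are hypothetical.

```python
def recommended_size(current_size: int, avg_cpu_pct: int, target_cpu_pct: int,
                     min_size: int, max_size: int) -> int:
    """Simplified target-utilization sizing (an assumption, not the real autoscaler).

    Percentages are integers (e.g. 60 for 60%) so the arithmetic stays exact.
    """
    if target_cpu_pct <= 0:
        raise ValueError("target_cpu_pct must be positive")
    # Total CPU demand, expressed in "percent of one instance" units.
    demand = current_size * avg_cpu_pct
    # Ceiling division: smallest instance count that keeps average CPU <= target.
    needed = -(-demand // target_cpu_pct)
    # Clamp to the configured minimum and maximum number of instances.
    return max(min_size, min(max_size, needed))


# 4 instances at 90% CPU with a 60% target: a healthy autoscaler would be
# expected to grow the group rather than hold it at 4.
print(recommended_size(4, 90, 60, min_size=1, max_size=10))
```

If the observed group size stays flat while a calculation like this keeps suggesting a larger or smaller group, that points at the autoscaler not acting rather than at the target or initialization settings.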

The issue appears to stem from Predictive Autoscaling being configured to forecast zero instances, which may override standard CPU-based scaling and lead to the observed behavior. This is not due to any ongoing GCP outage, but rather a configuration issue within your control—especially since it affects both staging and production environments. While it's wise to check the Google Cloud Status Page, the root cause is almost certainly tied to your current autoscaling setup and its reliance on CPU utilization alone.

Additionally, consider consulting with our Google Cloud Support to help you get a clearer idea of how to adjust your autoscaling configurations and mitigate the scaling issues you're facing. 

Was this helpful? If so, please accept this answer as “Solution”. If you need additional assistance, reply here within 2 business days and I’ll be happy to help.

Hey @nmagcalengjr

Here's my response

  1. Predictive autoscaling is off on all of our MIGs
  2. We did not modify any network or firewall rules that would cause the autoscaler to stop working
  3. The MIGs do not scale up or down regardless of high or low CPU utilization

We observed this issue in 2 different GCP projects without any changes on our side, so something else must have failed and caused the MIGs to stop scaling properly

We have also partly mitigated the issue by setting the minimum instance count to our peak instance count, which isn't ideal for us

Here's the screenshot showing that 29 out of 39 MIGs stopped scaling in production

KhooHaoYit_0-1747044494488.png

Here's the screenshot showing the MIG instance counts in production

KhooHaoYit_1-1747044544033.png

Here's the screenshot showing that 32 out of 33 MIGs stopped scaling in staging

KhooHaoYit_2-1747044587972.png

Here's the screenshot showing the MIG instance counts in staging

KhooHaoYit_3-1747044602098.png

As you can see, from around midday on April 24 to April 26 there was some degradation throughout the day, and then almost all of our MIGs suddenly stopped working for nearly a day, between April 26 and May 1

While the issue has been resolved since May 1, it's quite concerning that this kind of issue could happen without us modifying our MIGs
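For anyone who wants to catch this symptom early, here is a minimal sketch of a check for a group that is failing to scale out: the instance count stays flat and below the maximum while average CPU stays above the target. The sample format, the function name, and the six-sample window are all assumptions for illustration; in practice the samples would come from whatever monitoring export you already have.

```python
from typing import Sequence, Tuple

# One sample per monitoring interval: (instance_count, avg_cpu_percent).
Sample = Tuple[int, int]


def looks_stuck(samples: Sequence[Sample], target_cpu_pct: int,
                max_size: int, window: int = 6) -> bool:
    """Return True if the last `window` samples suggest scale-out is not happening.

    Hypothetical heuristic: the size never changed, the group had room to grow,
    and CPU was above the target in every one of those samples.
    """
    if window <= 0 or len(samples) < window:
        return False  # not enough data to judge
    recent = samples[-window:]
    sizes = {size for size, _ in recent}
    if len(sizes) != 1:
        return False  # the group did change size, so it is scaling
    size = recent[0][0]
    return size < max_size and all(cpu > target_cpu_pct for _, cpu in recent)


flat_and_hot = [(3, 95)] * 6
print(looks_stuck(flat_and_hot, target_cpu_pct=60, max_size=10))
```

This only covers the scale-out direction; a mirrored check (flat size above the minimum with CPU well below target) would cover groups that fail to scale in.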