Hi,
There is an example question posted:
You set up an autoscaling managed instance group to serve web traffic for an upcoming launch. After configuring the instance group as a backend service to an HTTP(S) load balancer, you notice that virtual machine (VM) instances are being terminated and re-launched every minute. The instances do not have a public IP address. You have verified that the appropriate web response is coming from each instance using the curl command. You want to ensure that the backend is configured correctly. What should you do?
A. Ensure that a firewall rule exists to allow source traffic on HTTP/HTTPS to reach the load balancer.
B. Assign a public IP to each instance, and configure a firewall rule to allow the load balancer to reach the instance public IP.
C. Ensure that a firewall rule exists to allow load balancer health checks to reach the instances in the instance group.
D. Create a tag on each instance with the name of the load balancer. Configure a firewall rule with the name of the load balancer as the source and the instance tag as the destination.
I also believe that the answer is C. I believe the LB health check cannot be performed due to the missing firewall rule. Specifically, LB health probes do not reach, and thus the VM exceeds its LB-defined
So the LB requests new ones to be created.
As for removal, the VMs that fail the LB health check do not get traffic. The MIG autoscaler observes that these VMs are running under capacity and removes them. The VMs themselves are healthy but idle.
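To make option C concrete: Google documents 130.211.0.0/22 and 35.191.0.0/16 as the source ranges for load balancer health check probes. Below is a minimal Python sketch, standard library only, that checks whether a firewall rule's allowed source ranges cover those probes. The function name and the example inputs are hypothetical and not part of any Google API.

```python
import ipaddress

# Source ranges that Google documents for load balancer health check probes.
HEALTH_CHECK_RANGES = [
    ipaddress.ip_network("130.211.0.0/22"),
    ipaddress.ip_network("35.191.0.0/16"),
]


def allows_health_checks(allowed_source_ranges):
    """Return True if every probe range lies inside some allowed source range."""
    allowed = [ipaddress.ip_network(r) for r in allowed_source_ranges]
    return all(
        any(probe.subnet_of(net) for net in allowed)
        for probe in HEALTH_CHECK_RANGES
    )
```

A rule that only allows an internal range such as 10.0.0.0/8 would fail this check, which matches the situation option C describes.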
Hi @kc3,
https://cloud.google.com/compute/docs/instance-groups/autohealing-instances-in-migs
"""
Tip: Use separate health checks for load balancing and for autohealing. Health checks for load balancing detect unresponsive VMs and direct traffic away from them. Health checks for autohealing detect unhealthy VMs and proactively recreate those VMs, so this health check should be more conservative than a load balancing health check. For more information, see What makes a good autohealing health check.
"""
The LB health check does not terminate instances; only the autohealing health check does.
> The MIG autoscaler observes that these VMs are running under capacity and removes them.
The MIG would not drop below the minimum configured capacity. If traffic were not reaching the instances because the firewall rules were incorrect, the instances would be idle, so the group might shrink to the configured minimum (which is typically not 0). With no load on those few instances, the MIG has no reason to add more. They would sit in a steady state at the minimum configured number.
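The steady-state argument can be sketched in a few lines of Python. This is a simplified model of a CPU-based autoscaler, not Google's actual algorithm; the function and parameter names are assumptions made for illustration.

```python
import math


def recommended_size(cpu_per_instance, target_cpu, min_size, max_size):
    """Simplified model: size the group to meet the CPU target, then clamp.

    cpu_per_instance: current CPU utilization of each VM, as fractions (0.0-1.0).
    target_cpu: the configured target utilization, e.g. 0.5.
    """
    total = sum(cpu_per_instance)
    wanted = math.ceil(total / target_cpu) if total > 0 else 0
    # The group never goes below min_size or above max_size.
    return max(min_size, min(max_size, wanted))
```

With idle instances the total utilization is zero, so the result is simply min_size, which is the steady state described above.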
Hi,
So we both believe that the LB is not responsible for terminating the VMs. Where we differ is that you believe the MIG health check is responsible, whereas I think it is the MIG autoscaler.
What do you make of this phrase in the problem statement?
> You have verified that the appropriate web response is coming from each instance using the curl command.
To me, this suggests the VMs are healthy per se. So the only remaining possibility, as I see it, is that they are removed because they are running under capacity.
Yes, to mitigate cold-start problems, you would most likely want some VMs on standby, but 1) the problem statement does not talk about that, and 2) as I understand it, the requests are piling up without the LB being able to forward any of them.
> What do you make of this phrase in the problem statement?
Good question; I'm not sure. The health check is not literally the same as the web response: health checks are configured separately, and could therefore be misconfigured.
> Where we differ is that you believe the MIG health check is responsible, whereas I think it is the MIG autoscaler.
If you believe it's the MIG autoscaler, then there is no disagreement. That must be the cause of terminating instances. However, it gets its information from a health check, namely the autohealing health check. My point is that the question is wrong: it should say "from the autohealing health check", or simply "health check". The "load balancer health check" would cause "the requests to pile up", but would not terminate instances.
(Or, the MIG autoscaler is terminating instances due to inactivity. But they would not scale up again. They would stay at a low number.)
According to my understanding, the MIG health check runs separately from the MIG autoscaler. Maybe it is useful to think of both as separate policy control objects that provide independent signals to the actual MIG resource manager. But the official documentation is vague about this point.
1) According to this https://cloud.google.com/compute/docs/autoscaler
"Autoscaling works independently from autohealing."
2a) This here seems to address your point https://cloud.google.com/compute/docs/instance-groups/autohealing-instances-in-migs
"The health checks used to monitor MIGs are similar to the health checks used for load balancing, with some differences in behavior. Load balancing health checks help direct traffic away from non-responsive instances and toward healthy instances; these health checks do not cause Compute Engine to recreate instances. On the other hand, managed instance group health checks proactively signal to delete and recreate instances that become UNHEALTHY."
Noteworthy to me: "Compute Engine" recreates instances, and MIG health check "signals".
2b) Note that the MIG health check uses HTTP. Coming back to the problem statement, to me this suggests the VMs are healthy.
> You have verified that the appropriate web response is coming from each instance using the curl command.
3) My conclusion: the VMs are not unhealthy, as per the problem statement. They merely fail to respond to the LB health check, because of the missing firewall allow rule. The LB signals that it needs more resources (as the VMs do not respond to its health checks) whereas the MIG autoscaler signals that fewer resources suffice (as it sees the VMs not handling requests). These competing signals land in some queue that is processed by the MIG resource manager, leading to resources being added & removed.
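The documented split quoted in 2a) can be written down as a small Python model. It only encodes what the quoted passage states (load balancing checks steer traffic, autohealing checks trigger recreation); the class and function names are hypothetical, and it says nothing about how the two signals are actually processed internally.

```python
from dataclasses import dataclass


@dataclass
class Instance:
    name: str
    passes_lb_check: bool        # result of the load balancing health check
    passes_autoheal_check: bool  # result of the MIG autohealing health check


def lb_sends_traffic(vm: Instance) -> bool:
    """The load balancer only routes requests to instances passing its check."""
    return vm.passes_lb_check


def mig_recreates(vm: Instance) -> bool:
    """Only a failing autohealing check signals delete-and-recreate."""
    return not vm.passes_autoheal_check
```

In this model, a VM that fails only the load balancing check gets no traffic but is not recreated.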
You are right: I may have been mixing up MIG health checks and MIG autoscaler. They are different.
> The LB signals that it needs more resources
Is there evidence for this?
The MIG autoscaler will scale instances if the CPU usage is above the configured threshold. I was not aware that the MIG autoscaler would also add instances "because the LB signals that it needs more resources" regardless of CPU usage.
Hi, yes, have a look:
Scaling based on load balancing serving capacity | Compute Engine Documentation | Google Cloud
I guess this is what is put in place in the backend configuration of the LB.
Scaling based on "RATE": that could be it. I was mainly thinking about scaling based on CPU usage. I will read more about it.
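For reference, a rough Python sketch of how rate-based scaling could be computed, assuming the autoscaler aims to keep each instance at a target fraction of the backend's maximum requests per second. This is a simplified model for illustration; the function and parameter names are assumptions, not the real autoscaler interface.

```python
import math


def instances_for_rate(total_rps, max_rate_per_instance, target_utilization):
    """Instances needed so each stays at or below the target share of its max rate."""
    per_instance_target = max_rate_per_instance * target_utilization
    return math.ceil(total_rps / per_instance_target)
```

For example, with a maximum of 100 requests per second per instance and a target utilization of 0.8, each instance is expected to carry about 80 requests per second.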