Re: Recent frequent insufficient resources when re...

ravwojdyla · 06-13-2022 11:41 AM

For a week or so now our company has experienced frequent "insufficient resources" errors like:

ERROR: (gcloud.beta.compute.instances.resume) The zone 'projects/***/zones/us-east1-b' does not have enough resources available to fulfill the request. '(resource type:compute)'.
Traceback (most recent call last)

Just to be clear, these requests are for "basic" VMs (no GPUs, nothing fancy). We have also not hit any quotas. I understand we could switch to a different zone, but that would require extra work. Is there any way to get more visibility into the resources available in a given zone? Also, the error is a bit cryptic: it's not clear which resource exactly is unavailable (CPU, memory, etc.). Has something recently changed in us-east1-b? Any help would be appreciated.

alexlorea

This issue is well known and has come up in this question (and a few others). The answer there explains: “It's recommended to deploy and balance your workload across multiple zones or regions to reduce the likelihood of an outage, by building resilient and scalable architectures.”

The documentation also explains: “Resource errors only apply to new resource requests in the zone and do not affect existing resources. Resource errors are not related to your Compute Engine quota and only apply to the resource you specified in your request at the time you sent the request, not to all resources in the zone.”

To solve it, you can try a few options from either the forum or the documentation. From the forum: “If you want an immediate solution, create a snapshot, then create an instance from the snapshot with different zone or region.”

“Resolution:

Try to create the resources in another zone in the region or in another region.
Because this situation is temporary and can change frequently based on fluctuating demand, try your request again later.
Try to reduce the number of resources you are requesting. For example, it's usually easier to get a VM with less GPUs, disks, vCPUs, and/or memory. Additionally, if your request is for multiple VMs, it's easier to get a smaller number of VMs. A reduction in the number of resources you are requesting might let your request proceed.
Try to change the type of resources you are requesting. For example, it might be easier to get VMs with older CPU platforms. A change to the type resource you are requesting might let your request proceed.
To prevent this error, create Compute Engine reservations when the resources you need are available to reserve them within a zone. Reservations help ensure that resources are available whenever you need them, so keeping reservations of resources you need can help prevent this error.
If you are trying to create Spot VMs (or legacy preemptible VMs), remember that these VMs are spare capacity, which is unplanned and volatile, so they might not be obtainable at peak demand periods. Consequently, Spot VMs are only recommended for workloads with flexible time, location, and VM-configuration requirements. You can help prevent this error for Spot VMs by following the best practices to make your workload more flexible. If this error persists, use standard VMs instead.
If you were unable to resolve the error using any of the preceding instructions, try Getting support.”
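
The first two resolution steps (try another zone, then try again later) can be sketched roughly as below. This is only an illustration, not an official tool: the function names, the retry counts, and the delays are my own assumptions, and the stockout check simply looks for the "does not have enough resources" phrase from the error quoted at the top of this thread. It assumes the gcloud CLI is installed and authenticated.

```python
import subprocess
import time


class ZoneUnavailable(Exception):
    """Raised when a zone reports it cannot fulfil the request right now."""


def gcloud_create(name, zone, machine_type="n1-standard-1"):
    """Try to create one VM with the gcloud CLI (hypothetical helper)."""
    cmd = [
        "gcloud", "compute", "instances", "create", name,
        f"--zone={zone}",
        f"--machine-type={machine_type}",
    ]
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode == 0:
        return
    # Assumption: a stockout is recognised by the phrase seen in the
    # error message quoted earlier in this thread.
    if "does not have enough resources" in result.stderr:
        raise ZoneUnavailable(zone)
    # Anything else (permissions, quota, bad name) should not be retried.
    raise RuntimeError(result.stderr.strip())


def create_with_fallback(zones, attempt, rounds=3, base_delay=30.0,
                         sleep=time.sleep):
    """Call attempt(zone) for each zone in order; return the first zone
    that succeeds, or None if every zone fails in every round.

    Between rounds, wait base_delay * 2**round_no seconds, because the
    shortage is described as temporary and may clear later.
    """
    for round_no in range(rounds):
        for zone in zones:
            try:
                attempt(zone)
                return zone
            except ZoneUnavailable:
                continue  # this zone is out of capacity, try the next one
        if round_no < rounds - 1:
            sleep(base_delay * 2 ** round_no)
    return None
```

For example, `create_with_fallback(["us-east1-b", "us-east1-c", "us-east1-d"], lambda z: gcloud_create("my-vm", z))` would walk the three zones in order, where `my-vm` is a placeholder name. Passing the attempt as a callable keeps the retry logic separate from the gcloud call, so it can be swapped for a client library or a test double.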

Finally, try to follow the suggested Best practices. Although it is up to you, consider this step while creating your machine: “Optional: Change the Zone for this VM. Compute Engine randomizes the list of zones within each region to encourage use across multiple zones.” To avoid this issue in the future, also see the guide on Tips for designing resilient systems and use a load balancer if you haven’t already.

BourneShell

I have tried to deploy an n1-standard-1 VM instance across all regions and zones within regions, and none of them are available. It is VERY time consuming and ultimately nothing seems to be available. I have tried this with VMs, as well as the different notebook options in the Vertex AI workbench. Is GCP closed for business if you want to use a GPU???

Same message across all of them: A n1-standard-1 VM instance is currently unavailable in the europe-west1-b zone. Alternatively, you can try your request again with a different VM hardware configuration or at a later time. For more information, see the troubleshooting documentation.

Is there a webpage or something that can actually tell me where resources are unavailable so that I can NOT waste hours checking so many regions and zones?

Recent frequent insufficient resources when requesting "basic" VMs