For a week or so now our company has experienced frequent "insufficient resources" errors like:
ERROR: (gcloud.beta.compute.instances.resume) The zone 'projects/***/zones/us-east1-b' does not have enough resources available to fulfill the request. '(resource type:compute)'.
Traceback (most recent call last)
Just to be clear, these requests are for "basic" VMs (no GPUs, nothing fancy). We have also not hit any quotas. I understand we could switch to a different zone, but that would require extra work. Is there any way to get more visibility into the resources available in a given zone? Also, the error is a bit cryptic: it's not clear exactly which resource is unavailable (CPU, memory, etc.). Has something changed recently in us-east1-b? Any help would be appreciated.
This issue is well known and has been seen in this question (and a few others). The answer there explains: “It's recommended to deploy and balance your workload across multiple zones or regions to reduce the likelihood of an outage, by building resilient and scalable architectures.”
The documentation also explains: “Resource errors only apply to new resource requests in the zone and do not affect existing resources. Resource errors are not related to your Compute Engine quota and only apply to the resource you specified in your request at the time you sent the request, not to all resources in the zone.”
To solve it, you can try a few options. From the forum: “If you want an immediate solution, create a snapshot, then create an instance from the snapshot with different zone or region.”
“Resolution:
Finally, try to follow the suggested Best practices. Although it is up to you, consider this step when creating your machine: “Optional: Change the Zone for this VM. Compute Engine randomizes the list of zones within each region to encourage use across multiple zones.” Also see the guide on Tips for designing resilient systems, and use a load balancer if you haven’t already, to avoid this issue in the future.
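For the snapshot route quoted above, here is a minimal sketch of the gcloud commands involved, wrapped in Python so they can be printed and reviewed before anything runs. All resource names (disk-1, snap-1, disk-2, vm-2) and the target zone are placeholders I made up, and the default machine type is only an example; adjust them to your project. It assumes a zonal boot disk and an installed, authenticated gcloud CLI.

```python
import subprocess


def snapshot_move_commands(disk, src_zone, dst_zone, snapshot, new_disk, new_vm,
                           machine_type="n1-standard-1"):
    """Build the gcloud commands that snapshot a disk, restore it in another
    zone, and boot a new VM from the restored disk.

    Every name passed in is chosen by the caller; none come from the thread.
    """
    return [
        # 1. Snapshot the existing disk in the zone that is out of capacity.
        ["gcloud", "compute", "disks", "snapshot", disk,
         f"--zone={src_zone}", f"--snapshot-names={snapshot}"],
        # 2. Create a new disk from that snapshot in the destination zone.
        ["gcloud", "compute", "disks", "create", new_disk,
         f"--zone={dst_zone}", f"--source-snapshot={snapshot}"],
        # 3. Create a VM in the destination zone that boots from the new disk.
        ["gcloud", "compute", "instances", "create", new_vm,
         f"--zone={dst_zone}", f"--machine-type={machine_type}",
         f"--disk=name={new_disk},boot=yes"],
    ]


def run_all(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stops at the first failure


if __name__ == "__main__":
    # Dry run: prints the three commands without touching your project.
    run_all(snapshot_move_commands(
        "disk-1", "us-east1-b", "us-east1-c", "snap-1", "disk-2", "vm-2"))
```

Run it once as a dry run, check the printed commands, and then call run_all(..., dry_run=False). Note that the last step can still fail with the same capacity error if the destination zone is also short on resources.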
I have tried to deploy an n1-standard-1 VM instance in every zone of every region, and none of them are available. It is VERY time-consuming and ultimately nothing seems to be available. I have tried this with VMs, as well as the different notebook options in the Vertex AI workbench. Is GCP closed for business if you want to use a GPU???
Same message across all of them: A n1-standard-1 VM instance is currently unavailable in the europe-west1-b zone. Alternatively, you can try your request again with a different VM hardware configuration or at a later time. For more information, see the troubleshooting documentation.
Is there a webpage or something similar that actually shows where resources are unavailable, so that I do NOT waste hours checking so many regions and zones?
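Nobody in this thread has pointed to such a page, but the zone-by-zone checking can at least be automated. A rough sketch, assuming the gcloud CLI is installed and authenticated: it tries each zone in the order given and moves on only when stderr contains one of the capacity messages quoted in this thread. The function names and the marker strings are my own choices, not an official API, and the exact error wording may differ between gcloud versions.

```python
import subprocess

# Substrings taken from the two error messages quoted in this thread.
# Assumption: gcloud prints one of them on stderr when a zone lacks capacity.
CAPACITY_MARKERS = (
    "does not have enough resources available",
    "is currently unavailable",
)


def gcloud_create(name, zone, machine_type):
    """Run `gcloud compute instances create`; return (exit code, stderr)."""
    proc = subprocess.run(
        ["gcloud", "compute", "instances", "create", name,
         f"--zone={zone}", f"--machine-type={machine_type}"],
        capture_output=True, text=True)
    return proc.returncode, proc.stderr


def create_in_first_available_zone(name, zones, machine_type="n1-standard-1",
                                   create=gcloud_create):
    """Try each zone in order; return the zone that worked, or None.

    Any failure that does not look like a capacity error is raised, so quota
    or permission problems are not silently retried in other zones.
    """
    for zone in zones:
        code, stderr = create(name, zone, machine_type)
        if code == 0:
            return zone
        if not any(marker in stderr for marker in CAPACITY_MARKERS):
            raise RuntimeError(f"{zone}: {stderr.strip()}")
        print(f"{zone}: no capacity, trying next zone")
    return None
```

For example, create_in_first_available_zone("my-vm", ["europe-west1-b", "europe-west1-c", "europe-west1-d"]) returns the first zone where creation succeeded, or None if all of them reported a capacity error. The create parameter exists so the loop can be exercised with a stub instead of real gcloud calls.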