Help Needed: Resource Availability Issue with A100 Notebook Instances on Vertex AI

Hello Google Cloud Community,

I'm Zain from Typicl.ai. I'm reaching out for help with a recurring issue we've hit while trying to deploy A100 notebook instances on Vertex AI: despite multiple attempts across several zones, we consistently run into resource availability errors.

Issue Details:

  • Affected Zone: us-central1-c
  • Error Message: "The zone 'us-central1-c' does not have enough resources available to fulfill the request. (resource type: compute). Something went wrong. Sorry about that."

We have also tried deploying these instances in other zones, with the same result. This issue is critical for us: it significantly impacts our operations, and we are on a tight schedule.

Questions for the Community:

  1. Resource Insights: Has anyone else experienced similar issues with A100 resources? If so, how were you able to resolve or work around the problem?
  2. Alternative Solutions: Are there specific zones or configurations with better resource availability that we should consider?
  3. Priority Support: Does anyone have suggestions on how to engage Google Cloud support more effectively for urgent issues, or how to possibly escalate this?
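On question 2, one workaround people sometimes use is to script the create request so it falls through a list of candidate zones instead of retrying a single zone by hand. Below is a minimal, provider-agnostic sketch of that loop. `try_create` and `ZoneExhaustedError` are hypothetical stand-ins for whatever actually issues the Vertex AI / Compute Engine request and for however that client reports the "not enough resources" stockout; the zone names are placeholders. You can check which zones offer A100s at all (accelerator type `nvidia-tesla-a100`) with `gcloud compute accelerator-types list`, but that shows where the GPU type exists, not where capacity is free right now.

```python
from typing import Callable, Dict, Iterable, Optional, Tuple


class ZoneExhaustedError(Exception):
    """Hypothetical error: a zone reported insufficient capacity (stockout)."""


def create_in_first_available_zone(
    zones: Iterable[str],
    try_create: Callable[[str], object],
) -> Tuple[Optional[str], Dict[str, str]]:
    """Try each candidate zone in order and stop at the first success.

    `try_create` is a hypothetical callable that sends the real creation
    request for one zone and raises ZoneExhaustedError on a stockout.
    Returns (zone that succeeded or None, {failed zone: error message}).
    """
    failures: Dict[str, str] = {}
    for zone in zones:
        try:
            try_create(zone)
            return zone, failures
        except ZoneExhaustedError as exc:
            # Capacity problem in this zone only: record it and try the next one.
            failures[zone] = str(exc)
    return None, failures


if __name__ == "__main__":
    # Fake backend for illustration: only one placeholder zone "has" capacity.
    available = {"us-central1-f"}

    def fake_create(zone: str) -> str:
        if zone not in available:
            raise ZoneExhaustedError(f"zone {zone} does not have enough resources")
        return f"instance-in-{zone}"

    zone, failures = create_in_first_available_zone(
        ["us-central1-c", "us-central1-a", "us-central1-f"], fake_create
    )
    print(f"created in {zone}; stockouts in {sorted(failures)}")
```

Only the stockout error is caught on purpose: permission, quota, or configuration errors still propagate, so a real misconfiguration is not silently hidden behind "try another zone".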

Any advice, insights, or suggestions would be greatly appreciated as we work to resolve this quickly. Thank you in advance for your help; I look forward to hearing possible solutions from this knowledgeable community!

Best regards,

Zain
Typicl.ai


I have the same issue in zones asia-northeast1-c and asia-northeast1-a, so wherever I try to create the A100 instance, the issue persists. I didn't find any solution. Did you find one? If so, could you share it with me? This issue is really annoying, and Google developers should take care of it.

Hi @abdulzain6

Thank you for joining our community.

I understand how frustrating it can be when Vertex AI isn't working as expected. I've checked our service health and incident reports, and everything seems to be running smoothly. However, I did find an older open ticket in Vertex AI's known issues that sounds similar to what you're experiencing.

Before escalating this to Google Cloud support, consider reaching out to your project administrator, as there might be organizational policies that limit your resources.
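Alongside organizational policies, it may be worth confirming that the project actually has GPU quota in the region, assuming the notebook runs on a Compute Engine VM in your project. A quota shortfall usually surfaces as a separate quota-exceeded error rather than the "not enough resources" message quoted above, so treat this as something to rule out, not a likely fix. The sketch assumes the parsed JSON from `gcloud compute regions describe REGION --format=json`, whose `quotas` entries carry `metric`, `limit`, and `usage` fields; the A100 metric names (for example `NVIDIA_A100_GPUS`) should be checked against your own output, and the sample data is illustrative only.

```python
from typing import Any, Dict


def gpu_quota_headroom(region_info: Dict[str, Any], metric_substring: str = "A100") -> Dict[str, float]:
    """Return remaining quota (limit - usage) for each quota metric whose
    name contains `metric_substring`.

    `region_info` is assumed to be the parsed output of
    `gcloud compute regions describe REGION --format=json`.
    """
    headroom: Dict[str, float] = {}
    for quota in region_info.get("quotas", []):
        metric = quota.get("metric", "")
        if metric_substring in metric:
            headroom[metric] = float(quota.get("limit", 0)) - float(quota.get("usage", 0))
    return headroom


if __name__ == "__main__":
    # Illustrative data shaped like the gcloud output described above.
    sample = {
        "quotas": [
            {"metric": "CPUS", "limit": 24.0, "usage": 8.0},
            {"metric": "NVIDIA_A100_GPUS", "limit": 0.0, "usage": 0.0},
        ]
    }
    for metric, left in gpu_quota_headroom(sample).items():
        status = "OK" if left > 0 else "no headroom, request a quota increase"
        print(f"{metric}: {left:g} remaining ({status})")
```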

Here are other references that can help.

I hope I was able to provide you with useful insights.