
Vertex AI Image - Training Failed (Internal Error)

Hi,

I want to train a multi-label image classification model in Vertex AI, but after several attempts it keeps returning the same error message: "Training pipeline failed with error message: Internal error occurred. Please retry in a few minutes".

I tried training it in different locations (us-central1 (Iowa) and europe-west4 (Netherlands)), but I still get the same message.
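The error text itself says to retry in a few minutes, so retries across the regions mentioned above could be scripted instead of resubmitted by hand. Below is a minimal sketch. It assumes a hypothetical `submit(region)` callable that launches the training pipeline (for example through the Vertex AI SDK) and a hypothetical `TransientTrainingError` that it raises on the "Internal error occurred" failure. As this thread shows, a persistent internal error will not go away with retries, so this only helps with genuinely transient failures.

```python
import time
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")


class TransientTrainingError(Exception):
    """Hypothetical stand-in for the 'Internal error occurred' pipeline failure."""


def run_with_retries(
    submit: Callable[[str], T],
    regions: Iterable[str],
    attempts_per_region: int = 3,
    base_delay_s: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Try each region in turn, retrying transient failures with exponential backoff."""
    errors = []
    for region in regions:
        for attempt in range(attempts_per_region):
            try:
                # `submit` is a hypothetical callable that starts the training job in `region`.
                return submit(region)
            except TransientTrainingError as exc:
                errors.append((region, attempt, exc))
                # Wait base, 2*base, 4*base, ... between attempts in the same region.
                if attempt < attempts_per_region - 1:
                    sleep(base_delay_s * (2 ** attempt))
    raise RuntimeError(f"All {len(errors)} attempts failed: {errors}")
```

Injecting `sleep` keeps the backoff testable without real waiting. In practice the default of 60 seconds roughly matches the "retry in a few minutes" hint.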

Could you help me figure out what the issue might be and how to solve it?

Thanks!


If possible, can you try creating another project for this and training there? Otherwise, it is recommended to contact support, since the engineers have better visibility into your project's resources and logs: https://cloud.google.com/contact

Thanks for your reply. I tried what you suggested and it returns the same error. 

@lauraperezc22, were you ever able to resolve the issue? I'm running into it too. I recently upgraded the TensorFlow version in the training Docker image that I run on Vertex AI. I'm wondering whether the issue is related to GPU hardware, but I have no way of knowing since the internal error is so non-descriptive. Also, I don't see any issues when training directly on a virtual machine.
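Because training works on a plain VM but fails on Vertex AI after a TensorFlow upgrade, one way to narrow it down is to log the runtime environment at container start and compare both runs. Below is a minimal sketch, not a confirmed fix. It assumes the custom container's stdout ends up in the job's logs, and that the image uses TensorFlow 2.x, where `tf.config.list_physical_devices` and `tf.test.is_built_with_cuda` are available. The function name `environment_report` is hypothetical.

```python
import json
import platform
import sys


def environment_report() -> dict:
    """Collect basic runtime facts so they show up in the training job's logs."""
    report = {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "tensorflow": None,
        "built_with_cuda": None,
        "gpus": [],
    }
    try:
        import tensorflow as tf  # only expected to be present inside the training image
    except ImportError:
        return report
    report["tensorflow"] = tf.__version__
    # False here would point at a CPU-only TensorFlow build after the upgrade.
    report["built_with_cuda"] = tf.test.is_built_with_cuda()
    # An empty list on a GPU machine suggests a driver/CUDA mismatch rather than bad hardware.
    report["gpus"] = [d.name for d in tf.config.list_physical_devices("GPU")]
    return report


if __name__ == "__main__":
    # Print once at startup (assumption: stdout is captured in the job's logs).
    print(json.dumps(environment_report(), indent=2))
```

Running the same script on the VM where training succeeds gives a baseline to diff against.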