Re: Vertex AI - AutoML training model failed without giving the reason

Marjanbg · 07-21-2022 12:13 AM

After about an hour of training an AutoML model with Vertex AI, the job failed without giving a reason. I received the following email:
"Due to an error, Vertex AI was unable to train model "some_model".
Additional Details:
Operation State: Failed with errors
Resource Name: projects/xxxxxxxxxxxxxxx/locations/region/trainingPipelines/xxxxxxxxxxxxxxxxxxxxxxxx
Error Messages: Internal error occurred. Please retry in a few minutes. If you still experience errors, contact Vertex AI."
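The Resource Name in the email identifies the failed training pipeline, and the pipeline itself may record a more specific error than the email does. Below is a minimal sketch, assuming the google-cloud-aiplatform package is installed and Application Default Credentials are configured, that reads the stored state and error back through the Vertex AI Python client. The helper names parse_pipeline_name and fetch_pipeline_error are hypothetical, not part of the SDK.

```python
import re
from typing import NamedTuple

# Shape of a training pipeline resource name, as shown in the email.
_PIPELINE_RE = re.compile(
    r"^projects/(?P<project>[^/]+)/locations/(?P<location>[^/]+)"
    r"/trainingPipelines/(?P<pipeline_id>[^/]+)$"
)


class PipelineRef(NamedTuple):
    project: str
    location: str
    pipeline_id: str


def parse_pipeline_name(name: str) -> PipelineRef:
    """Split a trainingPipelines resource name into its parts (hypothetical helper)."""
    match = _PIPELINE_RE.match(name.strip())
    if match is None:
        raise ValueError(f"not a trainingPipelines resource name: {name!r}")
    return PipelineRef(**match.groupdict())


def fetch_pipeline_error(name: str) -> tuple[str, int, str]:
    """Return (state, error code, error message) recorded for a training pipeline.

    Assumes google-cloud-aiplatform is installed and credentials are configured.
    """
    # Imported lazily so parse_pipeline_name works without the SDK installed.
    from google.cloud import aiplatform_v1

    ref = parse_pipeline_name(name)
    # Vertex AI is served from regional endpoints, e.g. us-central1-aiplatform.googleapis.com.
    client = aiplatform_v1.PipelineServiceClient(
        client_options={"api_endpoint": f"{ref.location}-aiplatform.googleapis.com"}
    )
    pipeline = client.get_training_pipeline(name=name)
    # pipeline.error is a google.rpc.Status with a numeric code and a message.
    return pipeline.state.name, pipeline.error.code, pipeline.error.message
```

Called with the real resource name from the email, fetch_pipeline_error should report a state such as PIPELINE_STATE_FAILED along with whatever error code and message the service stored, which can be more useful to include in a support request than the generic "Internal error occurred".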

Could you please help me with this?
Thanks

josegutierrez

There was an issue with the Europe West 2 servers that day. Was your model training in that region?
Is this still an issue or is it fixed now?

Marjanbg

It was not in that region, and I still get the same error.

josegutierrez

This could be caused by a permission error.

Fix custom training permission issues:
1. Use the default compute account of the model preprocessing tenant projects to run training jobs.
2. Grant the default compute account the storage.admin role in the batch prediction, prediction, and training tenant projects (TPs) during provisioning.
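Item 2 amounts to adding a role binding to an IAM policy. The tenant projects it mentions are Google-managed, so the sketch below is only an illustration, assuming you want to check or apply the analogous grant in your own project. It edits a policy dict shaped like the JSON that `gcloud projects get-iam-policy PROJECT_ID --format=json` returns. The helper names default_compute_sa and add_role_binding are hypothetical.

```python
import copy


def default_compute_sa(project_number: str) -> str:
    """IAM member string for a project's Compute Engine default service account."""
    return f"serviceAccount:{project_number}-compute@developer.gserviceaccount.com"


def add_role_binding(policy: dict, role: str, member: str) -> dict:
    """Return a copy of an IAM policy dict with `member` granted `role`.

    Idempotent: if the binding already exists, the copy is unchanged.
    Conditional bindings are skipped; a new unconditional binding is added instead.
    The input policy (including its etag) is never mutated.
    """
    updated = copy.deepcopy(policy)
    bindings = updated.setdefault("bindings", [])
    for binding in bindings:
        if binding.get("role") == role and "condition" not in binding:
            members = binding.setdefault("members", [])
            if member not in members:
                members.append(member)
            return updated
    bindings.append({"role": role, "members": [member]})
    return updated
```

Keeping the original `etag` in the written-back JSON (for example with `gcloud projects set-iam-policy PROJECT_ID policy.json`) lets IAM reject the update if someone else changed the policy in the meantime. For a single grant, `gcloud projects add-iam-policy-binding` does the same read-modify-write in one command.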
