After an hour of training an AutoML model with Vertex AI, the training failed without giving a reason. I received the following email:
"Due to an error, Vertex AI was unable to train model "some_model".
Additional Details:
Operation State: Failed with errors
Resource Name:
projects/xxxxxxxxxxxxxxx/locations/region/trainingPipelines/xxxxxxxxxxxxxxxxxxxxxxxx
Error Messages: Internal error occurred. Please retry in a few minutes. If
you still experience errors, contact Vertex AI."
Would you please help me with it?
Thanks
There was an issue with the Europe West 2 servers that day. Was your training job running in that region?
Is this still an issue or is it fixed now?
It was not in that region, and I still get the same error.
The failure could be caused by a permission error.
The fixes listed for custom training permission issues are:
1. Use the default compute account of the model preprocessing tenant projects to run training jobs.
2. Grant the default compute account the storage.admin role for batch prediction/prediction/training tps during provisioning.
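If a permission problem is suspected, one way to check your own project is to look for the storage admin binding in its IAM policy. The sketch below is only an illustration: it assumes the policy has been exported as JSON (for example with `gcloud projects get-iam-policy PROJECT_ID --format=json`), and the project number, helper names, and sample policy are hypothetical. It does not inspect Google-managed tenant projects.

```python
import json


def default_compute_account(project_number: str) -> str:
    """Build the IAM member string for a project's default compute service account."""
    return f"serviceAccount:{project_number}-compute@developer.gserviceaccount.com"


def has_role(policy: dict, member: str, role: str = "roles/storage.admin") -> bool:
    """Return True if `member` appears in a binding for `role` in the policy."""
    for binding in policy.get("bindings", []):
        if binding.get("role") == role and member in binding.get("members", []):
            return True
    return False


# Hypothetical policy, shaped like the JSON output of get-iam-policy.
sample_policy = json.loads("""
{
  "bindings": [
    {
      "role": "roles/storage.admin",
      "members": ["serviceAccount:123456789012-compute@developer.gserviceaccount.com"]
    },
    {
      "role": "roles/viewer",
      "members": ["user:someone@example.com"]
    }
  ]
}
""")

# Hypothetical project number, used only for illustration.
member = default_compute_account("123456789012")
print(has_role(sample_policy, member))
```

If the check returns False for your real policy, the missing binding is only a candidate cause. The "Internal error" message in the email does not confirm a permission problem.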