
Vertex AI AutoML regression training pipeline error message: Internal error occurred

I'm a first-time user of the Vertex AI AutoML training pipeline. I've made multiple attempts at training on a fairly simple dataset (fewer than 10K observations, no missing data) using tabular regression, and it always fails immediately with the error "Training pipeline failed with error message: Internal error occurred. Please retry in a few minutes." I've tried optimizing for both MAE and RMSE. Details of one training session are below. What am I doing wrong?

 

Status: Failed
Training pipeline ID: 5256146067350618112
Created: Jan 7, 2024, 2:44:19 PM
Start time: Jan 7, 2024, 2:44:43 PM
Budget (original): 1 node hour
Elapsed time: 15 sec
Region: us-west1
Encryption type: Google-managed
Dataset: untitled_1704575609483
Target column: PE
Data split: Randomly assigned (80/10/10)
Algorithm: AutoML
Objective: Tabular regression
Optimized for: MAE

This is also happening to me, and I am confused because the dataset is very simple.
The logs aren't very helpful either.

The most informative log I found was: "The DAG failed because some tasks failed. The failed tasks are: [tabular-stats-and-example-gen]."
The logs for tabular-stats-and-example-gen then say:

error.code: 13
error.message: INTERNAL

Have you been able to successfully train the model?

Future
New Member

I have the same problem here. I've already wasted some hours checking the dataset and so on, and it seems to be a pipeline problem, not mine 😞

I am getting the same in my recently created project, but I have an older project where everything just works, even with the same data.

No, I haven't trained it yet - that's the step where it's failing. I've read elsewhere that AutoML may have issues with Excel CSV files, which is what I uploaded. I'm going to try saving it as a Google CSV and try that.

Thanks for the tip! I used a Google Sheets CSV, but today I'll try saving the files with Python instead. I'll let you know if that helps.
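For anyone who wants to try the re-save route, here is a minimal sketch of rewriting a CSV with Python's standard `csv` module. It assumes the input is UTF-8 (possibly with a byte-order mark, which Excel's "CSV UTF-8" export adds) and writes plain UTF-8 with `\n` line endings. The function name `resave_csv` and the choice to drop fully empty rows are my own assumptions, not anything Vertex AI documents as required, and nothing in this thread confirms that file encoding is the actual cause of the error.

```python
import csv


def resave_csv(src_path, dst_path):
    """Rewrite a CSV as BOM-free UTF-8 with "\\n" line endings.

    Returns the number of rows written, header included.
    """
    # "utf-8-sig" silently strips a leading byte-order mark if one is present.
    with open(src_path, newline="", encoding="utf-8-sig") as src:
        # Assumption: rows whose cells are all empty are spreadsheet
        # leftovers and can be dropped.
        rows = [row for row in csv.reader(src)
                if any(cell.strip() for cell in row)]

    with open(dst_path, "w", newline="", encoding="utf-8") as dst:
        csv.writer(dst, lineterminator="\n").writerows(rows)

    return len(rows)
```

Calling `resave_csv("export.csv", "clean.csv")` (both file names are placeholders) and uploading the cleaned file as a new dataset would at least rule out stray BOMs, Windows line endings, and trailing empty rows.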

For me, it was fixed by granting the requested roles to the service account used when creating a pipeline run for training the model.

[Attached screenshots: daviderinayo_2-1704800433745.png, daviderinayo_1-1704800325349.png, daviderinayo_3-1704800645376.png]

 

I've granted these roles but am still getting the error.

I am having the same issue here too.