I tried training a model on a CSV file with 133 columns and 4,100 rows. It ran for more than 2.5 hours and then failed with the message:
Error Messages: INTERNAL
I tried three times but got the same error each time. What could be the problem here?
Hi @mahalingamb,
Welcome and thank you for trusting the community for help.
It seems your training job is failing and Vertex AI isn't giving you enough information to troubleshoot it or figure out what's causing the problem.
Here are some suggestions on how to approach this.
1. I checked our Service Health dashboard, and there are no recent reports of errors with any Vertex AI service. I suggest reviewing your Cloud Logging logs for Vertex AI-related errors; you might find entries that help you troubleshoot or even identify what's causing the issue.
2. The error might be a temporary system error that resolves itself later. Please keep retrying, and refer to this documentation for troubleshooting guidance. You can also search the Issue tracking system and product feature requests article for answers.
3. I checked other relevant posts and inquiries online, and most of those concerns were resolved by making sure the right permissions were in place to run the job. You may need to revisit the access rights for your Cloud Storage bucket, data sources, IAM service account, and region.
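For point 1 (reviewing the logs), here is a minimal sketch of pulling recent error-level entries with the google-cloud-logging Python client. It assumes the package is installed and that Application Default Credentials point at the project that ran the job. The filter clause `protoPayload.serviceName="aiplatform.googleapis.com"` is an assumption that targets Cloud Audit Log entries for the Vertex AI API; your job's logs may sit under a different resource type, so treat it as a starting point. The same filter string can also be pasted into Logs Explorer. The function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def build_vertex_error_filter(hours_back: int = 24, now: Optional[datetime] = None) -> str:
    """Return a Cloud Logging query for ERROR-or-worse Vertex AI entries."""
    now = now or datetime.now(timezone.utc)
    since = (now - timedelta(hours=hours_back)).strftime("%Y-%m-%dT%H:%M:%SZ")
    # Assumption: match on the audit-log serviceName; widen or change this
    # clause if your job's logs are written under a different resource type.
    return (
        'protoPayload.serviceName="aiplatform.googleapis.com" '
        "AND severity>=ERROR "
        f'AND timestamp>="{since}"'
    )


def print_recent_errors(hours_back: int = 24, max_results: int = 50) -> None:
    """Print the newest matching entries (needs google-cloud-logging and credentials)."""
    # Imported lazily so build_vertex_error_filter works without the client library.
    from google.cloud import logging as cloud_logging

    client = cloud_logging.Client()
    entries = client.list_entries(
        filter_=build_vertex_error_filter(hours_back),
        order_by=cloud_logging.DESCENDING,  # newest entries first
        max_results=max_results,
    )
    for entry in entries:
        print(entry.timestamp, entry.severity, entry.payload)
```

For a job that ran for 2.5 hours, something like `print_recent_errors(hours_back=72)` should cover the attempts, assuming they ran within the last three days.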
You can also reach out to Vertex AI support for a more detailed discussion about this issue.
Hope this helps.
Hello
I expect there are more columns and rows.
Sometimes this can be caused by permissions. Did you try triaging via the error logs?
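To check the permissions angle for the bucket that holds the CSV, here is a rough sketch using the google-cloud-storage client's `Bucket.test_iam_permissions`, which returns the subset of the requested permissions the caller actually holds. The permission list and function names are hypothetical examples. Note that this tests the identity running the script, not necessarily the service account Vertex AI uses for the job, so run it with that service account's credentials for a meaningful answer.

```python
from typing import Iterable, List

# Hypothetical example: permissions a job might need to read its training CSV.
# Adjust to whatever your pipeline actually requires.
REQUIRED_BUCKET_PERMISSIONS = ("storage.objects.get", "storage.objects.list")


def missing_permissions(required: Iterable[str], granted: Iterable[str]) -> List[str]:
    """Return the required permissions that were not granted, in their original order."""
    granted_set = set(granted)
    return [perm for perm in required if perm not in granted_set]


def check_bucket_access(
    bucket_name: str, required: Iterable[str] = REQUIRED_BUCKET_PERMISSIONS
) -> List[str]:
    """Ask Cloud Storage which of `required` the current credentials lack.

    This checks the identity running the script, which may differ from the
    service account Vertex AI uses for the training job.
    """
    # Imported lazily; needs the google-cloud-storage package and credentials.
    from google.cloud import storage

    required = list(required)
    client = storage.Client()
    bucket = client.bucket(bucket_name)  # builds a handle; no API call yet
    granted = bucket.test_iam_permissions(required)
    return missing_permissions(required, granted)
```

An empty list from `check_bucket_access("your-bucket-name")` (placeholder name) would suggest bucket read access is not the problem for that identity.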