I wanted to test Vertex AI AutoML for video object tracking, so I threw together 15 short videos with annotations for a single category (the minimum).
The training run failed after 8h with this message:
"Training pipeline failed with error message: Internal error occurred. Please retry in a few minutes. If you still experience errors, contact VertexAI"
My questions:
- How can I find out what went wrong?
- Is there a way to set a maximum for node-hours like for AutoML for Image Object Detection?
- How can I estimate how long my training should take?
Edit: I started the same training run again; this time it crashed after 9 hours with the same pattern. No idea how to troubleshoot.
Can you try following this guide, using your own videos instead? Please also check the accepted data requirements for this AutoML training. Since this is an internal error, I would recommend filing a support case if the issue persists: https://cloud.google.com/contact
I have the same problem and the same questions, so I wonder if you got any of them answered? 🤓
Hi @MrVertex,
We talked to support in the end, and it turned out the problem was somewhere inside Google Cloud: something about resource allocation, and they were working on a solution.
We ended up not using this feature because of that.
The answers to the questions were roughly: "This is not very configurable, and right now we need to fix it internally."
Not very helpful, sorry 🙂
Thanks for the reply!
It helps to know that it might not be PEBCAK (problem exists between chair and keyboard) 😅