I started training an AutoML image classifier on Vertex AI. The dataset is just 36 images of dogs and flowers. I am doing this for my bachelor's thesis.
However, the model doesn't stop training. It has been running for 11 days now, whereas training a TensorFlow NN on these images took only a couple of seconds on my machine. What can I do about it? I need to get this done quickly, as I want to examine XAI features and model deployment with this model.
Hi @LouisPetrik,
Welcome to Google Cloud Community!
It looks like your AutoML image classification training job on Vertex AI is stuck. It has been running for more than 11 days on a very small dataset (36 images), even though the training budget is only 2 node hours. That points to a problem with the training process, most likely a bug or a misconfiguration, and it is blocking the XAI and model deployment work for your thesis.
Here are the potential ways that might help with your use case:
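One thing that may be worth trying is to cancel the stuck job so it can be re-created. The sketch below assumes the Vertex AI SDK for Python (`google-cloud-aiplatform`) is installed and authenticated. The project ID and region are placeholders, and `exceeds_budget`, `cancel_stuck_jobs`, and the `slack` factor are hypothetical names chosen for illustration. Note that node hours are not the same as wall-clock hours, so the comparison is only a rough heuristic.

```python
from datetime import datetime, timedelta, timezone


def exceeds_budget(start_time, budget_node_hours, now=None, slack=24.0):
    """Rough heuristic: has the job run far longer than its budget suggests?

    `slack` is an assumed safety factor, not a Vertex AI setting. Node hours
    are not wall-clock hours, so only flag jobs that overshoot by a lot.
    """
    now = now or datetime.now(timezone.utc)
    elapsed_hours = (now - start_time).total_seconds() / 3600
    return elapsed_hours > budget_node_hours * slack


def cancel_stuck_jobs(project, location, budget_node_hours=2):
    """Cancel running AutoML image training jobs that look stuck."""
    # Imported lazily so exceeds_budget() works without the SDK installed.
    from google.cloud import aiplatform

    aiplatform.init(project=project, location=location)
    cancelled = []
    for job in aiplatform.AutoMLImageTrainingJob.list():
        still_running = job.state.name == "PIPELINE_STATE_RUNNING"
        if still_running and exceeds_budget(job.create_time, budget_node_hours):
            job.cancel()  # asks Vertex AI to stop the training pipeline
            cancelled.append(job.resource_name)
    return cancelled


if __name__ == "__main__":
    # Offline check with the numbers from this thread: 11 days, 2 node hours.
    started = datetime.now(timezone.utc) - timedelta(days=11)
    print(exceeds_budget(started, budget_node_hours=2))
    # To act on real jobs (placeholder values, replace with your own):
    # cancel_stuck_jobs(project="my-project-id", location="us-central1")
```

After cancelling, the job can be started again with the same dataset and budget. If the new run also ignores the budget, that is useful evidence to include in a support case.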
If you continue to run into issues, consider reaching out to Google Cloud Support so they can investigate the underlying cause. When you contact them, provide as much detail as possible and include screenshots. This will help them understand the problem and resolve it more quickly.
Was this helpful? If so, please accept this answer as “Solution”. If you need additional assistance, reply here within 2 business days and I’ll be happy to help.