Hello Google Cloud Community,
I'm encountering a persistent "Internal Server Error" (Code 13) when attempting to train a single-label image classification model using Vertex AI AutoML. I'm following the official Google notebook: https://github.com/GoogleCloudPlatform/vertex-ai-samples/blob/main/notebooks/official/automl/automl_....
Here's a breakdown of the issue and what I've tried:
- Problem: When I initiate training for a single-label image classification model, the process consistently fails with "Internal Server Error" (Code 13).
- Permissions: I have verified that my service account has all the necessary permissions for Vertex AI and related services.
- APIs Enabled: All required APIs are enabled for the project.
- Region Switching: I've attempted to train the model in different Google Cloud regions, but the error persists across all of them.
- Key Observation: When I use a similar notebook to train a tabular data model on Vertex AI, training completes successfully without any errors. This suggests the issue might be specific to the image classification pipeline or its configuration.
- Notebook Used: I am running the official notebook provided by Google directly: https://github.com/GoogleCloudPlatform/vertex-ai-samples/blob/main/notebooks/official/automl/automl_...
Could anyone provide insights or suggestions on what might be causing this "Internal Server Error" (Code 13) specifically for AutoML Image Classification? Has anyone else experienced a similar issue, or are there specific configurations or troubleshooting steps I should consider for image classification models on Vertex AI?
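One image-specific difference from the tabular case is the dataset import file, so one cheap troubleshooting step is to check that file locally before training. The sketch below assumes the single-label import CSV layout `[ML_USE,]GCS_FILE_PATH,LABEL` from the Vertex AI image data-preparation docs, the image extensions listed in the comments, at least two distinct labels, and a hypothetical minimum of 10 images per label (`MIN_IMAGES_PER_LABEL`). The function name `check_import_csv` is also hypothetical. It does not call any Google Cloud API and cannot explain the Code 13 error itself; it only rules out an obviously malformed import file.

```python
import csv
import io

# Assumed import-file layout for single-label image classification:
#   [ML_USE,]GCS_FILE_PATH,LABEL   (ML_USE is optional)
ML_USE_VALUES = {"TRAINING", "TEST", "VALIDATION"}  # compared case-insensitively
# Assumed accepted image formats: JPEG, GIF, PNG, BMP, ICO.
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".gif", ".png", ".bmp", ".ico")
# Hypothetical threshold used only to flag suspiciously small labels.
MIN_IMAGES_PER_LABEL = 10


def check_import_csv(text, min_per_label=MIN_IMAGES_PER_LABEL):
    """Return (problems, label_counts) for an import CSV passed as a string."""
    problems = []
    counts = {}
    for lineno, row in enumerate(csv.reader(io.StringIO(text)), start=1):
        cells = [cell.strip() for cell in row]
        if not any(cells):
            continue  # skip blank lines
        if len(cells) == 3:
            ml_use, uri, label = cells
            if ml_use.upper() not in ML_USE_VALUES:
                problems.append(f"line {lineno}: unknown ML_USE {ml_use!r}")
        elif len(cells) == 2:
            uri, label = cells
        else:
            problems.append(f"line {lineno}: expected 2 or 3 columns, got {len(cells)}")
            continue
        if not uri.startswith("gs://"):
            problems.append(f"line {lineno}: URI is not a gs:// path: {uri!r}")
        if not uri.lower().endswith(IMAGE_EXTENSIONS):
            problems.append(f"line {lineno}: unexpected file extension: {uri!r}")
        if not label:
            problems.append(f"line {lineno}: empty label")
            continue
        counts[label] = counts.get(label, 0) + 1
    # Assumed: a classification dataset needs at least two distinct labels.
    if len(counts) < 2:
        problems.append(f"only {len(counts)} distinct label(s); classification needs at least 2")
    for label, n in sorted(counts.items()):
        if n < min_per_label:
            problems.append(f"label {label!r} has only {n} image(s)")
    return problems, counts


if __name__ == "__main__":
    # Hypothetical sample rows; replace with a local copy of the real import CSV.
    sample = (
        "training,gs://my-bucket/daisy/1.jpg,daisy\n"
        "test,gs://my-bucket/daisy/2.jpg,daisy\n"
        "gs://my-bucket/tulip/1.png,tulip\n"
        "validation,/local/path/3.jpg,tulip\n"
    )
    problems, counts = check_import_csv(sample, min_per_label=2)
    print(counts)
    for p in problems:
        print(p)
```

An empty problem list only means the file matches these assumed rules; it does not guarantee that the images themselves are readable by the training pipeline.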
Any help would be greatly appreciated!