Hi, I'm new to GCP. I tried to create an AutoML model in Vertex AI: I created a dataset from 20 .txt files with 4 labels, but training failed with an "insufficient validation data labeled" error. I went through the documentation, but the issue is still not resolved. Please help me out with this.
Thanks,
John
Good day @john123,
Welcome to Google Cloud Community!
Please note that your data needs to meet the documented requirements before you create a dataset or train a model. You haven't mentioned which AutoML model you are trying to build, but based on the information you provided I assume you are training an AutoML model on text data. If that is the case, try checking the following:
If you are trying to train an AutoML classification model from a text dataset, check the following data requirements based on the documentation:
You can check this link for more information: https://cloud.google.com/vertex-ai/docs/text-data/classification/prepare-data
- You must supply at least 20, and no more than 1,000,000, training documents.
- You must supply at least 2, and no more than 5000, unique category labels.
- You must apply each label to at least 10 documents.
- For multi-label classification, you can apply one or multiple labels to a document.
- You can include documents inline or reference TXT files that are in Cloud Storage buckets.
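As a rough illustration of the classification limits listed above, here is a minimal sketch that counts documents per label before you import anything. The function name and the input shape (one list of labels per document) are assumptions made for this example and are not part of Vertex AI. Note that if each of 20 files carries a single label and there are 4 labels, the average is only 5 documents per label, which is below the 10-per-label minimum.

```python
from collections import Counter

# Limits taken from the classification requirements listed above.
MIN_DOCS, MAX_DOCS = 20, 1_000_000
MIN_LABELS, MAX_LABELS = 2, 5000
MIN_DOCS_PER_LABEL = 10


def check_classification_dataset(doc_labels):
    """Return a list of problems. doc_labels holds one list of labels per document."""
    problems = []
    n_docs = len(doc_labels)
    if not MIN_DOCS <= n_docs <= MAX_DOCS:
        problems.append(f"document count {n_docs} is outside {MIN_DOCS}..{MAX_DOCS}")

    # Count each label at most once per document.
    counts = Counter(label for labels in doc_labels for label in set(labels))
    if not MIN_LABELS <= len(counts) <= MAX_LABELS:
        problems.append(f"label count {len(counts)} is outside {MIN_LABELS}..{MAX_LABELS}")

    for label, n in sorted(counts.items()):
        if n < MIN_DOCS_PER_LABEL:
            problems.append(
                f"label {label!r} has {n} documents, needs at least {MIN_DOCS_PER_LABEL}"
            )
    return problems


# Hypothetical dataset: 20 single-label documents spread evenly over 4 labels.
example = [[f"label_{i % 4}"] for i in range(20)]
problems = check_classification_dataset(example)
for problem in problems:
    print(problem)
```

In this hypothetical case every label is reported as having too few documents, so either more files per label or fewer labels would be needed.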
If you are trying to train an AutoML entity extraction model, here are the data requirements based on the documentation:
- You must supply at least 50, and no more than 100,000, training documents.
- You must supply at least 1, and no more than 100, unique labels to annotate entities that you want to extract.
- You can use a label to annotate between 1 and 10 words.
- Label names can be between 2 and 30 characters.
- You can include annotations in your JSON Lines files, or you can add annotations later by using the Google Cloud console after uploading documents.
- You can include documents inline or reference TXT files that are in Cloud Storage buckets.
For more information you can visit this link: https://cloud.google.com/vertex-ai/docs/text-data/entity-extraction/prepare-data
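The entity extraction limits can be checked in a similar way. This is only a sketch under stated assumptions: it uses the 50 to 100,000 document range from the list above, the function name and arguments are hypothetical, and it counts "words" by splitting on whitespace, which may not match how the service tokenizes text.

```python
def check_entity_extraction_dataset(n_docs, label_names, annotated_spans=()):
    """Return a list of problems for an entity extraction dataset.

    n_docs: number of training documents.
    label_names: label names used in annotations.
    annotated_spans: optional annotated text snippets, used for the 1..10 word check.
    """
    problems = []
    if not 50 <= n_docs <= 100_000:
        problems.append(f"document count {n_docs} is outside 50..100000")

    unique = set(label_names)
    if not 1 <= len(unique) <= 100:
        problems.append(f"unique label count {len(unique)} is outside 1..100")

    for name in sorted(unique):
        if not 2 <= len(name) <= 30:
            problems.append(f"label name {name!r} must be 2 to 30 characters long")

    for span in annotated_spans:
        n_words = len(span.split())  # whitespace split, an approximation
        if not 1 <= n_words <= 10:
            problems.append(f"annotation {span!r} covers {n_words} words, allowed 1..10")
    return problems


# Hypothetical inputs: too few documents and a one-character label name.
issues = check_entity_extraction_dataset(20, ["PERSON", "X"])
for issue in issues:
    print(issue)
```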
If you are trying to train an AutoML sentiment analysis model, these are the data requirements based on the documentation:
For more information you can visit this link: https://cloud.google.com/vertex-ai/docs/text-data/sentiment-analysis/prepare-data#data_requirements
- You must supply at least 10, but no more than 100,000, total training documents.
- A sentiment value must be an integer from 0 to 10. The maximum sentiment value is your choice. For example, if you want to identify whether the sentiment is negative, positive, or neutral, you can label the training data with sentiment scores of 0 (negative), 1 (neutral), and 2 (positive). The maximum sentiment score for this dataset is 2. If you want to capture more granularity, such as five levels of sentiment, you can label documents from 0 (most negative) to 4 (most positive).
- You must apply each sentiment value to at least 10 documents.
- Sentiment score values must be consecutive integers starting from zero. If you have gaps in scores or don't start from zero, remap your scores to be consecutive integers starting from zero.
- You can include documents inline or reference TXT files that are in Cloud Storage buckets.
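The remapping mentioned in the list above can be sketched as follows. The function name is hypothetical and the approach (ranking the distinct scores in ascending order) is one reasonable way to do it, not something prescribed by the documentation.

```python
def remap_sentiment_scores(scores):
    """Map arbitrary integer scores to consecutive integers starting from zero.

    Returns the remapped scores and the old-to-new mapping, preserving order
    (the lowest original score becomes 0).
    """
    ordered = sorted(set(scores))
    if len(ordered) > 11:
        # Sentiment values must fit in 0..10, so at most 11 distinct levels.
        raise ValueError(f"{len(ordered)} distinct scores cannot fit in 0..10")
    mapping = {old: new for new, old in enumerate(ordered)}
    return [mapping[s] for s in scores], mapping


# Hypothetical labels that skip values and do not start from zero.
remapped, mapping = remap_sentiment_scores([1, 3, 5, 3])
print(remapped)  # [0, 1, 2, 1]
print(mapping)   # {1: 0, 3: 1, 5: 2}
```

After remapping, each resulting sentiment value still needs at least 10 documents.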
It is important to make sure your data meets these requirements before you create your dataset in Vertex AI, and to follow the documented best practices as well.
Hope this is useful!
Hi John,
I had the same issue. I followed the guidelines mentioned above by kvandres, read the help steps, and was able to get the dataset working.
@kvandres: I am facing an issue with Tuning Model (Preview). I have my .jsonl file ready with around 200 text examples (one example from my JSONL file is given below):
{"input_text": "What is the history of basketball?", "output_text": "Basketball was invented by Dr. James Naismith in December 1891 in Springfield, Massachusetts. It has since become one of the most popular sports in the United States and around the world."}
Everything works fine in Step 1 and Step 2, but as soon as I click the Start Tuning button I get a blip and then nothing: no error message or any other system issue appears. When I go back to the Tuning Dashboard, I do not see anything.
Can you please assist?
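One thing that can be ruled out locally before retrying is a formatting problem in the tuning file. The sketch below assumes each non-empty line should be a JSON object with non-empty string `input_text` and `output_text` fields, as in the example above. The function name is hypothetical, and this check does not explain the console behavior itself.

```python
import json

REQUIRED_KEYS = ("input_text", "output_text")


def find_bad_jsonl_lines(lines):
    """Return (line_number, reason) pairs for lines that fail the basic checks."""
    bad = []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # ignore blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            bad.append((number, f"invalid JSON: {exc.msg}"))
            continue
        if not isinstance(record, dict):
            bad.append((number, "not a JSON object"))
            continue
        for key in REQUIRED_KEYS:
            value = record.get(key)
            if not isinstance(value, str) or not value.strip():
                bad.append((number, f"missing or empty {key!r}"))
    return bad


# Hypothetical sample: one good line, one missing a field, one that is not JSON.
sample = [
    '{"input_text": "What is the history of basketball?", '
    '"output_text": "Basketball was invented by Dr. James Naismith in December 1891."}',
    '{"input_text": "No answer here"}',
    "not json",
]
bad_lines = find_bad_jsonl_lines(sample)
for number, reason in bad_lines:
    print(f"line {number}: {reason}")
```

To check a real file, pass an open file handle instead of the sample list, since iterating over a file yields its lines.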