In Vertex AI I am updating an image dataset, thus:
from google.cloud import aiplatform
import_schema_uri = aiplatform.schema.dataset.ioformat.image.single_label_classification
dataset_id = "my_ds_id"
ds = aiplatform.ImageDataset(dataset_id)
ds.import_data(gcs_source=DATASET_PATH, import_schema_uri=import_schema_uri)
The images are uploaded to the dataset, but their labels are ignored and they are classed as Unlabeled. What am I doing wrong? TIA!
PS: the labels are in a CSV, like:
gs://path/to/file/barnacles.jpg,label1
which worked fine for the dataset creation.
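In case it helps, here is how I sanity-checked the CSV locally for format problems (stray headers, whitespace, extra columns). This is just a minimal sketch; labels.csv is a local copy of my import file, and the name is only an example:

import csv
from collections import Counter

# Every row should be exactly "gs://...,label" with no header
# and no stray whitespace around the label.
label_counts = Counter()
with open("labels.csv", newline="") as f:  # local copy of the import CSV
    for row in csv.reader(f):
        assert len(row) == 2, f"unexpected column count: {row}"
        uri, label = row
        assert uri.startswith("gs://"), f"not a GCS URI: {uri}"
        assert label == label.strip(), f"label has whitespace: {label!r}"
        label_counts[label] += 1

print(label_counts)  # e.g. Counter({'label1': 42, ...})

The file passes these checks, so the format itself looks fine.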
You could check this sample code to import data for image classification with a single label:
from google.cloud import aiplatform


def import_data_image_classification_single_label_sample(
    project: str,
    dataset_id: str,
    gcs_source_uri: str,
    location: str = "us-central1",
    api_endpoint: str = "us-central1-aiplatform.googleapis.com",
    timeout: int = 1800,
):
    # The AI Platform services require regional API endpoints.
    client_options = {"api_endpoint": api_endpoint}
    # Initialize client that will be used to create and send requests.
    # This client only needs to be created once, and can be reused for multiple requests.
    client = aiplatform.gapic.DatasetServiceClient(client_options=client_options)
    import_configs = [
        {
            "gcs_source": {"uris": [gcs_source_uri]},
            "import_schema_uri": "gs://google-cloud-aiplatform/schema/dataset/ioformat/image_classification_single_label_io_format_1.0.0.yaml",
        }
    ]
    name = client.dataset_path(project=project, location=location, dataset=dataset_id)
    response = client.import_data(name=name, import_configs=import_configs)
    print("Long running operation:", response.operation.name)
    import_data_response = response.result(timeout=timeout)
    print("import_data_response:", import_data_response)
Thanks, but exactly the same result.
From this TensorFlow blog post:
In addition to image files, we've provided a CSV file (all_data.csv) containing the image URIs and labels. We randomly split this data into two files, train_set.csv and eval_set.csv, with 90% data for training and 10% for eval, respectively.
gs://cloud-ml-data/img/flower_photos/dandelion/17388674711_6dca8a2e8b_n.jpg,dandelion
gs://cloud-ml-data/img/flower_photos/sunflowers/9555824387_32b151e9b0_m.jpg,sunflowers
gs://cloud-ml-data/img/flower_photos/daisy/14523675369_97c31d0b5b.jpg,daisy
gs://cloud-ml-data/img/flower_photos/roses/512578026_f6e6f2ad26.jpg,roses
gs://cloud-ml-data/img/flower_photos/tulips/497305666_b5d4348826_n.jpg,tulips
We also need a text file containing all the labels (dict.txt), which is used to sequentially map labels to internally used IDs. In this case, daisy would become ID 0 and tulips would become 4. If the label isn't in the file, it will be ignored from preprocessing and training.
daisy
dandelion
roses
sunflowers
tulips
Therefore, you need to create the dict.txt file, which contains all the labels used, as shown above.
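If it helps, here is a minimal sketch for generating dict.txt from the import CSV, assuming the two-column uri,label format shown above (file names are examples):

import csv

# Collect the distinct labels from the import CSV and write them
# one per line, sorted, so the label-to-ID mapping is stable
# (alphabetical order matches the blog's example: daisy=0 ... tulips=4).
labels = set()
with open("all_data.csv", newline="") as f:
    for uri, label in csv.reader(f):  # assumes exactly two columns per row
        labels.add(label)

with open("dict.txt", "w") as f:
    f.write("\n".join(sorted(labels)) + "\n")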
Thanks, but that is six years old and not a Vertex AI dataset.
Could you please raise a private thread in the issue tracker (referencing this question, as stated in the template) with the project ID, the job ID, and a sample of your input CSV data? (We don't need the entire file or any PII.)