Importing to Vertex dataset does not import labels.

In Vertex AI I am updating an image dataset, thus:

from google.cloud import aiplatform
import_schema_uri = aiplatform.schema.dataset.ioformat.image.single_label_classification
dataset_id = "my_ds_id"

ds = aiplatform.ImageDataset(dataset_id)
ds.import_data(gcs_source=DATASET_PATH, import_schema_uri=import_schema_uri)

The images are uploaded to the dataset, but their labels are ignored and they are classed as Unlabeled. What am I doing wrong? TIA!

PS: the labels are in a CSV, like:

gs://path/to/file/barnacles.jpg,label1

which worked fine for the dataset creation. 
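
For reference, a quick local sanity check on such a CSV can catch the usual silent failure modes (a header row, stray whitespace, or an unexpected field count); this is only a sketch, and all_data.csv is a placeholder for a local copy of the file:

import csv

# Sanity-check a single-label classification import CSV.
# Each row is expected to be GCS_FILE_PATH,LABEL (an optional leading
# ML_USE column would make it three fields).
with open("all_data.csv", newline="") as f:
    for line_no, row in enumerate(csv.reader(f), start=1):
        if len(row) not in (2, 3):
            print(f"row {line_no}: unexpected field count: {row}")
        elif any(field != field.strip() for field in row):
            print(f"row {line_no}: stray whitespace: {row}")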


You could check this sample code to import data for image classification (single label):

from google.cloud import aiplatform


def import_data_image_classification_single_label_sample(
    project: str,
    dataset_id: str,
    gcs_source_uri: str,
    location: str = "us-central1",
    api_endpoint: str = "us-central1-aiplatform.googleapis.com",
    timeout: int = 1800,
):
    # The AI Platform services require regional API endpoints.
    client_options = {"api_endpoint": api_endpoint}
    # Initialize client that will be used to create and send requests.
    # This client only needs to be created once, and can be reused for multiple requests.
    client = aiplatform.gapic.DatasetServiceClient(client_options=client_options)
    import_configs = [
        {
            "gcs_source": {"uris": [gcs_source_uri]},
            "import_schema_uri": "gs://google-cloud-aiplatform/schema/dataset/ioformat/image_classification_single_label_io_format_1.0.0.yaml",
        }
    ]
    name = client.dataset_path(project=project, location=location, dataset=dataset_id)
    response = client.import_data(name=name, import_configs=import_configs)
    print("Long running operation:", response.operation.name)
    import_data_response = response.result(timeout=timeout)
    print("import_data_response:", import_data_response)

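For completeness, the sample could be invoked along these lines (the values below are placeholders, not taken from this thread):

# Hypothetical placeholder values for illustration.
import_data_image_classification_single_label_sample(
    project="my-project",
    dataset_id="1234567890",
    gcs_source_uri="gs://my-bucket/all_data.csv",
)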

Thanks, but exactly the same result.

From this TensorFlow blog post:

In addition to image files, we've provided a CSV file (all_data.csv) containing the image URIs and labels. We randomly split this data into two files, train_set.csv and eval_set.csv, with 90% of the data for training and 10% for eval, respectively.

gs://cloud-ml-data/img/flower_photos/dandelion/17388674711_6dca8a2e8b_n.jpg,dandelion
gs://cloud-ml-data/img/flower_photos/sunflowers/9555824387_32b151e9b0_m.jpg,sunflowers
gs://cloud-ml-data/img/flower_photos/daisy/14523675369_97c31d0b5b.jpg,daisy
gs://cloud-ml-data/img/flower_photos/roses/512578026_f6e6f2ad26.jpg,roses
gs://cloud-ml-data/img/flower_photos/tulips/497305666_b5d4348826_n.jpg,tulips

We also need a text file containing all the labels (dict.txt), which is used to sequentially map labels to internally used IDs. In this case, daisy would become ID 0 and tulips would become 4. If a label isn't in the file, it will be ignored during preprocessing and training.

daisy 
dandelion 
roses 
sunflowers 
tulips 

Therefore, you need to create the dict.txt file, which will contain all the labels used, as shown above.
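
If it helps, the labels file can be derived from the import CSV itself; a minimal sketch (file names are assumptions, not required by the pipeline):

import csv

# Collect the unique labels from the import CSV and write one per line,
# sorted, so they map to stable IDs (daisy -> 0, ..., tulips -> 4).
labels = set()
with open("all_data.csv", newline="") as f:
    for row in csv.reader(f):
        if row:
            labels.add(row[-1].strip())

with open("dict.txt", "w") as f:
    f.write("\n".join(sorted(labels)) + "\n")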


Thanks, but that post is six years old and isn't about a Vertex AI dataset.

Could you please raise a private thread in the issue tracker (referencing this question, as stated in the template) with the project ID, the job ID, and a sample of your input CSV file? (We don't need the entire file or any PII.)