Re: Preparation of Input File for Vertex AI

jastipbaobao · 08-07-2023 10:24 PM

I have a synthetic dataset of drug prescriptions containing around 18,000 images. Each image has labeled fields, such as the patient's name and the type of drug, each with a corresponding bounding box. I marked the fields on the prescriptions with the VGG Image Annotator (VIA), and the labels and bounding boxes for all images were saved in one large JSON file.

After uploading the dataset to Cloud Storage, I obtained the cloud path of each image in CSV format. To create the input file in CSV, I merged the labels and bounding boxes (after converting them to CSV) with the image path. Since the input file was too large, I split it into chunks and uploaded them successfully to Vertex AI. However, upon checking the uploaded dataset, I noticed that the images were present, but the labels and bounding boxes were missing.

I would appreciate it if anyone could offer insight into what went wrong or help me resolve the issue. Thank you.

lsolatorio

Hi @jastipbaobao,

Thank you for reaching out to the community.

Reviewing the steps you took, one thing I noticed is that you merged the image path with the label and bounding box data. I suspect that Vertex AI was able to read the image path up to the file extension (.jpeg, .png, .gif, ...) but could not parse the labels and bounding boxes because they sit in the same data column as the path, which would explain why your dataset shows images without annotations.

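I recommend not merging these values into a single column, as the importer will not be able to tell them apart. Keep the image path, the label, and the bounding box coordinates in separate comma-separated fields. You can refer to the Input files documentation for the expected layout.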
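To illustrate, here is a minimal Python sketch of writing one row per bounding box with each value in its own field. It assumes the object detection row layout is image path, label, then the box as `X_MIN,Y_MIN,,,X_MAX,Y_MAX,,` with coordinates normalized to the 0..1 range, so please check that against the Input files documentation before relying on it. It also assumes VIA-style rectangle regions (`shape_attributes` with `x`, `y`, `width`, `height`). The bucket path, the `field` attribute name, and the helper name `via_regions_to_rows` are made up for the example, and the image width and height must come from the image itself.

```python
import csv
import io


def via_regions_to_rows(gcs_uri, regions, img_w, img_h, label_key="field"):
    """Build one CSV row per rectangle: path, label, then box coordinates."""
    rows = []
    for region in regions:
        shape = region["shape_attributes"]
        # label_key is a hypothetical attribute name; use the one from your JSON.
        label = region["region_attributes"][label_key]
        # Convert pixel coordinates to fractions of the image size (assumed format).
        x_min = round(shape["x"] / img_w, 4)
        y_min = round(shape["y"] / img_h, 4)
        x_max = round((shape["x"] + shape["width"]) / img_w, 4)
        y_max = round((shape["y"] + shape["height"]) / img_h, 4)
        # Empty strings stand for the two unused corner vertices.
        rows.append([gcs_uri, label, x_min, y_min, "", "", x_max, y_max, "", ""])
    return rows


# Hypothetical example: one labeled field on a 1000x500 pixel image.
example_regions = [
    {
        "shape_attributes": {"name": "rect", "x": 100, "y": 50, "width": 200, "height": 40},
        "region_attributes": {"field": "patient_name"},
    }
]

buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerows(
    via_regions_to_rows("gs://my-bucket/rx_0001.png", example_regions, 1000, 500)
)
csv_text = buffer.getvalue()
print(csv_text, end="")
```

An image with several labeled fields simply produces several rows that share the same path. If the labels still do not appear after import, comparing a few rows of your file against the documented example rows is a quick way to spot a column mismatch.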

Hope this helps.