
Import file (.csv) too big to import to Vertex AI dataset

I have an image dataset of receipts of about 12 GB, along with their labels and bounding boxes (.json) of about 800 MB. I uploaded the image dataset to a GCS bucket successfully and generated the Cloud Storage paths in a CSV. I then combined the paths with the labels and bounding boxes to create the import file in CSV format. However, when I tried to import the CSV file, I got an error saying my input file was too big to import; the limit is 209715200 bytes. My CSV input file is about 780 MB. How can I import a large input file into Vertex AI?

 


Good day @jastipbaobao,

Welcome to Google Cloud Community!

If I understand correctly, you are trying to import a large dataset into Vertex AI but cannot because of the import file size limit. If that is the case, you can try splitting your large import file into several smaller CSV files, each under the size limit, and then import each of them into the same Vertex AI dataset (see the sketch below).

If this is a classification or regression task, you can also try the Tabular Workflow for End-to-End AutoML. One of its benefits:

It supports large datasets that are multiple TB in size and have up to 1000 columns. You can check this documentation to learn more:
https://cloud.google.com/vertex-ai/docs/tabular-data/tabular-workflows/e2e-automl

It also lets you control each step of the pipeline instead of only the pipeline as a whole.
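Here is a minimal sketch of the splitting approach, assuming your import file is a plain object-detection CSV with one row per annotation and no header. The bucket name, project ID, file names, and dataset display name below are placeholders you would replace with your own:

```python
from google.cloud import aiplatform, storage

MAX_BYTES = 200 * 1024 * 1024   # stay safely under the 209715200-byte import limit
BUCKET = "your-bucket"          # placeholder
SOURCE_CSV = "import_file.csv"  # the local ~780 MB import file

# 1. Split the local CSV into chunks below the size limit, keeping rows intact.
chunk_paths = []
with open(SOURCE_CSV, "r") as src:
    part, size, out = 0, 0, None
    for line in src:
        line_bytes = len(line.encode())
        if out is None or size + line_bytes > MAX_BYTES:
            if out:
                out.close()
            part += 1
            path = f"import_part_{part:03d}.csv"
            chunk_paths.append(path)
            out = open(path, "w")
            size = 0
        out.write(line)
        size += line_bytes
    if out:
        out.close()

# 2. Upload the chunks to Cloud Storage.
bucket = storage.Client().bucket(BUCKET)
gcs_uris = []
for path in chunk_paths:
    bucket.blob(f"imports/{path}").upload_from_filename(path)
    gcs_uris.append(f"gs://{BUCKET}/imports/{path}")

# 3. Create the image dataset from all chunks at once
#    (gcs_source accepts a list of import files).
aiplatform.init(project="your-project", location="us-central1")  # placeholders
dataset = aiplatform.ImageDataset.create(
    display_name="receipts",
    gcs_source=gcs_uris,
    import_schema_uri=aiplatform.schema.dataset.ioformat.image.bounding_box,
)
```

If you prefer to do the split outside of Python, GNU coreutils can produce the same line-aligned chunks before you upload them, for example `split -C 190m --additional-suffix=.csv import_file.csv import_part_`.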

I hope this helps!

Thank you very much @kvandres