
Vertex AI Search: Why does the data store limit NDJSON imports from Google Cloud Storage to a maximum of 100 files?

I attempted to create a Vertex AI Search data store by following the links below:

https://cloud.google.com/generative-ai-app-builder/docs/create-data-store-es#cloud-storage 

https://cloud.google.com/generative-ai-app-builder/docs/prepare-data#storage-structured 

For the data store, I uploaded 711 NDJSON files (structured data) to Google Cloud Storage.

When running the import, I ran into the error "Failed to import data: the request contained 711 files, which exceeds the maximum number of files allowed from Google Cloud Storage (100 files)."

How can I increase the quota to more than 100 files?


-----------------------------------------------------------------------------------------------------------

You can read more details about what I did below.

First of all, let's take a closer look at the file structure I need in Cloud Storage. There are 711 files, named as follows (a hypothetical sample record is sketched after the list):

3866-incident.ndjson

3867-incident.ndjson

{ID}-incident.ndjson
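Each file holds the structured data for one incident as newline-delimited JSON. A minimal sketch of writing such a file, assuming hypothetical field names (the real fields must match the data store's schema):

```python
import json

# Hypothetical record for 3866-incident.ndjson; field names are illustrative only.
record = {
    "id": "3866",
    "title": "Database connection timeout on app server",
    "resolution": "Increased the connection pool size and restarted the service.",
    "updated_at": "2023-11-01",
}

# An NDJSON file contains one JSON object per line.
with open("3866-incident.ndjson", "w", encoding="utf-8") as f:
    f.write(json.dumps(record, ensure_ascii=False) + "\n")
```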


Let me elaborate more on why I had to upload tons of files instead of a single file.

It is a search system for finding knowledge about how to fix each IT incident. Each file contains the details of one incident, with the incident ID used as the prefix of the file name (as shown above), and each incident/file may be updated with new details on a daily basis.

Therefore, as far as I can tell, the easiest approach is to replace the existing file in Cloud Storage with a new one whenever something changes for that incident ID.

This is how my system works: it runs a script daily to crawl the incidents that were updated, writing one file per updated incident. If there are 20 updated incident cases today, then 20 files are written.

So it is not convenient for me to collect everything into a single file (to stay under the 100-file limit): I cannot see how to keep one big JSON file consistently and correctly in sync whenever an incident case is updated in my database. A rough sketch of the per-file sync I use today is shown below.
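A minimal sketch of that daily sync, assuming the google-cloud-storage client library and placeholder bucket/prefix names; re-uploading an object simply replaces the previous version for that incident:

```python
import json
from google.cloud import storage  # pip install google-cloud-storage

BUCKET_NAME = "my-incident-bucket"   # hypothetical bucket name
PREFIX = "incidents"                 # hypothetical folder inside the bucket

def upload_updated_incidents(incidents):
    """Write one NDJSON object per updated incident; existing files are overwritten."""
    client = storage.Client()
    bucket = client.bucket(BUCKET_NAME)
    for incident in incidents:
        blob = bucket.blob(f"{PREFIX}/{incident['id']}-incident.ndjson")
        blob.upload_from_string(
            json.dumps(incident, ensure_ascii=False) + "\n",
            content_type="application/x-ndjson",
        )

# Example: today's crawl found two updated incidents, so two files are replaced.
upload_updated_incidents([
    {"id": "3866", "title": "DB timeout", "resolution": "Increase pool size"},
    {"id": "3867", "title": "Disk full", "resolution": "Rotate logs"},
])
```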

If increasing the quota is not workable, is there any other solution when you are limited to 100 JSON files?

Thank you very much

Pongthorn

Solved

ACCEPTED SOLUTION

Hi @pongthorn

Thank you for reaching out to our community for help.

I understand that you are having challenges importing your files from Google Cloud Storage into your data store. You may need to contact Google Cloud Support for your concerns about quotas and limits.

Since the API mentions a limit of 100 files per import request, you can try importing your files in batches, say 8 batches in this instance, with 100 files or fewer per request. You can also review this document to see which upload method is applicable to you.
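Something like the rough sketch below could drive the batches, assuming the google-cloud-discoveryengine Python client and placeholder project, bucket, and data store IDs; the data_schema value depends on how the data store was created, so please verify it against the documentation:

```python
from google.cloud import discoveryengine_v1 as discoveryengine  # pip install google-cloud-discoveryengine

# Placeholder identifiers -- replace with your own project, location, and data store.
PROJECT_ID = "my-project"
LOCATION = "global"
DATA_STORE_ID = "my-incident-data-store"

# Hypothetical list of the 711 NDJSON object URIs in Cloud Storage.
all_uris = [f"gs://my-incident-bucket/incidents/{i}-incident.ndjson" for i in range(3866, 4577)]

client = discoveryengine.DocumentServiceClient()
parent = client.branch_path(
    project=PROJECT_ID,
    location=LOCATION,
    data_store=DATA_STORE_ID,
    branch="default_branch",
)

# Import in batches of at most 100 URIs per request.
for start in range(0, len(all_uris), 100):
    batch = all_uris[start:start + 100]
    request = discoveryengine.ImportDocumentsRequest(
        parent=parent,
        gcs_source=discoveryengine.GcsSource(
            input_uris=batch,
            # Assumed value for your own structured NDJSON; confirm in the docs.
            data_schema="custom",
        ),
        # INCREMENTAL keeps existing documents and upserts the imported ones.
        reconciliation_mode=discoveryengine.ImportDocumentsRequest.ReconciliationMode.INCREMENTAL,
    )
    operation = client.import_documents(request=request)
    operation.result()  # wait for each batch to finish before sending the next
```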

Hope this helps.

