Re: Matching engine not accepting Batch Prediction...

sateesh143225 · 07-09-2023 05:09 AM

I am attempting to build a question answering bot using a model from TensorFlow Hub, specifically the model located at https://tfhub.dev/google/universal-sentence-encoder-qa/3. I have successfully uploaded the model to the model registry and performed batch predictions, resulting in two text/plain files as output. I am now working on creating a matching index using the DOT_PRODUCT method. Below is the code I am using for this task.

However, I am getting the following error:

details = "Found file `gs://embeddedData/prediction-responseEncoder-2023_07_09T03_20_39_530Z/prediction.errors_stats-00000-of-00001` with unknown format, please make sure your files include the supported file extension (e.g. `.json`, `.csv` or `.avro`) in your file name."
debug_error_string = "UNKNOWN:Error received from peer ipv4:173.194.202.95:443 {created_time:"2023-07-09T10:49:50.681664689+00:00", grpc_status:9, grpc_message:"Found file `gs://embeddedData/prediction-responseEncoder-2023_07_09T03_20_39_530Z/prediction.errors_stats-00000-of-00001` with unknown format, please make sure your files include the supported file extension (e.g. `.json`, `.csv` or `.avro`) in your file name."}"
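The file named in the error, `prediction.errors_stats-00000-of-00001`, has no `.json`, `.csv` or `.avro` extension, so the index build rejects it even though it is only the batch prediction error summary. A minimal sketch of a pre-check you could run on a directory listing (for example the output of `gsutil ls`) before creating the index; the helper name `find_unsupported_files` is hypothetical, and the extension list is taken from the error message rather than from an exhaustive spec.

```python
from pathlib import PurePosixPath

# Extensions listed in the error message; assumed to be the accepted set.
SUPPORTED_EXTENSIONS = {".json", ".csv", ".avro"}


def find_unsupported_files(paths):
    """Return the paths whose file name lacks a supported extension.

    Hypothetical helper: `paths` is any iterable of gs:// URIs or local paths.
    """
    bad = []
    for path in paths:
        # PurePosixPath splits on "/", so .suffix is the extension of the
        # last path component (e.g. ".errors_stats-00000-of-00001").
        suffix = PurePosixPath(path).suffix
        if suffix not in SUPPORTED_EXTENSIONS:
            bad.append(path)
    return bad
```

Any path it returns would trigger the same error, so such files should not sit in the directory the index reads from.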

How do I resolve this error? Do I need to convert the predictions to JSON format?

It's very urgent for me; if anyone can help, please let me know.

kvandres

Good day @sateesh143225,

Welcome to Google Cloud Community!

Based on the screenshots and error message you sent, you are encountering this error because the prediction files in your Google Cloud Storage bucket are in the wrong format. They are currently text/plain, but only the `.json`, `.csv` and `.avro` file formats are supported. To solve this, try converting your prediction files to JSON and make sure the file names end with the `.json` extension (e.g. prediction.json), since this is a requirement of the input directory structure. Also, if you haven't already, create a batch root directory for each batch of input data files, then check whether this solves the problem. For more information on the input directory structure, see: https://cloud.google.com/vertex-ai/docs/matching-engine/match-eng-setup/format-structure#input_direc...
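A minimal sketch of the conversion step, assuming each non-empty line of the batch prediction results file (e.g. `prediction.results-00000-of-00001`) is a JSON object whose `"prediction"` field is the embedding as a flat list of numbers. That shape depends on the model's serving signature, so check a line of your own output first. The output is one JSON object per line with `"id"` and `"embedding"` keys, which is the JSON layout described on the format page linked above. The function names and the sequential `doc-N` ID scheme are hypothetical.

```python
import json


def to_index_records(result_lines, id_prefix="doc"):
    """Convert batch prediction result lines into index records.

    Assumes each non-empty line is JSON with a "prediction" field holding
    the embedding as a flat list of numbers.
    """
    records = []
    for line in result_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        row = json.loads(line)
        embedding = [float(x) for x in row["prediction"]]
        # Hypothetical sequential IDs; keep your own ID -> answer-text mapping
        # so nearest-neighbour results can be turned back into answers.
        records.append({"id": f"{id_prefix}-{len(records)}", "embedding": embedding})
    return records


def write_json_lines(records, path):
    """Write one JSON object per line to a file whose name ends in .json."""
    if not path.endswith(".json"):
        raise ValueError("index input files need a .json extension")
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")
```

Usage: download the results file (e.g. with `gsutil cp`), pass its lines to `to_index_records`, write them with `write_json_lines(records, "embeddings.json")`, and upload that file to a separate folder that contains only `.json` files. Pointing the index at that folder, rather than at the batch prediction output folder, keeps the `prediction.errors_stats-*` file out of the build.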

Hope this helps!
