Re: How should the input JSONL look for a batch prediction job?

sangersteel · 04-04-2022 10:42 AM

I can't find any examples online of how an input JSONL file is supposed to look for a batch prediction job. When I tried with this:

{'instances': {'mimeType': 'text/plain', 'content': '0'}}
{'instances': {'mimeType': 'text/plain', 'content': '1'}}
{'instances': {'mimeType': 'text/plain', 'content': '1'}}
{'instances': {'mimeType': 'text/plain', 'content': '2'}}
{'instances': {'mimeType': 'text/plain', 'content': '8'}}
{'instances': {'mimeType': 'text/plain', 'content': 'a'}}
{'instances': {'mimeType': 'text/plain', 'content': 'a'}}
{'instances': {'mimeType': 'text/plain', 'content': 'h'}}
{'instances': {'mimeType': 'text/plain', 'content': 'q'}}
{'instances': {'mimeType': 'text/plain', 'content': 's'}}
{'instances': {'mimeType': 'text/plain', 'content': 'y'}}
{'instances': {'mimeType': 'text/plain', 'content': 'y'}}

I got an error email saying:

Error Messages: BatchPrediction could not start because no valid instances
were found in the input file.

Is there some other way this should look for it to work? Maybe like

{
'instances': [
{'mimeType': 'text/plain', 'content': 'a'}
{'mimeType': 'text/plain', 'content': 'bc'}
{'mimeType': 'text/plain', 'content': 'de'}]
}

ghayas_muhammad

Hello sangersteel,
It is not possible to use a JSONL file for batch prediction with text classification models; only the CSV file format is accepted. This is indicated in the AutoML Natural Language documentation [1]. The CSV file should list only one input file per row, and both the CSV file and each input file need to be stored in your Cloud Storage bucket.

[1] https://cloud.google.com/natural-language/automl/docs/predict#batch_prediction
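
As a rough illustration of the CSV layout described above, here is a minimal Python sketch that turns a list of text snippets into one text file per snippet plus a CSV with one file reference per row. The bucket name, the `batch_input` prefix, the `doc_00000.txt` naming, and the helper `build_batch_csv` are all made up for the example, and writing each row as a `gs://` path is an assumption about how the input files are referenced; please check the linked documentation for the exact format. Uploading the files and the CSV to the bucket is not shown.

```python
import csv
import io


def build_batch_csv(texts, bucket, prefix="batch_input"):
    """Build the pieces of a CSV-based batch prediction input.

    Returns (files, csv_text):
      files    maps a hypothetical object name to the text it should contain
      csv_text holds one gs:// path per row (assumed reference format)
    """
    files = {}
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    for i, text in enumerate(texts):
        # One input file per snippet; the naming scheme is arbitrary.
        name = f"{prefix}/doc_{i:05d}.txt"
        files[name] = text
        # One input file per CSV row, as the answer above describes.
        writer.writerow([f"gs://{bucket}/{name}"])
    return files, buf.getvalue()


# "my-bucket" is a placeholder bucket name.
files, csv_text = build_batch_csv(["a", "bc", "de"], "my-bucket")
print(csv_text)
```

Each entry in `files` would then be uploaded to the bucket under its object name, along with the CSV itself, before starting the batch prediction job.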
