
Image annotation: batch vs online JSON response

Hi everyone,

I am trying to convert an online image annotation process to a batch process, following the Python recipe here: https://cloud.google.com/vision/docs/batch (I'm using google-cloud-vision 3.1.4 and importing with 'from google.cloud import vision').

The problem is, the properties of the JSON response I extract from the batch annotation are different from those I get from the online annotation, so my existing postprocessing pipeline does not work with the batch outputs.

In the online version, I use two steps to generate a JSON response:

  1. api_response = vision.ImageAnnotatorClient().annotate_image(
         vision.AnnotateImageRequest(image=image, features=features))
  2. api_response_json_str = vision.AnnotateImageResponse.to_json(api_response)
where the only feature I am specifying is vision.Feature(type_="DOCUMENT_TEXT_DETECTION").
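For context, here is a fuller, self-contained sketch of that online flow (the image path is just a placeholder for what my real pipeline does):

from google.cloud import vision

client = vision.ImageAnnotatorClient()

# Load an image from local disk (placeholder path).
with open("page.png", "rb") as f:
    image = vision.Image(content=f.read())

features = [vision.Feature(type_="DOCUMENT_TEXT_DETECTION")]

# Step 1: synchronous (online) annotation.
api_response = client.annotate_image(
    vision.AnnotateImageRequest(image=image, features=features)
)

# Step 2: serialize the proto-plus response to a JSON string.
api_response_json_str = vision.AnnotateImageResponse.to_json(api_response)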
 
In the batch version, a single step dumps the response as a JSON file in the GCS bucket specified in the output config:
  1. vision.ImageAnnotatorClient().async_batch_annotate_images(requests=requests, output_config=output_config)
     
Looking at vision.ImageAnnotatorClient.async_batch_annotate_images(), the second step of the online process already seems to be executed internally (through image_annotator.AsyncBatchAnnotateImagesResponse), which makes sense. But the behavior of AnnotateImageResponse and AsyncBatchAnnotateImagesResponse seems to differ.
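For reference, here is roughly how I issue the batch request (bucket names, paths, and batch_size are placeholders, not my exact values):

from google.cloud import vision

client = vision.ImageAnnotatorClient()

# One request per image; in the batch flow the images are read from GCS.
requests = [
    vision.AnnotateImageRequest(
        image=vision.Image(
            source=vision.ImageSource(image_uri="gs://my-bucket/page.png")
        ),
        features=[vision.Feature(type_="DOCUMENT_TEXT_DETECTION")],
    )
]

# The results are written as JSON files under this GCS prefix.
output_config = vision.OutputConfig(
    gcs_destination=vision.GcsDestination(uri="gs://my-bucket/output/"),
    batch_size=1,
)

operation = client.async_batch_annotate_images(
    requests=requests, output_config=output_config
)
operation.result(timeout=300)  # wait for the long-running operation to finish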
 
Here is a segment of the JSON from the online version:
{
    "property": {
        "detectedBreak": {
            "type": 1,
            "isPrefix": false
        },
        "detectedLanguages": []
    },
    "boundingBox": {
        "vertices": [
            {
                "x": 99,
                "y": 99
            },
            {
                "x": 106,
                "y": 99
            },
            {
                "x": 106,
                "y": 109
            },
            {
                "x": 99,
                "y": 109
            }
        ],
        "normalizedVertices": []
    },
    "text": "n",
    "confidence": 0.98761535
}
 
And here is the same segment from the batch annotation:
{
    "property": {
        "detectedBreak": {
            "type": "SPACE"
        }
    },
    "boundingBox": {
        "vertices": [
            {
                "x": 99,
                "y": 99
            },
            {
                "x": 106,
                "y": 99
            },
            {
                "x": 106,
                "y": 109
            },
            {
                "x": 99,
                "y": 109
            }
        ]
    },
    "text": "n",
    "confidence": 0.98761535
}
Note how all the numbers match, but the formatting does not: the batch output uses enum names (e.g. "SPACE" instead of 1 for the detected break type) and omits empty or default-valued fields (isPrefix, detectedLanguages, normalizedVertices).
 
I could not find any relevant documentation. As a last resort, I can of course hack my code or the JSON files to make things work, but I feel like there must be a way to reproduce the behavior of the online request (and I would expect that to be the default behavior!).
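For example, one hack I could imagine (an untested sketch: I'm assuming the batch output file in GCS holds a JSON-serialized BatchAnnotateImagesResponse, and the bucket/object names are placeholders) is to parse the batch output back into the proto message and re-serialize each response with AnnotateImageResponse.to_json(), which should reproduce the online formatting (integer enums, default fields included):

import json
from google.cloud import storage
from google.cloud import vision

# Download one of the batch output files (placeholder names).
blob = storage.Client().bucket("my-bucket").blob("output/output-1-to-1.json")
batch_json = blob.download_as_text()

# Parse the file back into the proto-plus message
# (assuming it is a serialized BatchAnnotateImagesResponse).
batch_response = vision.BatchAnnotateImagesResponse.from_json(
    batch_json, ignore_unknown_fields=True
)

# Re-serialize each per-image response the same way the online flow does,
# so the postprocessing pipeline sees the same JSON shape as before.
online_style_jsons = [
    vision.AnnotateImageResponse.to_json(resp)
    for resp in batch_response.responses
]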
 
Any help would be greatly appreciated!
 
Best,
Onur