
Unprocessed or partial text in a few images

Former Community Member

I've attached a few images that are a bit problematic for the Vision API. Note that I'll keep updating this topic with more images and their problem areas if any come up.

The description for each image is in its caption.

the string "omerat" not being pickedthe string "omerat" not being picked

the string "jEWONDER" is being picked with an upper-case "J"the string "jEWONDER" is being picked with an upper-case "J"

 


Sadly, I'm not a Vertex AI image specialist, but I'll throw in my 2 cents. When we perform text extraction from an image, we get back a structured record describing what was found (I believe the docs describe this record). Each entity in that record has an associated "score" between 0 and 1 representing confidence. If the score is high, Vertex AI is claiming more confidence in the result; if it is low, Vertex AI has less confidence in the result.

You started your post with the notion that Vertex AI is correct 99.9% of the time. From my perspective, that feels good. We usually think of software as being correct 100% of the time (1 + 1 had better always equal exactly 2). However, I'm getting the sense that in your design you are grabbing screen images, converting them to an image format, and then passing those images through Vertex AI to extract information. I liken this to the notion of, instead of sending someone a spreadsheet of data, printing out the spreadsheet, taking a photocopy, sending it via fax, and then asking someone to re-enter the data. Transcription errors will occur.
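As a rough illustration of what that structured record looks like, here is a minimal sketch using the Cloud Vision Python client (google-cloud-vision), since the original post mentions the Vision API. The file name is a placeholder, and which level of confidence you inspect (block, word, or symbol) depends on your use case.

```python
from google.cloud import vision

# Minimal sketch: read per-word confidence from a document text detection
# response. "screenshot.png" is a placeholder file name.
client = vision.ImageAnnotatorClient()

with open("screenshot.png", "rb") as f:
    image = vision.Image(content=f.read())

response = client.document_text_detection(image=image)

for page in response.full_text_annotation.pages:
    for block in page.blocks:
        for paragraph in block.paragraphs:
            for word in paragraph.words:
                text = "".join(symbol.text for symbol in word.symbols)
                # word.confidence is the 0..1 score described above
                print(f"{text!r}: confidence={word.confidence:.2f}")
```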

In Vertex AI, the score value is typically used to direct extracted data to a secondary process for resolution. For example, if this were an "invoice" being processed, we had better be sure that the amounts are as accurate as possible. If a recognition score were low, we might send that item to a human to review and correct if necessary, while allowing well-scored data (say, 0.97 or above) to pass through without human interaction.
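To make that routing idea concrete, here is a small sketch. The 0.97 threshold and the two handler functions are hypothetical placeholders for whatever review workflow you have; nothing here is part of a Google API.

```python
# Hypothetical confidence-based routing: auto-accept high-confidence
# extractions, queue low-confidence ones for human review.
REVIEW_THRESHOLD = 0.97  # placeholder threshold, tune for your workload

def accept_automatically(text: str) -> None:
    # e.g. write straight through to your downstream system
    print(f"auto-accepted: {text!r}")

def send_to_human_review(text: str, score: float) -> None:
    # e.g. push onto a review queue for manual correction
    print(f"queued for review (score {score:.2f}): {text!r}")

def route_extraction(text: str, score: float) -> str:
    if score >= REVIEW_THRESHOLD:
        accept_automatically(text)
        return "accepted"
    send_to_human_review(text, score)
    return "needs_review"
```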

Former Community Member

I'm dealing with hundreds of images like these on a daily basis, and the Vision API does an excellent job in terms of accuracy. I just wanted to let the team know about these two images so the model can be trained to be more accurate and I can prevent such glitches in the future. It's very rare to get these types of mistakes, that's all.

 

 

Thank you so much for the report. I'm sure members of the Google Vision AI teams monitor this community. Feedback on images that aren't interpreted correctly is likely to be excellent data for improving recognition in the future. Again, thank you so much for taking the time to report your findings. Very much appreciated.