Hello All,
I used the official playground (https://cloud.google.com/vision/docs/drag-and-drop) to try out text detection, but the most important word is recognized incorrectly: BALLANTINE comes out as KALLANTINE.
Here are the request details:
URL: https://vision.googleapis.com/v1/images:annotate
The models used by the Cloud Vision API service are continually being improved to provide better recognition accuracy; however, they sometimes get characters wrong or fail to recognize them at all. Keep in mind that these services are retrained on an ongoing basis, which means recognition quality will improve accordingly.
For best results, follow the guidelines on supported file formats, image size, and language. In particular, the languageHints feature can give more accurate results, since it lets you specify the exact language(s) expected in the image.
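As an illustration, here is a minimal sketch of an images:annotate request body with languageHints set. The API key value and the helper function name are placeholders for this example; languageHints lives under imageContext in the request:

```python
import json

# Placeholder credentials -- substitute your own API key.
API_KEY = "YOUR_API_KEY"
ENDPOINT = f"https://vision.googleapis.com/v1/images:annotate?key={API_KEY}"

def build_annotate_request(image_base64, language_hints):
    """Build an images:annotate request body with languageHints set."""
    return {
        "requests": [{
            "image": {"content": image_base64},
            "features": [{"type": "TEXT_DETECTION"}],
            # languageHints tells the OCR model which languages to expect.
            "imageContext": {"languageHints": language_hints},
        }]
    }

body = build_annotate_request("<base64-encoded image>", ["en"])
print(json.dumps(body, indent=2))
```

You would then POST this body to the endpoint above; for mostly-English labels like this one, a hint of ["en"] may reduce confusions such as B vs K.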
I'm wondering if an additional possibility might be domain-specific training. For example, if the challenge is to extract text from a particular domain such as bottle labels, there may be a manageable set of possibilities where recognition could be improved by training a custom model. See for example: https://cloud.google.com/vertex-ai/docs/training-overview
I'm also wondering what the confidence score on the extracted text is for this example. If the confidence is low, that could be a trigger to pass the image to human review, explicit labeling, and retraining, so that FUTURE recognitions would be more accurate.