
OCR text recognition for changing images.

Hi, 

I'm new to Cloud Vision and am looking into how I can scan an image and detect a specific job number (e.g. 28379) that is written in a unique colour. We have images of steel which contain lots of text, and we figured that if we write the job number in a different colour, we could extract only that information and use it to rename each image to its job number. If there are multiple images with the same job number, we would append (1), (2), (3), etc. to the end of each image name. My goal is to automate renaming a full folder of photos based on the job number detected in each image.

We are using Power Automate for most of our automation, so ideally we would like to access Google Cloud via Power Automate (text detection via an API Gateway key to access the Google Cloud SDK Shell).

Does anyone know how to do this, or have any advice/tips on how to get this working?


Do you have any control over the textual content contained in the image? I sense that you may, and that you are looking to encode information (meta information, i.e. the Job Number) in the visual color of the text. One thought is not to use color but instead to "tag" the information. For example, instead of "28379" have the text be "JN:28379". This way OCR or other text extraction could be applied to the image, and the tagged Job Number would lend itself to identification. Another thought is the position of the text. For example, if the Job Number were always in the top left corner of the image, then we could potentially train a Document AI parser with labeled data, as the Document AI parser has layout awareness.

If you do want to go with color encoding ... an initial thought would be to "filter" the image before applying OCR or text extraction. For example, if you coded the color of the Job Number as Magenta (#FF00FF), then we could run a filter over the image, capturing only the pixels where:

R > #FA, G < #04, B > #FA

and then perform OCR/text extraction over that filtered image.  Presumably other text would be removed.
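A minimal sketch of that filter, assuming the image has already been decoded into rows of (R, G, B) tuples (an imaging library such as Pillow could supply these; that step is not shown). The function name `magenta_mask` and the black-on-white output are illustrative choices, not part of any Google API. Matching pixels become black and everything else white, so only the magenta text survives for OCR.

```python
# Hypothetical helper: keep only near-magenta pixels before running OCR.
# Thresholds follow the rule above: R > 0xFA, G < 0x04, B > 0xFA.

BLACK = (0, 0, 0)
WHITE = (255, 255, 255)


def magenta_mask(pixels, high=0xFA, low=0x04):
    """Return a new image (list of rows) with magenta pixels as black
    and all other pixels as white."""
    masked = []
    for row in pixels:
        new_row = []
        for r, g, b in row:
            is_magenta = r > high and g < low and b > high
            new_row.append(BLACK if is_magenta else WHITE)
        masked.append(new_row)
    return masked


if __name__ == "__main__":
    sample = [
        [(255, 0, 255), (10, 10, 10)],   # pure magenta, dark grey
        [(251, 3, 251), (250, 0, 255)],  # just inside, just outside the threshold
    ]
    for row in magenta_mask(sample):
        print(row)
```

In real photos the marker colour will vary with lighting and compression, so the thresholds would probably need to be loosened and tuned on sample images.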

Thanks @kolban for your reply. I don't have much control over the photos and they are different each time, which is why we are thinking up strategies that give us enough control to reliably extract the information we need via OCR. Good idea on the JN:28379 tag, though; we have thought of possibly doing something like this. If we made this change, how could I hypothetically code it? Or do you know of a tool in Power Automate that could extract this specific information?

Howdy. From the Google Cloud perspective, the first thing I would look at is Google Cloud Vision AI. This is an API service that takes an image as input and returns a structured document (JSON) containing the extracted text. Once you have the extracted text, you could then parse it looking for your data. For example, if Job Numbers were tagged, you could use a regular expression (regex) to look for the prefix. Another possibility is Google Document AI, but I'm not sensing this to be as good a match for you based on the description so far.

To code the extraction, at the highest level ... we would have to look at where the image originates.  Are you processing the image on an on-premises server or in the cloud?  Is it available through code / APIs or is it saved as a file on disk or blob storage?  Do you want to process the images one at a time or in batch?

So many permutations. However, from the Google Cloud service perspective, loosely think of Google Cloud as a "black box". I could imagine you using Google Cloud Functions to expose a REST-based service that could be called. You would invoke your Cloud Functions service, passing in the image data as a parameter. The Cloud Function would then invoke Google Cloud Vision AI, passing in the image. The output would be the text extracted from the image, which would then be returned to the caller.

Currently I have set up Cloud Vision AI to extract the data I need (via API access), which is brought into Power Automate. For testing purposes the images are read from a folder on my desktop, but the final version should run on our server (which I know how to set up). I did not know you could batch process the images; however, I kept getting errors back. Could you possibly explain how this would work? Also, I've tried to implement a regex but don't know much about it. The closest I've got is to look only for 5 digits, and obviously it sometimes detects multiple matches and chooses the wrong number. Do you know of a regex pattern that I can implement? The one I have currently is "\d{5}".

Also, to loosely explain my process for clarification: I want to have the images scanned by Cloud Vision AI to detect text, which is brought back to Power Automate (I completely understand you may not know this program). Then, from the text I get, I extract only the "JN:34894" (job number), which is used to rename the image. If the image name already exists, I append (1), (2), etc., and then repeat this until all images in the folder are renamed.
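The renaming rule described above can be sketched in plain Python, independent of Power Automate. This is only an illustration under stated assumptions: the function name `plan_renames`, the "28379 (1).jpg" spacing, and the case-insensitive comparison (to suit Windows file systems) are choices made here, not something from the thread. It only computes the new names; the actual rename (for example with `os.rename`) is left out.

```python
import os


def plan_renames(detected):
    """Map each original file name to a new name based on its job number.

    `detected` is a list of (original_file_name, job_number) pairs.
    The first image for a job number becomes "<job>.<ext>"; later ones
    become "<job> (1).<ext>", "<job> (2).<ext>", and so on.
    """
    taken = set()   # new names already handed out (lower-cased)
    plan = {}
    for original, job_number in detected:
        extension = os.path.splitext(original)[1]
        candidate = f"{job_number}{extension}"
        counter = 1
        while candidate.lower() in taken:
            candidate = f"{job_number} ({counter}){extension}"
            counter += 1
        taken.add(candidate.lower())
        plan[original] = candidate
    return plan


if __name__ == "__main__":
    photos = [("IMG_001.jpg", "28379"), ("IMG_002.jpg", "28379"),
              ("IMG_003.png", "34894"), ("IMG_004.jpg", "28379")]
    for old, new in plan_renames(photos).items():
        print(old, "->", new)
```

If the target folder already contains renamed images from an earlier run, their names would need to be added to `taken` before planning.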

Howdy ... unfortunately through this community, we can't *code up* a solution for you.

Batch processing:  I am seeing two primary APIs for OCR processing:
1. https://cloud.google.com/vision/docs/reference/rest/v1/images/annotate ... takes a set of images and OCR annotates them, but BLOCKS until the result is returned

2. https://cloud.google.com/vision/docs/reference/rest/v1/images/asyncBatchAnnotate ... takes a set of images and OCR annotates them in the background; it returns a long-running operation that you can poll to determine when it is done
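As a rough illustration of the first (blocking) API, the sketch below builds the JSON body for one `images:annotate` call covering several images and pulls the text out of the reply. The helper names `build_annotate_request` and `texts_from_response` are hypothetical; the body would be POSTed to `https://vision.googleapis.com/v1/images:annotate` with your API key, for example from a Power Automate HTTP action, and that call is not shown here.

```python
import base64


def build_annotate_request(images):
    """Build an images:annotate body asking for TEXT_DETECTION on each
    image. `images` is a list of raw image bytes."""
    return {
        "requests": [
            {
                "image": {"content": base64.b64encode(data).decode("ascii")},
                "features": [{"type": "TEXT_DETECTION"}],
            }
            for data in images
        ]
    }


def texts_from_response(response):
    """Return one text string per image, in request order.
    Images with no detected text (or an error) yield an empty string."""
    texts = []
    for item in response.get("responses", []):
        text = item.get("fullTextAnnotation", {}).get("text", "")
        if not text:
            # Fall back to the first entry of textAnnotations, if present.
            annotations = item.get("textAnnotations") or []
            if annotations:
                text = annotations[0].get("description", "")
        texts.append(text)
    return texts


if __name__ == "__main__":
    body = build_annotate_request([b"fake image bytes"])
    print(body["requests"][0]["features"])
```

Each entry in `responses` may also carry an `error` object for that image, which is worth checking when a batch call seems to fail.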

As for RegEx, here are some good links:

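To match "JN:34894", an expression similar to /JN:\d{5}/ feels like it might be right. Unlike a bare \d{5}, it only matches five digits that directly follow the "JN:" tag, so other numbers in the image are ignored.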
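For illustration only, here is how that pattern might be applied in Python. The function name `extract_job_number` is hypothetical, and two details go beyond the expression above and are assumptions: optional whitespace after the colon (OCR sometimes inserts a space) and a check that no sixth digit follows.

```python
import re

# "JN:" tag, optional whitespace, exactly five digits (not followed by a sixth).
JOB_NUMBER_PATTERN = re.compile(r"JN:\s*(\d{5})(?!\d)")


def extract_job_number(text):
    """Return the five-digit job number following a "JN:" tag, or None."""
    match = JOB_NUMBER_PATTERN.search(text)
    return match.group(1) if match else None


if __name__ == "__main__":
    ocr_text = "HEAT 12345 GRADE 350\nJN:28379\nLOT 99999"
    print(extract_job_number(ocr_text))
```

Power Automate has its own expression and regex support, so the pattern string may need adapting there; the capture group is what isolates the digits from the tag.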