Re: Document AI, upside-down text not recognized, ...

iantolan · 10-29-2024 07:23 AM

When I call the DocumentAI API with an image or PDF that contains a mix of right-side-up and upside-down text, the upside-down text is not picked up. Interestingly, if I crop the image so that the majority of the text is upside-down, the upside-down text is picked up, but the right-side-up text is then missing from the response.

In both cases, the sideways text is picked up without issues.

Any tips on why this might be the case, or ways to detect both upside-down and right-side-up text in the same image?

ibaui

Hi @iantolan,

Welcome to Google Cloud Community!

The behavior you're seeing with the DocumentAI API could be related to how the OCR (Optical Character Recognition) system processes and interprets text orientation.

Image preprocessing is an essential part of any OCR pipeline: the higher the quality of the input image, the better the OCR output will be. Text orientation is arguably the most important preprocessing step to review. Here are some potential reasons for the behavior you are seeing:

Dominant Orientation: DocumentAI's text detection algorithms often prioritize the dominant orientation of text in an image. If the majority of the text is right-side-up, the model might assume that's the intended orientation and struggle to detect upside-down text.

Cropping Bias: When you crop the image so that upside-down text dominates, you give the model a stronger signal that upside-down is the intended orientation. The model then reads the upside-down text and misses the right-side-up text. More generally, cropping helps OCR by removing borders or graphics that might confuse extraction, so that only the relevant content is processed.
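One way to check whether a dominant orientation is being applied is to tally the orientation reported for each block in the response. The sketch below assumes the response follows the Document AI Document structure (pages, then blocks, each with a layout.orientation value such as PAGE_UP or PAGE_DOWN). The helper name orientation_counts is made up for illustration, and the function is duck-typed so it can be tried without the client library.

```python
from collections import Counter


def orientation_counts(document):
    """Count the layout orientations reported for every block.

    `document` is assumed to look like a Document AI Document:
    document.pages[i].blocks[j].layout.orientation, where orientation
    is an enum member with a `.name` (for example PAGE_UP or PAGE_DOWN).
    """
    counts = Counter()
    for page in document.pages:
        for block in page.blocks:
            orientation = block.layout.orientation
            # Fall back to str() if the value has no `.name` attribute.
            counts[getattr(orientation, "name", str(orientation))] += 1
    return counts
```

If every block comes back as PAGE_UP on the mixed page and PAGE_DOWN on the cropped one, that would be consistent with an orientation decision being made for the page as a whole, though it would not prove it.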

Below are potential solutions you can consider to address the issue:

Preprocessing the Image for Orientation Normalization: Consider preprocessing the image to standardize text orientation before passing it to the DocumentAI API. Rotating documents to the correct angle reduces errors significantly during OCR processing. This simple yet effective step ensures that text is presented in its most readable form, helping OCR software to function at its best.
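When a page mixes orientations, no single rotation makes all of the text upright, so one option is to run OCR twice: once on the original and once on a copy rotated by 180 degrees, then map the second pass's boxes back to the original frame. Here is a minimal sketch, not a tested recipe. It assumes Pillow is available for the rotation, and it takes a caller-supplied ocr function (hypothetical; it would wrap your own Document AI call) that returns (text, box) pairs, where box is a list of normalized (x, y) vertices.

```python
import io


def rotate_180_png(image_bytes):
    """Return the image rotated by 180 degrees, encoded as PNG bytes."""
    from PIL import Image  # Pillow; assumed to be installed

    with Image.open(io.BytesIO(image_bytes)) as img:
        out = io.BytesIO()
        # Convert to RGB so that modes PNG cannot store (e.g. CMYK) still save.
        img.convert("RGB").rotate(180).save(out, format="PNG")
        return out.getvalue()


def unrotate_box(box):
    """Map normalized (x, y) vertices from the rotated copy back to the
    original image: a half turn sends (x, y) to (1 - x, 1 - y)."""
    return [(1.0 - x, 1.0 - y) for x, y in box]


def two_pass_ocr(image_bytes, ocr, rotate=rotate_180_png):
    """Run `ocr` on the original and on the rotated copy, then merge.

    `ocr` is a hypothetical callable: bytes -> list of (text, box).
    Each result is returned as (text, box_in_original_frame, angle).
    """
    results = [(text, box, 0) for text, box in ocr(image_bytes)]
    for text, box in ocr(rotate(image_bytes)):
        results.append((text, unrotate_box(box), 180))
    return results
```

Text that is readable in both passes (sideways text, for example) may show up twice, so you would probably want to drop entries whose boxes overlap heavily before using the merged list.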

Splitting the Image: If the image has both right-side-up and upside-down text, try splitting the image into two regions: one containing primarily right-side-up text and one containing primarily upside-down text. Pass these regions through the DocumentAI API separately and then combine the results. This might help ensure both text orientations are accurately detected.
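A rough sketch of the split-and-merge idea, working in pixel coordinates. It assumes Pillow for cropping and a hypothetical ocr callable that returns (text, (left, top, right, bottom)) pairs in the crop's own pixel coordinates. The region boxes are whatever you choose for your layout; nothing here detects them for you.

```python
def crop_region(img, region):
    """Crop a Pillow image to region = (left, top, right, bottom)."""
    return img.crop(region)  # Image.crop takes a 4-tuple in pixels


def shift_box(box, region):
    """Translate a box from crop coordinates to full-image coordinates."""
    left, top = region[0], region[1]
    x0, y0, x1, y1 = box
    return (x0 + left, y0 + top, x1 + left, y1 + top)


def ocr_by_region(img, regions, ocr, crop=crop_region):
    """OCR each region separately and merge the results in page order.

    `ocr` is a hypothetical callable: cropped image -> list of (text, box).
    """
    merged = []
    for region in regions:
        for text, box in ocr(crop(img, region)):
            merged.append((text, shift_box(box, region)))
    # Sort top-to-bottom, then left-to-right, by each box's top-left corner.
    merged.sort(key=lambda item: (item[1][1], item[1][0]))
    return merged
```

A region that is known to be upside-down can also be rotated by 180 degrees before it is passed to ocr, in which case its boxes need to be flipped within the crop before shift_box is applied.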

Check for Image Quality and Resolution: Low resolution or poor-quality scans might contribute to OCR difficulties in detecting upside-down text. Ensuring high-quality, high-resolution images will improve the OCR's ability to detect various orientations.
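As a quick sanity check on resolution, you can estimate the effective DPI of a scan from its pixel size and the physical page size. The sketch below is only arithmetic; the 300 DPI target is a common OCR rule of thumb that I am assuming here, not a documented Document AI requirement, and both function names are made up for illustration.

```python
def effective_dpi(width_px, height_px, width_in, height_in):
    """Return the lower of the horizontal and vertical pixels per inch."""
    return min(width_px / width_in, height_px / height_in)


def upscale_factor(dpi, target_dpi=300.0):
    """Factor by which to enlarge an image to reach target_dpi.

    target_dpi=300 is an assumed rule of thumb, not an official limit.
    Returns 1.0 when the image already meets the target.
    """
    return max(1.0, target_dpi / dpi)
```

For example, a US Letter page (8.5 x 11 in) scanned at 1700 x 2200 pixels works out to 200 DPI, which gives an upscale factor of 1.5. Upscaling does not add real detail, so rescanning at a higher resolution is preferable when that is possible.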

I hope the above information is helpful.

Document AI, upside-down text not recognized, when on a page with majority right-side-up text