
When extracting text from images with OCR, 'local' vs 'Try it page' shows different results

Original image:

[original image]

Local result:

[annotated_image.jpg]

Try it sample page result:

[Try it sample page screenshot]

The local result is bad because it shows a cropped image in addition to the target image. How can I make sure that only the target image is shown, as in the sample case? Do I need to change anything in my source code?


    from google.cloud import vision

    client = vision.ImageAnnotatorClient()

    # Read the image file to annotate
    with open(path, "rb") as image_file:
        content = image_file.read()

    image = vision.Image(content=content)

    # Document text detection request with the legacy layout option enabled
    request = vision.AnnotateImageRequest(
        image=image,
        features=[{"type_": vision.Feature.Type.DOCUMENT_TEXT_DETECTION}],
        image_context={
            "text_detection_params": {"advanced_ocr_options": ["legacy_layout"]}
        },
    )

    response = client.annotate_image(request)

    document = response.full_text_annotation

 

1 REPLY

It appears that you are using the Google Cloud Vision API to perform OCR (Optical Character Recognition) on an image. The issue you're encountering, where the result includes a cropped image along with the target image, might be due to how you're processing or displaying the results.

The Google Cloud Vision API typically doesn't return a cropped image along with the OCR results unless specifically requested. However, if you're seeing a cropped image in the result, it might be due to how you're interpreting or displaying the data returned by the API.

Here's what you can do to ensure only the text detected in the target image is shown:

1. Check the response: Examine the response object returned by the client.annotate_image() call. Make sure you're parsing it to extract only the text annotations and not any other data, such as images.

2. Review your display logic: Ensure that the code displaying the OCR results shows only the text annotations (document in your case) and not any other content, such as images or additional metadata.

3. Use the OCR results only: Instead of displaying the entire response, focus on the text detected in the image. You can access it via response.full_text_annotation.text, as shown in the sketch right after this list.
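As a concrete illustration of the last point, here is a minimal sketch that prints only the detected text, assuming response is the AnnotateImageResponse returned by your client.annotate_image() call:

    # Minimal sketch: print only the detected text from an AnnotateImageResponse.
    # Request-level problems are reported on the response rather than raised.
    if response.error.message:
        raise RuntimeError(response.error.message)

    document = response.full_text_annotation
    print(document.text)  # Plain detected text; no image data is included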

Here's a modified version of your code focusing on displaying only the text detected in the image:


    from google.cloud import vision

    client = vision.ImageAnnotatorClient()

    # Read the image file to annotate (path is the path to your image file)
    with open(path, "rb") as image_file:
        content = image_file.read()

    image = vision.Image(content=content)

    # Plain text detection; the response holds text annotations only, no image data
    response = client.text_detection(image=image)

    texts = response.text_annotations

    if texts:
        # The first annotation contains the full detected text of the image
        print(texts[0].description)
    else:
        print("No text found.")

In this code, response.text_annotations contains the detected text. The first annotation is assumed to hold the full text of the image; the remaining annotations are individual words with their bounding boxes. You might need to adjust this logic for your specific use case, for example by filtering annotations by position, as sketched below.
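If the unwanted text is coming from a cropped inset inside the image, one option is to filter the word-level annotations by their bounding boxes. This is only a rough sketch under the assumption that you know the pixel region of the target area; the target_left/target_top/target_right/target_bottom values below are placeholders you would have to adjust:

    # Rough sketch: keep only words whose bounding box lies inside a chosen region.
    # The region coordinates are placeholders; adjust them to your image.
    target_left, target_top, target_right, target_bottom = 0, 0, 1000, 800

    def inside_target(annotation):
        # Word-level annotations carry a bounding_poly with four vertices
        xs = [v.x for v in annotation.bounding_poly.vertices]
        ys = [v.y for v in annotation.bounding_poly.vertices]
        return (min(xs) >= target_left and max(xs) <= target_right
                and min(ys) >= target_top and max(ys) <= target_bottom)

    # texts[0] is the full text; the rest are individual words with positions
    words = [t.description for t in texts[1:] if inside_target(t)]
    print(" ".join(words))

Filtering by position keeps the API call unchanged and moves the decision about what counts as the target area into your own code.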

Ensure that you're handling the response properly and displaying only the relevant information. If you continue to face issues, provide more details about the response structure and how you're processing it, so I can assist you further.
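If it helps with that, a quick way to inspect what the API actually returned (assuming the response object from the text_detection call above) is to print each annotation with its bounding box:

    # Quick inspection of the response structure returned by text_detection
    if response.error.message:
        print("API error:", response.error.message)

    for t in response.text_annotations:
        vertices = [(v.x, v.y) for v in t.bounding_poly.vertices]
        print(repr(t.description), vertices)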