Text extraction is not correct

Suganya_velu

Below is the image passed to the GCP Vision API:

However, the result is not as expected.

The second occurrence is not extracted as expected. This happens when we submit the file in PDF format.

nikacalupas

Hi Suganya_velu,

Welcome to the Google Cloud Community!

Here are some suggestions that might help your use case:

Generate PDFs at high resolution (e.g., 300-600 DPI), flatten them so that text is rendered as an image layer, use standard, clear fonts, and minimize image compression to preserve quality.

Convert PDF pages to high-resolution images (e.g., PNG) before sending them to the Vision API. This ensures the API receives a clear, rasterized input.
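
A minimal sketch of the PDF-to-PNG step, assuming the third-party PyMuPDF package (imported as `fitz`) is installed. The helper names `render_scale` and `pdf_to_pngs` and the output file naming are hypothetical. PDF page coordinates are measured in points (1/72 inch), so rendering at a target DPI means scaling each page by dpi / 72.

```python
"""Rasterize PDF pages to PNG at a chosen DPI before sending them to the Vision API.

Assumes the third-party PyMuPDF package (pip install pymupdf). The helper
names are hypothetical and not part of any Google library.
"""

PDF_POINTS_PER_INCH = 72  # PDF user-space units are 1/72 inch


def render_scale(dpi: int) -> float:
    """Zoom factor that converts PDF points to pixels at the requested DPI."""
    if dpi <= 0:
        raise ValueError("dpi must be positive")
    return dpi / PDF_POINTS_PER_INCH


def pdf_to_pngs(pdf_path: str, out_prefix: str, dpi: int = 300) -> list[str]:
    """Render every page of pdf_path to a PNG file and return the written paths."""
    import fitz  # PyMuPDF; imported here so render_scale() works without it

    zoom = render_scale(dpi)
    matrix = fitz.Matrix(zoom, zoom)
    paths = []
    with fitz.open(pdf_path) as doc:
        for number, page in enumerate(doc, start=1):
            # alpha=False gives an opaque white background, which suits OCR.
            pixmap = page.get_pixmap(matrix=matrix, alpha=False)
            path = f"{out_prefix}-{number}.png"
            pixmap.save(path)
            paths.append(path)
    return paths
```

For example, `pdf_to_pngs("scan.pdf", "scan", dpi=300)` would write `scan-1.png`, `scan-2.png`, and so on, each of which can then be sent to the Vision API as an ordinary image.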

Use DOCUMENT_TEXT_DETECTION instead of TEXT_DETECTION. It is optimized for dense text and documents, and it returns the document structure (pages, blocks, paragraphs, words). You can refer to this documentation.
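
As a sketch, this is roughly how the request body for the REST endpoint `https://vision.googleapis.com/v1/images:annotate` looks when it asks for DOCUMENT_TEXT_DETECTION on one PNG page. The helper `build_annotate_body` is hypothetical, and sending the request (including authentication) is left out.

```python
"""Build a Vision API images:annotate request body using DOCUMENT_TEXT_DETECTION.

Sketch only: the JSON is prepared but not sent, and authentication is omitted.
build_annotate_body is a hypothetical helper name.
"""
import base64
import json


def build_annotate_body(png_bytes: bytes) -> str:
    """Return the JSON body for one image with the DOCUMENT_TEXT_DETECTION feature."""
    request = {
        # Inline image content must be base64-encoded in the REST API.
        "image": {"content": base64.b64encode(png_bytes).decode("ascii")},
        # DOCUMENT_TEXT_DETECTION returns fullTextAnnotation, which includes
        # the pages, blocks, paragraphs, words and symbols it detected.
        "features": [{"type": "DOCUMENT_TEXT_DETECTION"}],
    }
    return json.dumps({"requests": [request]})
```

If you use the google-cloud-vision Python client instead, `ImageAnnotatorClient.document_text_detection` requests the same feature, and the extracted text is available in the response's `full_text_annotation.text`.
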
Even if the Vision API doesn't extract perfectly, you can use regular expressions on the raw extracted text to find patterns that match credit card numbers.
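
A minimal sketch of that post-processing step, assuming card numbers are 13 to 19 digits that OCR may split with single spaces or hyphens. To cut down on false positives, each candidate is also checked with the Luhn checksum. The pattern and the helper names `luhn_valid` and `find_card_numbers` are illustrative and not part of the Vision API.

```python
"""Find credit-card-like numbers in raw OCR text and keep the Luhn-valid ones.

Assumes card numbers are 13-19 digits, optionally separated by single spaces
or hyphens. The names below are illustrative, not part of any Google API.
"""
import re

# 13-19 digits, each after the first optionally preceded by one space or hyphen,
# and not directly attached to other digits on either side.
CARD_PATTERN = re.compile(r"(?<!\d)\d(?:[ -]?\d){12,18}(?!\d)")


def luhn_valid(digits: str) -> bool:
    """Luhn checksum: double every second digit from the right and sum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_card_numbers(text: str) -> list[str]:
    """Return digit-only candidates found in text that pass the Luhn check."""
    results = []
    for match in CARD_PATTERN.finditer(text):
        digits = re.sub(r"[ -]", "", match.group())
        if luhn_valid(digits):
            results.append(digits)
    return results
```

One limitation of this simple pattern is that a digit group printed right next to a card number (for example, an expiry year separated only by a space) can be merged into the same candidate and then rejected by the checksum.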

Additionally, you may refer to the following documentation, which could be helpful for your use case:

Was this helpful? If so, please accept this answer as “Solution”. If you need additional assistance, reply here within 2 business days and I’ll be happy to help.