Issue with Data Extraction Using Document AI: Confusing "0" (Zero) with "O" (Letter O)

Hello, I’m having an issue with data extraction from PDF files using Google Document AI. I’m working with a custom extractor, and in fields that mix letters and numbers, it often confuses the digit "0" (zero) with the uppercase letter "O."

This happens mostly in specific fields where such combinations are frequent, and it leads to errors in the extracted data. For example, a value like "A0B1" may be incorrectly extracted as "AOB1", and the reverse also happens: a letter "O" in the source may be extracted as the digit "0".

I’ve tried to adjust the custom model, but this issue persists. Has anyone faced a similar problem? Are there best practices, configurations, or post-processing techniques that could help resolve this?
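
To illustrate the kind of post-processing I have in mind, here is a minimal sketch in Python. It assumes the affected field has a fixed, known layout of letters and digits (for example letter, digit, letter, digit, as in "A0B1"). The mask string and the function name `fix_zero_o` are hypothetical and are not part of Document AI; the function would be applied to the extracted text after the API returns it.

```python
# Hypothetical per-position mask for one field: "L" = letter, "D" = digit.
# "LDLD" matches the example value "A0B1"; each real field would need its own mask.
FIELD_MASK = "LDLD"


def fix_zero_o(value: str, mask: str = FIELD_MASK) -> str:
    """Swap 0/O wherever the character contradicts the expected mask position."""
    if len(value) != len(mask):
        # Unexpected length: return unchanged rather than guess.
        return value
    fixed = []
    for ch, kind in zip(value, mask):
        if kind == "D" and ch in "Oo":
            fixed.append("0")  # a letter O where a digit is expected
        elif kind == "L" and ch == "0":
            fixed.append("O")  # a zero where a letter is expected
        else:
            fixed.append(ch)
    return "".join(fixed)


print(fix_zero_o("AOB1"))  # the misread example from above
```

This only works when the format of the field is known in advance. For free-form values there is no way to tell from the text alone which character is correct, which is why I am also asking about model-side options.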

Any advice or recommendations would be greatly appreciated!
