VertexAI Document Reader for PDF

tjohnson818 · 08-02-2024 06:19 AM

My goal is to be able to process large amounts of technical data sheets and extract the material properties within them using this functionality: https://cloud.google.com/vertex-ai/generative-ai/docs/samples/generativeaionvertexai-gemini-pdf

So far, its been 90% accurate. Most of the mistakes are attributing values in a table to the adjacent column or row.

Is this par for the course at the moment when it comes to pdf processing? A 10% difference in can be massive for material properties.

McMaco

Hello tjohnson818,

Welcome to Google Cloud Community!

A 10% difference in material properties can indeed be massive, depending on the application. Without specific details about your PDF processing method, the nature of the material properties, and the expected accuracy, it's difficult to provide a definitive answer. However, I can provide the best practices and limitations where we can address the challenges in PDF processing and material properties.

PDF best practices:

PDFs are treated as images, so a single page of a PDF is treated as one image.
If your prompt contains a single PDF, place the PDF before the text prompt.
Use PDFs created with text rendered as text instead of using text in scanned images. This format ensures text is machine-readable so that it's easier for the model to edit, search, and manipulate compared to scanned image PDFs. This practice provides optimal results when working with text-heavy documents like contracts.

Limitations:

Spatial reasoning: The models aren't precise at locating text or objects in PDFs. They might only return the approximated counts of objects.
Accuracy: The models might hallucinate when interpreting handwritten text in PDF documents.

You may also check document understanding for additional resources and detailed documentation for the best results.

I hope the above information is helpful.