Re: Checkbox extraction using Google Vertex LLM

PravinShelake15 · 01-22-2025 05:54 AM

Hi Team,

My use case: I want to extract handwritten content, along with some checkbox data, from a PDF using the Google Vertex LLM API.

I tried changing the prompts and the Gemini model version, but the results are still not accurate. How can we improve the accuracy of checkbox extraction? Even when a checkbox is extracted incorrectly, it reports a confidence of 1.0.

Kindly help.

Regards

ruthseki

Hi @PravinShelake15,

Welcome to Google Cloud Community!

Gemini on Vertex AI is a general-purpose generative model rather than a dedicated form-extraction tool. The high confidence score despite incorrect results points to a limitation of the model: the confidence it reports is not a reliable measure of correctness, so it can be confident in a prediction that is wrong. For small visual details such as checkbox marks, the model may be making educated guesses based on patterns it has learned rather than reliably reading the mark itself.

Instead of using Google Vertex AI's Gemini, I suggest trying Document AI.

Document AI is specifically designed for extracting information from documents, including handwritten text and form fields like checkboxes. It leverages advanced computer vision and machine learning models optimized for this task. It also includes powerful OCR capabilities, making the text extraction process more accurate.

Here’s how to use Document AI for your use case:

Choose the Right Processor: Document AI offers several processors. Review the list of available processors and the extraction documentation to pick the one that fits your forms.
Prepare Your PDFs: Ensure your PDFs are properly formatted. High-resolution scans generally yield better results.
Send to the API: Use the Document AI API to send your PDF files for processing. The API will return structured data containing extracted text and checkbox values. You'll need to integrate the API into your application (Python is a common choice). The API typically returns JSON, which is easily parsed.
Process the Response: The API response will contain the extracted information in a structured format. Parse the JSON response to extract the data you need (text and checkbox statuses).
Error Handling: Implement proper error handling to manage situations where the API fails to extract information accurately.
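
To make the "Process the Response" and "Error Handling" steps more concrete, here is a minimal sketch of reading checkbox statuses out of a parsed JSON response. It assumes a form-style processor that reports each checkbox as a form field with a `valueType` of `filled_checkbox` or `unfilled_checkbox`; check the field names against the response your processor actually returns. `SAMPLE_DOCUMENT`, `extract_checkboxes`, `anchor_text`, and the `min_confidence` threshold are hypothetical names and values made up for illustration, and the sample data is hand-written, not real API output.

```python
# Hand-written sample shaped like a Document AI JSON response (assumed layout).
SAMPLE_DOCUMENT = {
    "text": "Smoker\nDiabetic\n",
    "pages": [
        {
            "formFields": [
                {
                    # startIndex is omitted when it is 0; indices arrive as strings.
                    "fieldName": {"textAnchor": {"textSegments": [{"endIndex": "6"}]}},
                    "fieldValue": {"confidence": 0.97},
                    "valueType": "filled_checkbox",
                },
                {
                    "fieldName": {
                        "textAnchor": {
                            "textSegments": [{"startIndex": "7", "endIndex": "15"}]
                        }
                    },
                    "fieldValue": {"confidence": 0.62},
                    "valueType": "unfilled_checkbox",
                },
            ]
        }
    ],
}

CHECKBOX_TYPES = {"filled_checkbox": True, "unfilled_checkbox": False}


def anchor_text(full_text, layout):
    """Resolve a layout's text anchor into the substring(s) of the document text."""
    segments = layout.get("textAnchor", {}).get("textSegments", [])
    parts = []
    for segment in segments:
        start = int(segment.get("startIndex", 0))
        end = int(segment.get("endIndex", 0))
        parts.append(full_text[start:end])
    return "".join(parts).strip()


def extract_checkboxes(document, min_confidence=0.9):
    """Return one row per checkbox field, flagging low-confidence rows for review."""
    full_text = document.get("text", "")
    rows = []
    for page in document.get("pages", []):
        for field in page.get("formFields", []):
            value_type = field.get("valueType", "")
            if value_type not in CHECKBOX_TYPES:
                continue  # not a checkbox field
            confidence = float(field.get("fieldValue", {}).get("confidence", 0.0))
            rows.append(
                {
                    "label": anchor_text(full_text, field.get("fieldName", {})),
                    "checked": CHECKBOX_TYPES[value_type],
                    "confidence": confidence,
                    "needs_review": confidence < min_confidence,
                }
            )
    return rows


if __name__ == "__main__":
    for row in extract_checkboxes(SAMPLE_DOCUMENT):
        print(row)
```

Routing low-confidence rows to a manual review step, rather than trusting every value, is one way to handle extractions the API gets wrong. The 0.9 threshold is arbitrary and should be tuned against your own documents.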

For detailed instructions on setting up your project, creating a processor, and using the API, see the Google Cloud Document AI documentation.
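
For orientation, this is roughly what a process request looks like when you call the REST endpoint directly. It is a sketch under assumptions, not a complete client: `build_process_request` is a hypothetical helper, the project, location, and processor IDs are placeholders, and the code only builds the URL and JSON body. Actually sending it requires an authenticated POST with an OAuth access token, which is omitted here.

```python
import base64


def build_process_request(project_id, location, processor_id, pdf_bytes):
    """Build the URL and JSON body for a Document AI online process request."""
    url = (
        f"https://{location}-documentai.googleapis.com/v1/"
        f"projects/{project_id}/locations/{location}/"
        f"processors/{processor_id}:process"
    )
    body = {
        "rawDocument": {
            # The PDF bytes are sent base64-encoded inside the JSON body.
            "content": base64.b64encode(pdf_bytes).decode("ascii"),
            "mimeType": "application/pdf",
        }
    }
    return url, body


if __name__ == "__main__":
    # Placeholder identifiers and a stand-in for real PDF bytes.
    url, body = build_process_request("my-project", "us", "my-processor-id", b"%PDF-1.4")
    print(url)
```

In practice, the official client library for your language handles authentication and request construction for you, so a hand-built request like this is mainly useful for understanding what is being sent.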

Was this helpful? If so, please accept this answer as “Solution”. If you need additional assistance, reply here within 2 business days and I’ll be happy to help.