Hello,
I’m trying to extract text and checkboxes from a handwritten survey in PDF format. The survey consists of 4 series, each with 100 sets.
For the first series, the accuracy is around 80%, with only a few checkboxes incorrectly detected or missed.
However, for the remaining series, most checkboxes aren’t detected at all. When I label the checkboxes, they’re supposed to be highlighted in blue, but they aren’t.
Initially, I thought the document might not be clear enough, but upon comparing the good and bad checkboxes, they seem equally clear.
What should I do?
Thanks.
Hi @KT-K,
Welcome to Google Cloud Community!
You can fine-tune your Document AI custom template-based model to improve checkbox detection for your handwritten surveys. Here are some strategies that could help:
1. Leverage the Newer Foundation Model:
2. Refine Training Data:
3. Address Labeling Issues:
4. Consider Alternative Approaches:
By trying these steps, you should be able to improve the checkbox detection accuracy in your Document AI model for your handwritten surveys.
I hope the above information is helpful.
Hello @ruthseki ,
Thank you for your reply.
Since the foundational model and form parser are unable to detect my checkboxes, I am using the Template-Based model.
Moreover, I believe the primary reason is that most checkboxes do not display a grey label after I click "select Text" during labeling.
If I import more samples without grey labels, would that improve the results?
Additionally, what factors influence checkbox detection?
Thank you.
Hi @KT-K,
The lack of a gray label suggests that your template's checkbox definitions are likely not accurate. The model can't properly align your annotations with the actual checkbox elements in the document.
Moreover, adding more samples without the gray label won't help the model understand what a checkbox is. It will simply learn to treat those regions as empty or undefined. The key is to ensure your existing samples have correctly labeled checkboxes with the gray label. This teaches the model what a checkbox looks like.
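If it helps, here is a rough way to audit labeled samples outside the console. It is only a sketch under a few assumptions: that your labeled documents are available as JSON in the Document AI `Document` shape (`entities`, each with a `type` and a `pageAnchor`), and that your checkbox labels have names like `q1_checkbox` (hypothetical). It does not reproduce the gray highlight in the labeling UI; it only flags checkbox entities stored without any bounding region, which may be a useful proxy.

```python
from typing import Any


def has_region(entity: dict[str, Any]) -> bool:
    """Return True if the entity has at least one bounding polygon with vertices."""
    page_refs = entity.get("pageAnchor", {}).get("pageRefs", [])
    for ref in page_refs:
        poly = ref.get("boundingPoly", {})
        if poly.get("normalizedVertices") or poly.get("vertices"):
            return True
    return False


def audit_labels(
    document: dict[str, Any], checkbox_labels: set[str]
) -> dict[str, list[int]]:
    """Group checkbox entity indexes by whether they carry a bounding region."""
    report: dict[str, list[int]] = {"with_region": [], "without_region": []}
    for index, entity in enumerate(document.get("entities", [])):
        if entity.get("type") not in checkbox_labels:
            continue  # skip non-checkbox labels such as free-text fields
        key = "with_region" if has_region(entity) else "without_region"
        report[key].append(index)
    return report


# Hypothetical labeled sample: the label names and values are made up.
sample = {
    "entities": [
        {
            "type": "q1_checkbox",
            "pageAnchor": {
                "pageRefs": [
                    {
                        "boundingPoly": {
                            "normalizedVertices": [
                                {"x": 0.10, "y": 0.20},
                                {"x": 0.12, "y": 0.20},
                                {"x": 0.12, "y": 0.22},
                                {"x": 0.10, "y": 0.22},
                            ]
                        }
                    }
                ]
            },
        },
        {"type": "q2_checkbox", "pageAnchor": {"pageRefs": [{}]}},
        {"type": "respondent_name", "mentionText": "example"},
    ]
}

report = audit_labels(sample, {"q1_checkbox", "q2_checkbox"})
print(report)
```

Samples that show up under `without_region` would be the first ones to relabel before adding any new documents.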
Here are the factors influencing checkbox detection:
1. Template Accuracy:
2. Training Data Quality:
3. Document Structure:
4. Model Training:
By focusing on template accuracy, data quality, and proper labeling, you'll significantly improve your checkbox detection within your Document AI model.
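To see whether a change actually helps, it may be worth measuring detection per series rather than overall, since the first series already behaves differently from the other three. The sketch below assumes the processed output is saved as `Document` JSON and that checkboxes appear under `pages[].visualElements` with type `filled_checkbox` or `unfilled_checkbox`; that field may not be populated by every processor. The number of checkboxes expected per form (`expected_per_doc`) and the series names are hypothetical.

```python
from typing import Any

CHECKBOX_TYPES = {"filled_checkbox", "unfilled_checkbox"}


def count_checkboxes(document: dict[str, Any]) -> int:
    """Count checkbox visual elements across all pages of one document."""
    return sum(
        1
        for page in document.get("pages", [])
        for element in page.get("visualElements", [])
        if element.get("type") in CHECKBOX_TYPES
    )


def detection_rate_by_series(
    docs_by_series: dict[str, list[dict[str, Any]]], expected_per_doc: int
) -> dict[str, float]:
    """Return the fraction of expected checkboxes that were detected, per series."""
    rates: dict[str, float] = {}
    for series, documents in docs_by_series.items():
        expected = expected_per_doc * len(documents)
        # Cap each document at the expected count so spurious extra
        # detections cannot push the rate above 1.0.
        found = sum(min(count_checkboxes(d), expected_per_doc) for d in documents)
        rates[series] = found / expected if expected else 0.0
    return rates


# Hypothetical data: two checkboxes expected on every form.
docs_by_series = {
    "series_1": [
        {
            "pages": [
                {
                    "visualElements": [
                        {"type": "filled_checkbox"},
                        {"type": "unfilled_checkbox"},
                    ]
                }
            ]
        }
    ],
    "series_2": [
        {"pages": [{"visualElements": [{"type": "filled_checkbox"}]}]},
        {"pages": [{}]},
    ],
}

rates = detection_rate_by_series(docs_by_series, expected_per_doc=2)
print(rates)
```

A rate that stays low for one series while the others improve points at something specific to that series' scans or layout rather than at the amount of training data.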
I hope this clarifies your concern.