Hi all,
We're testing the Document AI Invoice Parser processor for parsing .pdf invoices, and one thing I find confusing is that sometimes the "Train" tab shows far more parsed data elements than the "Evaluate & Test" tab for the exact same invoice.
For example, if I open the invoice in the "Train" tab, I can see `vendor_name`, `line_item_description`, etc., while in the "Evaluate & Test" tab those data points aren't parsed.
We will integrate with Document AI via the API, so I'm wondering which results we can expect: the richer ones from the "Train" tab, or the more rudimentary ones from the "Evaluate & Test" tab. I'd also like to understand why the two differ.
Thank you!
Hi @wissil,
Welcome back to Google Cloud Community!
It is possible that your training and test datasets in the "Train" tab don't contain enough detailed, labeled examples, so the model doesn't recognize those fields when you upload a test document in the "Evaluate & Test" tab. Note that the training data is used to train the model that does the processing, while the test data is used to evaluate how well the processor performs.

You can check the accuracy of the trained model in the "Evaluate & Test" tab by looking at the F1 score, precision, and recall. These metrics will be low if the model was trained on too little or inaccurately labeled data, and high if the evaluation is acceptable. After that, you deploy that version of the model and request predictions from it, which means the results you can expect from the API are the ones you see in the "Evaluate & Test" tab.

To get more accurate results, provide at least 50 documents in the training data and 50 documents in the test data, with 50 instances of each label, and make sure the schema is defined correctly. That way, the trained model will be able to extract the fields you are requesting. You can check this link for more information: https://cloud.google.com/document-ai/docs/workbench/uptrain-processor#import-prelabeled-data
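Since you plan to call the API, a quick way to confirm which fields a given version actually returns is to send an invoice to that specific, deployed processor version and tally the entity types in the response. Below is a minimal sketch, assuming the `google-cloud-documentai` Python client library; the project, location, processor and version IDs are placeholders you would replace with your own. It counts top-level entities and their nested properties (the Invoice Parser reports line-item fields as properties of a `line_item` entity).

```python
from collections import Counter


def collect_entity_types(entities):
    """Count top-level entity types plus their nested property types.

    Works on any objects exposing `type_` and `properties`, such as
    the entities in a google.cloud.documentai Document.
    """
    counts = Counter()
    for entity in entities:
        counts[entity.type_] += 1
        # Line-item fields are nested as properties of the parent entity.
        for prop in entity.properties:
            counts[prop.type_] += 1
    return counts


def process_with_version(project_id, location, processor_id, version_id, pdf_path):
    """Send a PDF to one specific processor version and return its entities."""
    # Imported lazily so collect_entity_types works without the client library.
    from google.api_core.client_options import ClientOptions
    from google.cloud import documentai

    client = documentai.DocumentProcessorServiceClient(
        client_options=ClientOptions(api_endpoint=f"{location}-documentai.googleapis.com")
    )
    # Pin the exact version you deployed instead of the processor's default version.
    name = client.processor_version_path(project_id, location, processor_id, version_id)
    with open(pdf_path, "rb") as f:
        raw_document = documentai.RawDocument(content=f.read(), mime_type="application/pdf")
    result = client.process_document(
        request=documentai.ProcessRequest(name=name, raw_document=raw_document)
    )
    return result.document.entities
```

For example, `collect_entity_types(process_with_version("my-project", "us", "my-processor-id", "my-version-id", "invoice.pdf"))` (all IDs hypothetical) shows whether fields such as the line-item description come back from that version, which should match what the "Evaluate & Test" tab shows for the same version.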
Hope this helps!