Function of Labels

With Document AI, is the algorithm learning the location of the labels or the structure? I have a few documents whose content and fields are the same, but the order of the fields sometimes changes. Would a Custom Document Extractor be applicable to this situation? Is the model being taught to look for a label in a specific location or position, which would make it bad at handling cases where the order or positioning of a field changes? Also, is there any documentation on how the model actually works?

Good day @sr404,

Welcome to Google Cloud Community!

Custom Document Extractor should still be suitable for this scenario, provided that your dataset includes the different document variations and that you have labeled enough documents to train or uptrain a processor version. It should be able to identify the different elements in your documents, but it is important that the documents are labeled correctly; otherwise it may not extract the correct information.

The training and test sets must each contain at least 10 documents, and each set must contain at least 10 instances of each label. For better performance, 50 documents in each set with 50 instances of each label are recommended. If you want it to be more accurate, provide more training data. You can check this link for more information: https://cloud.google.com/document-ai/docs/workbench/build-custom-processor#import_pre-labeled_data_t...
You can check this link if you want to know more about labeling documents: https://cloud.google.com/document-ai/docs/workbench/label-documents
When labeling your documents, please follow the labeling best practices for best results. You can check this link for more information: https://cloud.google.com/document-ai/docs/workbench/label-documents#labeling
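
If it helps, the dataset minimums mentioned above can be checked before you start training. This is only a rough sketch in plain Python: it does not call the Document AI API, and the function name `check_split` and the input shape (one list of label names per labeled document) are assumptions made for illustration. The thresholds of 10 (minimum) and 50 (recommended) are the ones quoted in this reply.

```python
from collections import Counter

MINIMUM = 10       # minimum documents / label instances per set (from the reply above)
RECOMMENDED = 50   # recommended documents / label instances per set

def check_split(name, docs, labels, threshold=MINIMUM):
    """Return a list of problems for one dataset split.

    name:   "training" or "test" (used only in messages)
    docs:   hypothetical input shape, one list of label names per document,
            e.g. [["invoice_id", "total"], ["invoice_id"]]
    labels: every label defined in the processor schema
    """
    problems = []
    if len(docs) < threshold:
        problems.append(f"{name}: {len(docs)} documents, need {threshold}")
    # Count how many times each label was annotated across the whole split.
    counts = Counter(label for doc in docs for label in doc)
    for label in labels:
        if counts[label] < threshold:
            problems.append(
                f"{name}: label '{label}' has {counts[label]} instances, need {threshold}"
            )
    return problems

# Example with made-up data: the test split is missing 'total' on most documents.
schema = ["invoice_id", "total"]
training = [["invoice_id", "total"]] * 10
test = [["invoice_id"]] * 9 + [["invoice_id", "total"]]

for problem in check_split("training", training, schema) + check_split("test", test, schema):
    print(problem)
```

Passing `threshold=RECOMMENDED` checks the same splits against the recommended 50 instead of the minimum 10.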

Hope this helps!