Hello,
I want to use DocumentAI to extract the metadata of invoices stored in a document management system.
The invoices will then be checked by a human. Sometimes the extraction will get some fields wrong for the invoices of one specific provider. In that case, the user will correct the wrong metadata in the DMS.
My question is: can I use the manual corrections made in the DMS to automatically uptrain the models? Can edits to the metadata in the DMS be retrieved by DocAI?
That provider will probably send hundreds more invoices in the future. I want the correction made to the first invoice in the DMS to be picked up by DocAI so that the following invoices are analysed correctly.
Thanks for your help
Yes, you can use the manual corrections made on the Document Management System (DMS) to improve your DocumentAI model, but this process involves a few steps and considerations:
Integration for Feedback Loop: You’ll need to set up a feedback loop where manual corrections in the DMS can be retrieved and used to retrain your DocumentAI model. This typically involves developing an integration between your DMS and DocumentAI.
Retrieving Metadata Corrections: Ensure that the DMS can export or make accessible the corrected metadata for each invoice. You may need to create a mechanism to extract these corrections in a structured format that DocumentAI can use.
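For example, assuming the DMS can export, for each invoice, the metadata DocumentAI originally extracted and the values after human review as plain dictionaries, a minimal sketch of such a mechanism is a field-by-field diff. The `Correction` record and the field names used here (`invoice_id`, `total_amount`, `supplier_name`) are hypothetical; adapt them to whatever your DMS actually exposes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Correction:
    """One field a reviewer changed in the DMS (hypothetical record shape)."""
    document_id: str
    field_name: str
    extracted: Optional[str]  # None if DocumentAI returned nothing for this field
    corrected: str


def diff_metadata(document_id: str, extracted: dict, reviewed: dict) -> list[Correction]:
    """Return one Correction per field whose reviewed value differs from the extracted one.

    Fields present only in `extracted` (i.e. removed by the reviewer) are ignored here.
    """
    corrections = []
    for name, value in reviewed.items():
        before = extracted.get(name)
        if before != value:
            corrections.append(Correction(document_id, name, before, value))
    return corrections


if __name__ == "__main__":
    extracted = {"invoice_id": "INV-001", "total_amount": "1,200.00", "supplier_name": "ACME"}
    reviewed = {"invoice_id": "INV-001", "total_amount": "1,250.00", "supplier_name": "ACME"}
    for correction in diff_metadata("doc-42", extracted, reviewed):
        print(correction)
```

Keeping the original and corrected values side by side also tells you which fields fail most often for a given provider.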
Retraining the Model: DocumentAI itself doesn’t automatically uptrain its models with corrections made in the DMS. You will need to collect the corrected data, prepare it as a training dataset, and then retrain your DocumentAI model using this dataset. This may involve using DocumentAI’s training tools or APIs to upload the updated data and retrain the model.
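One way to prepare such a dataset is to turn each reviewed invoice into a labeled document in Document AI's `Document` JSON format, where each entity has a `type`, a `mentionText` and a `textAnchor` pointing at the span of the OCR text that holds the value. The sketch below is only an illustration under several assumptions: the OCR text of the invoice is available, each corrected value appears verbatim in that text (found by plain string search, which real invoices may defeat), and the label names match your processor's schema. It fills only a minimal subset of the format, so check the Document AI documentation for the fields your dataset import actually requires.

```python
import json


def build_labeled_document(text: str, labels: dict[str, str]) -> dict:
    """Build a minimal Document-JSON-style dict with one entity per corrected field.

    Assumption: each corrected value appears verbatim in the OCR text.
    Values that cannot be found are skipped rather than guessed.
    """
    entities = []
    for label, value in labels.items():
        start = text.find(value) if value else -1
        if start == -1:
            continue  # not found in the OCR text; this field needs manual labeling
        entities.append({
            "type": label,
            "mentionText": value,
            "textAnchor": {
                # int64 offsets are serialized as strings in proto3 JSON
                "textSegments": [{"startIndex": str(start), "endIndex": str(start + len(value))}]
            },
        })
    return {"text": text, "entities": entities}


if __name__ == "__main__":
    ocr_text = "ACME Ltd\nInvoice INV-001\nTotal: 1,250.00 EUR"
    doc = build_labeled_document(ocr_text, {"invoice_id": "INV-001", "total_amount": "1,250.00"})
    print(json.dumps(doc, indent=2))
```

Skipping values that cannot be located keeps bad anchors out of the training set; those invoices can be labeled by hand instead.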
Automating Retraining: To streamline the process, you can automate the collection of corrected metadata and the retraining process. This may involve scripting or using APIs to periodically update your training dataset with new corrections and retrain the model.
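As a sketch of the scheduling side, assume a periodic job collects newly reviewed invoices and should only launch a (costly) training run once enough new labeled documents have accumulated for a given provider. The threshold and the `start_training` callback are hypothetical placeholders: in practice the callback would import the new documents into your dataset and start training through Document AI's processor version training API.

```python
from collections import defaultdict
from typing import Callable, Iterable


class RetrainingTrigger:
    """Accumulate reviewed documents per provider and call `start_training`
    once a provider reaches `threshold` new documents (hypothetical policy)."""

    def __init__(self, threshold: int, start_training: Callable[[str, list[str]], None]):
        self.threshold = threshold
        self.start_training = start_training
        self.pending: dict[str, list[str]] = defaultdict(list)

    def add_reviewed(self, provider: str, document_ids: Iterable[str]) -> None:
        self.pending[provider].extend(document_ids)
        if len(self.pending[provider]) >= self.threshold:
            # Hand over the batch of new documents and reset the counter for this provider
            batch = self.pending.pop(provider)
            self.start_training(provider, batch)


if __name__ == "__main__":
    def fake_training(provider: str, docs: list[str]) -> None:
        print(f"retrain for {provider} with {len(docs)} new documents")

    trigger = RetrainingTrigger(threshold=3, start_training=fake_training)
    trigger.add_reviewed("acme", ["doc-1", "doc-2"])
    trigger.add_reviewed("acme", ["doc-3"])  # reaches the threshold and fires the callback
```

Batching per provider matches the case in the question, where one supplier's layout causes most of the errors; a single global counter would work too.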
Continuous Improvement: Implement a system where new invoices are periodically reviewed and their corrections are added to your training dataset. Retraining on this growing dataset helps the model keep improving as new data comes in.
In summary, while DocumentAI doesn’t automatically learn from corrections made in the DMS, you can set up a process to retrieve these corrections and use them to retrain your model. This should improve the accuracy of metadata extraction for future invoices from the same provider.
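To check that the loop is actually improving results, one option is to track, per provider and per model version, the share of extracted fields that reviewers had to correct, and compare it before and after each retraining. A minimal sketch, assuming each review yields the number of fields checked and the number corrected (for example by comparing extracted and reviewed values); the version labels are hypothetical strings you assign yourself.

```python
from collections import defaultdict


class CorrectionRateTracker:
    """Track, per (provider, model version), the fraction of extracted fields reviewers corrected."""

    def __init__(self) -> None:
        self.checked: dict[tuple[str, str], int] = defaultdict(int)
        self.corrected: dict[tuple[str, str], int] = defaultdict(int)

    def record(self, provider: str, version: str, fields_checked: int, fields_corrected: int) -> None:
        if fields_corrected > fields_checked:
            raise ValueError("cannot correct more fields than were checked")
        key = (provider, version)
        self.checked[key] += fields_checked
        self.corrected[key] += fields_corrected

    def rate(self, provider: str, version: str) -> float:
        key = (provider, version)
        checked = self.checked.get(key, 0)
        # No reviews yet for this provider/version: report 0.0 instead of dividing by zero
        return self.corrected.get(key, 0) / checked if checked else 0.0


if __name__ == "__main__":
    tracker = CorrectionRateTracker()
    tracker.record("acme", "v1", fields_checked=10, fields_corrected=4)
    tracker.record("acme", "v2", fields_checked=10, fields_corrected=1)
    print(tracker.rate("acme", "v1"), tracker.rate("acme", "v2"))
```

A falling rate for the same provider after a new version is deployed is a simple signal that the corrections are being learned.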
Hi @Mo2792,
Welcome to Google Cloud Community!
It's great that you're considering a workflow where manual corrections in your DMS can directly improve DocAI's performance. While Document AI offers a lot of capabilities, at the moment it has no built-in feature to directly "retrieve" edits made in a separate DMS.
Here’s why direct integration might be challenging:
Here are some alternative approaches and potential solutions to consider:
1. Manual Retraining:
2. Human-in-the-Loop Approach:
Keep in mind that, even with these workarounds, it’s important to consider data security and the potential effort required to manage the retraining process.
I hope the above information is helpful.