Automatic training of a model based on user corrections in the DMS

Hello,

I want to use DocumentAI to extract the metadata of invoices stored in a document management system. 

Invoices will then be checked by a human. Sometimes the extraction will get some information wrong for the invoices of one specific provider. In that case, the user will correct the wrong metadata in the DMS.

My question is: can I use the manual corrections made on the DMS to automatically uptrain the models? Can the edit of metadata in the DMS be retrieved by DocAI?

That provider will probably send hundreds more invoices in the future: I want the correction made on the first invoice in the DMS to be retrieved by DocAI so that the next invoices are analysed correctly.

Thanks for your help.

wkt_1
New Member

Yes, you can use the manual corrections made on the Document Management System (DMS) to improve your DocumentAI model, but this process involves a few steps and considerations:

  1. Integration for Feedback Loop: You’ll need to set up a feedback loop where manual corrections in the DMS can be retrieved and used to retrain your DocumentAI model. This typically involves developing an integration between your DMS and DocumentAI.

  2. Retrieving Metadata Corrections: Ensure that the DMS can export or make accessible the corrected metadata for each invoice. You may need to create a mechanism to extract these corrections in a structured format that DocumentAI can use.

  3. Retraining the Model: DocumentAI itself doesn’t automatically uptrain its models with corrections made in the DMS. You will need to collect the corrected data, prepare it as a training dataset, and then retrain your DocumentAI model using this dataset. This may involve using DocumentAI’s training tools or APIs to upload the updated data and retrain the model.

  4. Automating Retraining: To streamline the process, you can automate the collection of corrected metadata and the retraining process. This may involve scripting or using APIs to periodically update your training dataset with new corrections and retrain the model.

  5. Continuous Improvement: Implement a system where new invoices are periodically reviewed and corrections are added to your training dataset. This ensures that your model continuously improves as it learns from new data.
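
As a rough illustration of steps 2 to 4, the sketch below compares what was extracted with what the user saved in the DMS, keeps only the fields that were actually changed, and signals when enough corrections have accumulated to justify a retraining run. All names here (`Correction`, `diff_metadata`, `CorrectionQueue`, `retrain_threshold`) are hypothetical and not part of any DocumentAI API. The retraining call itself is left out because it depends on your processor and dataset setup.

```python
from dataclasses import dataclass, field


@dataclass
class Correction:
    """One field that a user changed in the DMS (hypothetical structure)."""
    invoice_id: str
    provider: str
    field_name: str
    extracted: str
    corrected: str


def diff_metadata(invoice_id, provider, extracted, corrected):
    """Return one Correction per field whose DMS value differs from the extraction."""
    return [
        Correction(invoice_id, provider, name, extracted.get(name, ""), value)
        for name, value in corrected.items()
        if extracted.get(name, "") != value
    ]


@dataclass
class CorrectionQueue:
    """Collects corrections until there are enough to justify retraining."""
    retrain_threshold: int = 10  # assumed value; tune it to your volume
    pending: list = field(default_factory=list)

    def add(self, corrections):
        """Store corrections; return True once the threshold is reached."""
        self.pending.extend(corrections)
        return len(self.pending) >= self.retrain_threshold

    def drain(self):
        """Hand over the pending batch and reset the queue."""
        batch, self.pending = self.pending, []
        return batch


# Example: only "total" was changed by the user, so only it is recorded.
extracted = {"invoice_number": "INV-001", "total": "100.00"}
corrected = {"invoice_number": "INV-001", "total": "120.00"}
changes = diff_metadata("doc-1", "ACME", extracted, corrected)

queue = CorrectionQueue(retrain_threshold=2)
if queue.add(changes):
    batch = queue.drain()  # this is where a retraining job would be started
```

Batching corrections behind a threshold is one possible design choice: it avoids launching a training run for every single edit, at the cost of a delay before the model improves.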

In summary, while DocumentAI doesn’t automatically learn from corrections made in the DMS, you can set up a process to retrieve these corrections and use them to retrain your model. This will help in improving the accuracy of metadata extraction for future invoices from the same provider.

Hi @Mo2792,

Welcome to Google Cloud Community!

It's fantastic that you're considering a workflow where manual corrections in your DMS can directly improve DocAI's performance. While Document AI offers a lot of capabilities, it doesn't currently have a built-in feature to directly "retrieve" edits made in a separate DMS.

Here’s why direct integration might be challenging:

  • Data Silos: DocumentAI operates within Google Cloud, while your DMS is likely a separate system without a native connection.
  • Security and Privacy: Allowing DocumentAI direct access to your DMS could raise significant security and privacy issues.
  • Data Format Mismatch: The metadata structure in your DMS may not align with the format DocumentAI requires for retraining.
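
To make the format mismatch more concrete, here is a minimal sketch of mapping one corrected DMS field onto an entity shaped like those in DocumentAI's Document JSON. The JSON keys (`type`, `mentionText`, `textAnchor`, `textSegments`, `startIndex`, `endIndex`) reflect my understanding of that format and should be checked against the current reference. The DMS field names, the `FIELD_TO_ENTITY_TYPE` mapping, and the `to_entity` helper are assumptions made up for illustration. A real labeled training document likely needs more than this, for example page-level layout information.

```python
# Hypothetical mapping from DMS field names to processor schema labels.
FIELD_TO_ENTITY_TYPE = {
    "supplier": "supplier_name",
    "number": "invoice_id",
    "amount": "total_amount",
}


def to_entity(ocr_text, dms_field, corrected_value):
    """Build an entity dict for a corrected value, or None if it cannot be mapped.

    The value is located in the OCR text with a plain substring search,
    which is a simplification: it picks the first match and fails when the
    DMS value is formatted differently from the printed text.
    """
    entity_type = FIELD_TO_ENTITY_TYPE.get(dms_field)
    if entity_type is None:
        return None  # field has no counterpart in the schema
    start = ocr_text.find(corrected_value)
    if start < 0:
        return None  # value not found in the text; needs manual labeling
    end = start + len(corrected_value)
    return {
        "type": entity_type,
        "mentionText": corrected_value,
        "textAnchor": {
            # int64 values are written as strings in proto3 JSON.
            "textSegments": [{"startIndex": str(start), "endIndex": str(end)}]
        },
    }


ocr_text = "Invoice INV-42 from ACME Corp, total 120.00 EUR"
entity = to_entity(ocr_text, "amount", "120.00")
```

Values that cannot be located in the text return `None` here, so they can be routed to a person for labeling instead of being silently guessed.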

Here are some alternative approaches and potential solutions to consider:

1. Manual Retraining:

  • Export and Import: Export corrected metadata from your DMS (e.g., as CSV) and upload it to Google Cloud Storage. Once it has been converted into labeled training documents, you can use this data to manually retrain your DocumentAI models.
  • API Integration: If your DMS provides an API, develop a custom app to extract, format, and feed metadata into DocumentAI for retraining. This offers more control but is complex.

2. Human-in-the-Loop Approach:

  • Annotation Tool: Use an annotation tool integrated with Document AI (such as the labeling interface in the Document AI console) for direct corrections on invoices. These corrections can then be used to retrain the model with finer control.
  • Feedback Mechanism: Implement a system in your DMS for users to report errors in extracted metadata. Aggregate these reports and use them to decide when to manually retrain DocumentAI. This is a simpler way to gather feedback.
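
For the export and feedback ideas above, a small sketch of how reported errors from a CSV export could be aggregated to see which provider and field most need attention. The column names (`invoice_id`, `provider`, `field`, `corrected_value`), the sample rows, and the `count_reported_errors` helper are assumptions; a real DMS export will have its own layout, and uploading the file to Cloud Storage is not shown.

```python
import csv
import io
from collections import Counter


def count_reported_errors(csv_file):
    """Count error reports per (provider, field) from a DMS CSV export."""
    counts = Counter()
    for row in csv.DictReader(csv_file):
        counts[(row["provider"].strip(), row["field"].strip())] += 1
    return counts


# Hypothetical export; a real DMS will use its own column names.
SAMPLE_EXPORT = """invoice_id,provider,field,corrected_value
1001,ACME,total_amount,120.00
1002,ACME,total_amount,89.50
1003,Globex,invoice_date,2024-03-01
"""

counts = count_reported_errors(io.StringIO(SAMPLE_EXPORT))

# List the most frequently corrected provider/field pairs first.
for (provider, field_name), n in counts.most_common():
    print(f"{provider} / {field_name}: {n} report(s)")
```

A ranking like this can help decide which provider's invoices to label and add to the training data first.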

Keep in mind that, even with these workarounds, it’s important to consider data security and the potential effort required to manage the retraining process.

I hope the above information is helpful.