Document AI: import a dataset into a different processor

Hello.

As far as I can see, the Document AI processor interface has an "Export dataset" feature, but I cannot find a way to import this exported dataset into a different processor. I found some information in this doc: https://cloud.google.com/document-ai/docs/label-documents?authuser=1#import-labels, but the "Sending a processing request to an existing processor" link is broken and shows a 404.

Context: I need to copy the schema and the already labeled documents from an invoice processor into a custom processor.

Thanks  

Accepted solution:

Hi @oleks_vasyliev,

Welcome to Google Cloud Community!

The link to "Sending a processing request to an existing processor" leads to a 404 page on my end as well, but I found the correct link, which you can access here. Although Document AI processors offer an "Export dataset" feature, there isn't a straightforward way to import the entire dataset, including the schema and labeled documents, into a different processor.

However, you can consider a couple of workarounds:

1. Auto-labeling with a Foundation Model Processor:

  • Import your documents into the new processor and enable "Import with auto-labeling".
  • This uses a pretrained foundation model processor to automatically assign labels to your documents based on the existing schema.
  • Manually review and correct these auto-labeled documents before using them to train the new processor.
  • For more information, refer to the documentation: Custom extractor mechanisms.

2. Recreate the Schema and Manually Import Labeled Documents:

  • Define a new schema in the target processor that closely matches the schema of the original processor. Refer to Document AI Schemas for instructions on defining schemas, including data types and field types. This will help you recreate the schema from your original processor in the new one.
  • Next, manually import the labeled documents from the original dataset into the new processor. To do this, you may follow the Label documents guide that explains how to use the labeling tool to manually label imported documents in your new processor.
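To make recreating the schema less error-prone, it can help to inventory which labels actually appear in the exported dataset. The sketch below is a minimal, stdlib-only example under these assumptions: the exported dataset has been copied from Cloud Storage to a local folder as Document JSON files, each with an `entities` list whose items carry a `type` label and optional nested `properties`. The helper names (`collect_entity_types`, `rename_entity_types`, etc.) and the rename mapping are hypothetical; this does not call any Document AI API or import anything into a processor.

```python
import json
from collections import Counter
from pathlib import Path
from typing import Dict, Iterable, Iterator


def iter_entities(entities: Iterable[dict]) -> Iterator[dict]:
    """Yield every entity, including nested child entities under "properties"."""
    for entity in entities:
        yield entity
        yield from iter_entities(entity.get("properties", []))


def collect_entity_types(documents: Iterable[dict]) -> Counter:
    """Count how often each entity type label appears across labeled documents."""
    counts: Counter = Counter()
    for doc in documents:
        for entity in iter_entities(doc.get("entities", [])):
            entity_type = entity.get("type")
            if entity_type:
                counts[entity_type] += 1
    return counts


def load_exported_documents(folder: str) -> Iterator[dict]:
    """Load every *.json file under a local copy of the exported dataset (assumed layout)."""
    for path in sorted(Path(folder).rglob("*.json")):
        with path.open(encoding="utf-8") as f:
            yield json.load(f)


def rename_entity_types(doc: dict, mapping: Dict[str, str]) -> dict:
    """Return a copy of doc with entity types renamed per a hypothetical old->new mapping."""
    renamed = json.loads(json.dumps(doc))  # deep copy via JSON round-trip
    for entity in iter_entities(renamed.get("entities", [])):
        if entity.get("type") in mapping:
            entity["type"] = mapping[entity["type"]]
    return renamed


# Usage with a tiny in-memory document instead of real exported files:
sample = {
    "entities": [
        {"type": "invoice_id", "mentionText": "INV-001"},
        {"type": "line_item", "properties": [{"type": "line_item/amount"}]},
    ]
}
print(collect_entity_types([sample]))
print(rename_entity_types(sample, {"invoice_id": "invoice_number"})["entities"][0]["type"])
```

The label counts give a checklist of field names to define in the new schema; renaming is only relevant if you choose different names in the target processor, and any relabeled documents should still be reviewed in the labeling tool.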

I hope the above information is helpful.
