Hello.
As far as I can see, the Document AI processor interface has an "Export dataset" feature, but I cannot find a way to import the exported dataset into a different processor. I found some information in this doc - https://cloud.google.com/document-ai/docs/label-documents?authuser=1#import-labels - but the link "Sending a processing request to an existing processor" is broken and returns a 404.
Context: I need to copy the schema and the already labeled documents from an invoice processor into a custom processor.
Thanks
Hi @oleks_vasyliev,
Welcome to Google Cloud Community!
The link to "Sending a processing request to an existing processor" redirects to the 404 page on my end as well. But I found the correct link which you can access here. Although Document AI processors offer an "Export dataset" functionality, there isn't a straightforward way to import the entire dataset, including the schema and labeled documents, into a different processor.
However, you can consider a couple of workarounds:
1. Auto-labeling with a Foundation Model Processor:
2. Recreate the Schema and Manually Import Labeled Documents:
I hope the above information is helpful.
I encountered the same issue and found a way to import labeled datasets into a custom Document AI processor. Here’s the documentation: https://cloud.google.com/document-ai/docs/create-dataset#import
Here is the important section:
When you select Import, Document AI imports all of the supported file types as well as JSON Document files into the dataset. For JSON Document files, Document AI imports the document and converts its entities into label instances.
In summary, if you select the folder in your bucket where the dataset was exported, the import applies the same labels to the documents. The field names must be the same in the original and the new dataset. I had to do two separate imports, one for test and the other for training.
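Since the import depends on matching field names, it may help to check the exported files before importing them. Below is a minimal sketch, not an official tool: it assumes the exported JSON Document files have been copied from the bucket to a local folder, and that each file stores its labels under `entities` (with optional nested `properties`), each carrying a `type`. The function names, the folder path, and the example field names are hypothetical.

```python
import json
from pathlib import Path


def collect_entity_types(document: dict) -> set:
    """Return every entity type found in one Document JSON, including nested properties."""
    found = set()
    stack = list(document.get("entities", []))
    while stack:
        entity = stack.pop()
        if entity.get("type"):
            found.add(entity["type"])
        # Child entities (if any) are assumed to live under "properties".
        stack.extend(entity.get("properties", []))
    return found


def missing_labels(export_dir: str, target_fields: set) -> set:
    """Return labels used in the exported files that the target schema does not define."""
    used = set()
    for path in Path(export_dir).rglob("*.json"):
        with open(path, encoding="utf-8") as fh:
            data = json.load(fh)
        if isinstance(data, dict):  # skip JSON files that are not Document objects
            used |= collect_entity_types(data)
    return used - set(target_fields)
```

For example, `missing_labels("./exported_dataset/train", {"invoice_id", "total_amount"})` would return the labels that still need to be added to the new processor's schema before the import. As noted above, the test and training folders are handled separately, so the check can be run once per folder.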