Hello, we started exploring Doc AI by trying multiple processors under several projects. We are now consolidating them into dev, test and production projects, each with a version of the processors.
Our goal is to first train the processors in the test project and then copy/move them into the production project.
How can we copy over processors from one project to another? We are unable to find any documentation that would help us accomplish this.
Good day @sivramk,
Welcome to Google Cloud Community!
You can try importing a processor version into a different project using Document AI Workbench. The destination project is where you will import the processor version, while the source project is where the processor version resides. There are several requirements for this:
1. Processor types and schemas must match.
2. The destination processor must be enabled; it can have existing versions/datasets.
3. The version of the source processor must be in one of the following states: Deployed, Deploying, Undeployed or Undeploying.
To avoid permission errors, you must grant the Document AI Editor role to the Document AI Service Agent.
You can check this documentation to learn more: https://cloud.google.com/document-ai/docs/manage-processor-versions#import
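As a rough illustration of the requirements above, the following Python sketch checks them locally and then asks Document AI to import the version. The helpers `can_import`, `version_name` and `import_version` are hypothetical names made up for this example. The final call assumes the `google-cloud-documentai` package and its v1beta3 `import_processor_version` method; treat it as a sketch to verify against the linked documentation, not as a confirmed recipe. Schema compatibility is not checked here.

```python
# States the source processor version must be in, per the list above.
IMPORTABLE_STATES = {"DEPLOYED", "DEPLOYING", "UNDEPLOYED", "UNDEPLOYING"}


def can_import(source_type, source_version_state, dest_type, dest_enabled):
    """Return a list of problems; an empty list means the basic checks pass.

    All arguments are plain values you look up yourself (hypothetical helper).
    """
    problems = []
    if source_type != dest_type:
        problems.append("processor types differ")
    if not dest_enabled:
        problems.append("destination processor is not enabled")
    if source_version_state.upper() not in IMPORTABLE_STATES:
        problems.append(f"source version state {source_version_state!r} is not importable")
    return problems


def version_name(project, location, processor_id, version_id):
    """Build the full resource name of a processor version."""
    return (
        f"projects/{project}/locations/{location}"
        f"/processors/{processor_id}/processorVersions/{version_id}"
    )


def import_version(source_version, dest_processor):
    """Import a version into the destination processor (assumed v1beta3 API).

    source_version: full name of the version in the source project.
    dest_processor: full name of the processor in the destination project.
    """
    # Imported lazily so the checks above work without the client library.
    from google.cloud import documentai_v1beta3 as documentai

    client = documentai.DocumentProcessorServiceClient()
    request = documentai.ImportProcessorVersionRequest(
        parent=dest_processor,
        processor_version_source=source_version,
    )
    operation = client.import_processor_version(request=request)
    return operation.result()  # blocks until the long-running operation ends


# Local check only; no API call is made here.
print(can_import("CUSTOM_EXTRACTION_PROCESSOR", "DEPLOYED",
                 "CUSTOM_EXTRACTION_PROCESSOR", True))
```

Running the checks first keeps the failure local and readable; the import itself still depends on the permissions mentioned above.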
Hope this helps!
Hi! I am trying to move one whole custom processor from dev project into production, but the only option I currently see available is to create a new custom project in prod environment, copy the full schema manually and then finally import the processor version from dev project.
Taking into account that our schemas are pretty large and that this would imply a considerable task of re-working, isn't there any way to directly move a processor from one project to another and avoid all that re-working?
Thank you!
Hi,
You can copy or move processors from one project to another in Doc AI using the Google Cloud Console or the Cloud SDK command-line tools. To copy a processor, export it from the source project using gcloud doc-ai processors export, and then import it into the target project using gcloud doc-ai processors import. This is how we work at Triotech Systems.
Hi! I am trying to move one whole custom processor from dev project into production, but the only option I currently see available is to create a new custom project in prod environment, copy the full schema manually and then finally import the processor version from dev project.
Taking into account that our schemas are pretty large and that this would imply a considerable amount of rework, does your solution mean that we can directly move a processor from one project to another without having to copy the whole schema again?
Thank you!
Thank you for reaching out. Fortunately, the process I mentioned can indeed help you move a custom processor from one project to another without manually copying the full schema.
When using the gcloud doc-ai processors export command, it exports not only the processor configuration but also the associated schema. This means you won't have to recreate the entire schema manually in the production project. The export command packages everything you need into a format that can be easily imported into another project.
Here's a step-by-step guide to clarify:
Export the processor from the development project:
gcloud doc-ai processors export PROCESSOR_ID --location=LOCATION --project=SOURCE_PROJECT > processor_export.json
Import the processor into the production project:
gcloud doc-ai processors import processor_export.json --location=LOCATION --project=TARGET_PROJECT
This way, you can move your custom processor, including its schema, from the development project to the production project. It should save you the manual effort of recreating the entire schema. Thanks!
Hi @Expertopinionsa
Thanks a lot for the response. I will try to follow the steps you defined.
I am afraid that I get an "Invalid choice" error:
ERROR: (gcloud) Invalid choice: 'doc-ai'.
Maybe you meant:
gcloud ai custom-jobs
gcloud ai endpoints
gcloud ai hp-tuning-jobs
gcloud ai index-endpoints
gcloud ai indexes
gcloud ai model-monitoring-jobs
gcloud ai models
gcloud ai operations
gcloud ai tensorboards
Do I need to install a gcloud component?
There is no such command as "gcloud doc-ai".
Is this a made-up thing, or is there a working gcloud command?
Hello,
I am trying to achieve the same thing: I have a large Custom Document Extractor and a Custom Document Processor in a development environment, and I want to migrate/clone both processors and their schemas without recreating them.
I want to know if I can migrate these processors between dev/test/prod environments under same project and also if I can migrate them between source and destination project.
I could not find a practical implementation or documentation guide for this.
There are several steps involved in the migration of a DocumentAI processor from a source to destination. I will try to highlight the ones that are relevant to the question here (migration of the processor and the schema). DocumentAI also allows you to migrate datasets (details here).
Document AI Processor Migration Steps:
#######################################################################################################################################################################
# SCRIPT: docai_migrate_schema.py
# LANGUAGE: Python
# PURPOSE: Export schema from a Google Document AI parser and import it into another parser either in the same project or a different one
# DISCLAIMER: This script is provided as-is and is not managed or maintained by anyone. Please use it at your own discretion
#######################################################################################################################################################################
#### IMPORT LIBRARIES ####
from google.cloud import documentai_v1beta3
#### DEFINE VARIABLES ####
#### PLEASE CHANGE THE VALUES TO MATCH YOUR REQUIREMENTS ####
source_processor_id = "YOUR-SOURCE-PROCESSOR-ID" # Source Processor ID
destination_processor_id = "YOUR-DESTINATION-PROCESSOR-ID" # Destination Processor ID
source_project_id = "YOUR-SOURCE-PROJECT-ID" # Source Project Number - needs a service/user account with roles/documentai.viewer in the source project
destination_project_id = "YOUR-DESTINATION-PROJECT-ID" # Destination Project Number - needs a service/user account with roles/documentai.editor in the destination project
#### CODE TO MIGRATE SCHEMA ####
client = documentai_v1beta3.DocumentServiceClient()
## Migrate only enabled fields. Disabled fields will not be carried over (visible_fields_only=true)
request = documentai_v1beta3.GetDatasetSchemaRequest(name=f"projects/{source_project_id}/locations/us/processors/{source_processor_id}/dataset/datasetSchema",visible_fields_only=True)
## Migrate ALL fields. All fields (enabled and disabled) will be carried over (visible_fields_only=false)
## Comment out the request above and uncomment the line below if you want to export and import ALL fields, enabled and disabled
#request = documentai_v1beta3.GetDatasetSchemaRequest(name=f"projects/{source_project_id}/locations/us/processors/{source_processor_id}/dataset/datasetSchema",visible_fields_only=False)
old_schema = client.get_dataset_schema(request=request)
old_schema.name = f"projects/{destination_project_id}/locations/us/processors/{destination_processor_id}/dataset/datasetSchema" # Destination Processor
# print(old_schema) # Print the Old Schema
### Update the target parser's schema with the one from the source parser
request = documentai_v1beta3.UpdateDatasetSchemaRequest(dataset_schema=old_schema)
# Make the request
response = client.update_dataset_schema(request=request)
print("Schema Updated")
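The script above hard-codes the `us` location in both resource names. Below is a minimal sketch of how the names could be parameterized for other regions. The helper names `schema_name` and `api_endpoint` are invented for this example, and the endpoint pattern `LOCATION-documentai.googleapis.com` is the regional endpoint convention as I understand it; please check it against your own setup.

```python
def schema_name(project_id, location, processor_id):
    """Build the dataset schema resource name used by the script above."""
    return (
        f"projects/{project_id}/locations/{location}"
        f"/processors/{processor_id}/dataset/datasetSchema"
    )


def api_endpoint(location):
    """Regional API endpoint for a location such as "us" or "eu" (assumed pattern)."""
    return f"{location}-documentai.googleapis.com"


# Placeholder values for illustration only.
print(schema_name("YOUR-SOURCE-PROJECT-ID", "eu", "YOUR-SOURCE-PROCESSOR-ID"))
print(api_endpoint("eu"))
```

With these helpers, the client would be created with `client_options={"api_endpoint": api_endpoint(location)}`, and the two f-strings in the script would be replaced by calls to `schema_name`.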
Hope this helps.
Is there no longer a way to do this using the CLI? I have seen both "gcloud doc-ai ..." and "gcloud document-ai ..." referenced in various places, but both throw the "Invalid choice" error. Am I missing something?