
Document AI

We plan to use Document AI to extract information from PDF documents. These PDFs are stored in Hyland OnBase, which is installed on GCP. Do we need to move the documents to Google Cloud Storage in order to process them with Document AI on GCP?


Hi there, I have a similar use case. My docs have 30 to 100 pages, and docs with more than 15 pages need batch processing. In my case I upload the files to Firestore and send the gs:// path to Doc AI. Doc AI writes its output to another Firestore folder, which I call batch. Once the batch is finished I send those files to Search & Conversation, and when S&C finishes processing I clean out the batch folder.

All of this is done in 3 separate Cloud Functions:

- function 1: uploads the PDF to Firestore
- function 2: runs Doc AI processing
- function 3: sends the Doc AI output to S&C

I tried quite a few ways to do that:

- a single Cloud Function v1 ... timeout problem
- a single Cloud Function v2 ... had some issues moving files from buckets; it stops processing and doesn't output any error

I thought splitting it into 3 services would make it easier to monitor for errors and, if one occurs, to reprocess from the point of failure.
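To illustrate what "reprocess from the point of failure" could look like, here is a rough sketch and not the actual functions. The stage names, the `run_pipeline` helper and the in-memory `completed` dict are all hypothetical; in a real setup each stage would be its own Cloud Function and the progress record would need to live somewhere durable.

```python
from typing import Callable, Dict, List, Optional, Tuple

# A stage is a (name, function) pair; the function receives a document id.
Stage = Tuple[str, Callable[[str], None]]


def run_pipeline(
    doc_id: str,
    stages: List[Stage],
    completed: Dict[str, List[str]],
) -> Optional[str]:
    """Run stages in order, skipping any already recorded as done.

    Returns the name of the stage that failed, or None if every stage
    finished. `completed` is a hypothetical progress record keyed by doc id.
    """
    done = completed.setdefault(doc_id, [])
    for name, stage in stages:
        if name in done:
            continue  # finished in an earlier attempt, do not redo it
        try:
            stage(doc_id)
        except Exception:
            return name  # stop here; a later retry resumes from this stage
        done.append(name)
    return None
```

Calling `run_pipeline` again with the same `completed` record skips the stages that already succeeded, so a failed Doc AI step does not force another upload.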

In general it is a good idea. Small PDF files (< 15 pages) can be processed in memory by sending them through the API, so they do not have to be moved first. For larger files you would use batch processing, and those files must be present in a GCS bucket. Link to the input parameters for batch processing:

https://cloud.google.com/document-ai/docs/reference/rest/v1/BatchDocumentsInputConfig
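For reference, a minimal sketch of what the batch request body could look like, built as a plain dict that mirrors the v1 REST field names (`inputDocuments`, `gcsDocuments`, `documentOutputConfig`). The helper names, the bucket paths and the 15-page cutoff are assumptions for illustration; check the current limits for your processor before relying on the threshold.

```python
from typing import Any, Dict, List

# Assumed cutoff taken from this thread; the real limit depends on the processor.
ONLINE_PAGE_LIMIT = 15


def needs_batch(page_count: int) -> bool:
    """Return True when a document is too long for online processing."""
    return page_count > ONLINE_PAGE_LIMIT


def build_batch_request(input_uris: List[str], output_uri: str) -> Dict[str, Any]:
    """Build the JSON body for a batchProcess call on PDFs stored in GCS."""
    for uri in list(input_uris) + [output_uri]:
        if not uri.startswith("gs://"):
            raise ValueError(f"batch processing needs gs:// URIs, got: {uri}")
    return {
        "inputDocuments": {
            "gcsDocuments": {
                "documents": [
                    {"gcsUri": uri, "mimeType": "application/pdf"}
                    for uri in input_uris
                ]
            }
        },
        # Doc AI writes its result files under this prefix.
        "documentOutputConfig": {"gcsOutputConfig": {"gcsUri": output_uri}},
    }
```

This body would be POSTed to the processor's `:batchProcess` endpoint. To point at a whole folder instead of listing files one by one, `gcsDocuments` can be swapped for `gcsPrefix` with a `gcsUriPrefix` value, as described on the linked page.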