Any way to provide document in API request to Vertex AI?

Our goal is to provide a picture of a document (PDF) and run some prompts to compare the data on the document against the saved data in our system. This should speed up our manual verification process and help point out missing or potentially incorrect data from the document.

This is our first step into introducing AI into our system. Everyone on our team is brand new to AI, so we are researching/testing multiple products.

Using any of the Google Vision products (Vision, Document AI, Vertex AI), is there a way to provide a Base64 encoded document with some prompts and receive back a response? We have been able to use the Vertex AI UI for some initial testing, but now we would like to start testing using a Java application.

Hi Jdonn95,

Welcome to Google Cloud Community!

For your specific goal, which is to provide a picture of a document (PDF) and run some prompts to compare the data on the document against the saved data in your system, both Document AI and Vertex AI can be suitable options, but with some key differences:

Document AI: Easier to set up for structured data extraction, especially with pre-built processors. Might be enough for simpler comparisons.

Vertex AI: Offers more flexibility for complex comparisons and custom logic. Requires more development effort initially.

Considering Your Team's Expertise:

  • If your documents have a consistent format and data points are well-defined, Document AI might be a good starting point due to its ease of use.
  • If you need to handle complex layouts, custom comparisons with data integration, or have developers comfortable with AI, Vertex AI could be a future option.

Here's how Document AI can work for data comparison in Java:

  1. Create and manage processors: Configure a Document AI processor (pre-built or custom) to extract specific data points from the PDF (e.g., "customer name," "invoice number"). Processors extract the fields defined in their schema rather than responding to free-form prompts.
  2. Java Client Library: Use the Google Cloud Document AI client library for Java to interact with Document AI.
  3. Base64 Encoding: The REST API takes the PDF as a Base64 string in the request body. The Java client library takes the raw bytes, so decode a Base64 string before sending it to the processor you configured.
  4. Extracted Data: Document AI extracts data points based on your defined types.
  5. Data Comparison: Compare the extracted data with your saved data in your system to identify inconsistencies or missing information.

Document AI Example with Java:

Here's a simplified Java code snippet demonstrating a basic Document AI request (replace placeholders with your details):

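import com.google.cloud.documentai.v1.Document;
import com.google.cloud.documentai.v1.DocumentProcessorServiceClient;
import com.google.cloud.documentai.v1.DocumentProcessorServiceSettings;
import com.google.cloud.documentai.v1.ProcessRequest;
import com.google.cloud.documentai.v1.ProcessResponse;
import com.google.cloud.documentai.v1.RawDocument;
import com.google.protobuf.ByteString;
import java.util.Base64;

public class DocumentVerification {

    public static void main(String[] args) throws Exception {
        // Replace with your project ID
        String projectId = "your-project-id";

        // Replace with the location of your Document AI processor (e.g., "us" or "eu")
        String location = "your-processor-location";

        // Replace with your processor ID (e.g., the ID of a pre-built invoice processor)
        String processorId = "your-processor-id";

        // Replace with your Base64 encoded PDF content
        String base64EncodedPdf = "your-base64-encoded-pdf";

        // The client library expects raw bytes, so decode the Base64 string first
        ByteString pdfBytes = ByteString.copyFrom(Base64.getDecoder().decode(base64EncodedPdf));

        // Use the regional endpoint that matches the processor location
        DocumentProcessorServiceSettings settings = DocumentProcessorServiceSettings.newBuilder()
                .setEndpoint(location + "-documentai.googleapis.com:443")
                .build();

        try (DocumentProcessorServiceClient client = DocumentProcessorServiceClient.create(settings)) {
            // Full resource name of the processor
            String processorName = String.format(
                    "projects/%s/locations/%s/processors/%s", projectId, location, processorId);

            // Prepare the request
            ProcessRequest request = ProcessRequest.newBuilder()
                    .setName(processorName)
                    .setRawDocument(RawDocument.newBuilder()
                            .setContent(pdfBytes)
                            .setMimeType("application/pdf")
                            .build())
                    .build();

            // Send request and get response
            ProcessResponse response = client.processDocument(request);
            Document document = response.getDocument();

            // Print the extracted fields, as defined by the processor's schema
            for (Document.Entity entity : document.getEntitiesList()) {
                System.out.printf("%s = %s (confidence %.2f)%n",
                        entity.getType(), entity.getMentionText(), entity.getConfidence());
            }
        }
    }
}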
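
Step 5 in the list above (data comparison) does not depend on Document AI itself, so it can be prototyped separately. The following is a minimal sketch, assuming the extracted fields and your saved record have both been loaded into maps keyed by field name. The class name, the method names, and field names such as invoice_number are hypothetical and only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper: compares fields extracted from a document with saved values.
final class FieldComparison {

    // Collapse whitespace and ignore case so formatting differences are not flagged.
    static String normalize(String value) {
        return value.trim().replaceAll("\\s+", " ").toLowerCase(Locale.ROOT);
    }

    // Returns one finding per saved field that is missing from, or differs in, the document.
    static List<String> findDiscrepancies(Map<String, String> saved, Map<String, String> extracted) {
        List<String> findings = new ArrayList<>();
        for (Map.Entry<String, String> entry : saved.entrySet()) {
            String field = entry.getKey();
            String actual = extracted.get(field);
            if (actual == null || actual.isBlank()) {
                findings.add("MISSING " + field);
            } else if (!normalize(entry.getValue()).equals(normalize(actual))) {
                findings.add("MISMATCH " + field + ": saved='" + entry.getValue()
                        + "' document='" + actual + "'");
            }
        }
        return findings;
    }
}
```

Calling FieldComparison.findDiscrepancies(savedRecord, extractedFields) returns a list your reviewers can check first. Real data will likely need stricter normalization for dates and amounts than the simple whitespace and case handling shown here.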

The "All Document AI code samples" page in the Google Cloud documentation may also help you build out your code.

Moving to Vertex AI in the Future:
If you later decide to explore Vertex AI for more complex comparisons, here's a general outline:

  1. Data Preprocessing: Extract data from the document using Document AI or a custom parsing tool. Pre-process both extracted data and saved data for use in a Vertex AI model.
  2. Train a Model: Train a Vertex AI model (e.g., AutoML Tabular) with your existing data to identify potential data discrepancies. You might need to explore Vertex AI resources and tutorials to get started.
  3. Java Client Library: Use the Vertex AI client library for Java to interact with Vertex AI models.
  4. Model Prediction: Feed the pre-processed document data and saved data into your trained model.
  5. Highlight Discrepancies: Based on the model's predictions, identify and highlight potential inconsistencies or missing information.

The "All Vertex AI code samples" page in the Google Cloud documentation may also help you build out your code.

Important Notes:

  • Remember to set up your Google Cloud project and enable the necessary APIs (Document AI or Vertex AI depending on your chosen approach).
  • Consider starting with a small set of documents for testing and refine your approach based on the results.
  • Explore additional resources and tutorials provided by Google Cloud for Document AI and Vertex AI to get a deeper understanding of each option.

Hope this helps.

Yes, you can provide a document in an API request to Vertex AI by sending the document data as part of the request payload. Depending on the specific API and use case, you may need to format the document data accordingly (e.g., JSON, binary format) and include it in the request body along with any other required parameters or headers.

For example, if you're using the Cloud Natural Language API (a separate Google Cloud product) to analyze text documents, you would typically send a POST request with the document content included in the request body. The exact details will vary based on the specific API endpoint and the programming language or framework you are using.
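
For the original question (a Base64 encoded PDF plus a prompt, sent from Java), one option is the generateContent REST method for Gemini models on Vertex AI, which accepts the file as an inlineData part next to a text part. The sketch below only builds the URL, JSON body, and HTTP request using the JDK's own classes. The class and method names are hypothetical, and the project ID, location, model ID, and access token are placeholders you would supply yourself (for testing, a token can come from gcloud auth print-access-token).

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.util.Base64;

// Sketch: builds a Vertex AI Gemini generateContent request that carries a PDF inline.
final class GeminiPdfRequest {

    // Regional REST endpoint for a Google-published model; modelId is a placeholder.
    static String endpoint(String projectId, String location, String modelId) {
        return String.format(
                "https://%s-aiplatform.googleapis.com/v1/projects/%s/locations/%s"
                        + "/publishers/google/models/%s:generateContent",
                location, projectId, location, modelId);
    }

    // Minimal JSON string escaping for the prompt text.
    static String escapeJson(String text) {
        StringBuilder out = new StringBuilder();
        for (char c : text.toCharArray()) {
            switch (c) {
                case '"': out.append("\\\""); break;
                case '\\': out.append("\\\\"); break;
                case '\n': out.append("\\n"); break;
                case '\r': out.append("\\r"); break;
                case '\t': out.append("\\t"); break;
                default:
                    if (c < 0x20) out.append(String.format("\\u%04x", (int) c));
                    else out.append(c);
            }
        }
        return out.toString();
    }

    // One user turn with two parts: the Base64 encoded PDF and the prompt.
    static String body(byte[] pdfBytes, String prompt) {
        String data = Base64.getEncoder().encodeToString(pdfBytes);
        return "{\"contents\":[{\"role\":\"user\",\"parts\":["
                + "{\"inlineData\":{\"mimeType\":\"application/pdf\",\"data\":\"" + data + "\"}},"
                + "{\"text\":\"" + escapeJson(prompt) + "\"}]}]}";
    }

    // POST request carrying an OAuth 2.0 access token; sending it is left to the caller.
    static HttpRequest request(String url, String jsonBody, String accessToken) {
        return HttpRequest.newBuilder(URI.create(url))
                .header("Authorization", "Bearer " + accessToken)
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(jsonBody))
                .build();
    }
}
```

The resulting request can be sent with java.net.http.HttpClient, and the model's answer comes back as JSON in the response body. In a real application a JSON library and the Vertex AI client library for Java would probably be more robust than hand-built strings; the manual version is shown only to make the payload shape visible.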

 

To summarize the above replies:

1. Use the Document AI API (REST or the Java client library) to process documents and extract text. You can try the OCR or Layout Parser processors. If the document contains forms and tables, Form Parser may do the job.

2. Store the extracted text in a database. If you need search functionality, store it in a vector database (for semantic search) or Solr/Elasticsearch (for term search).

3. In your application, retrieve the text you want to extract data from, construct a generative AI prompt, and process it with the Gemini APIs.

You can follow this with other techniques, such as generating a schema from the response, refining it, and then reusing it in the prompt.
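
As a rough illustration of step 3, prompt construction might look something like the sketch below. It assumes the extracted document text and the saved record are already in memory. The class name, the wording of the instructions, and the JSON shape requested from the model are all hypothetical starting points, not a recommended format.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: builds a verification prompt from saved data and extracted text.
final class VerificationPrompt {

    static String build(String documentText, Map<String, String> savedRecord) {
        StringBuilder prompt = new StringBuilder();
        prompt.append("Compare the saved record with the document text.\n")
              .append("For each field, answer MATCH, MISMATCH, or MISSING.\n")
              .append("Respond only with JSON shaped like ")
              .append("{\"field\": {\"status\": \"...\", \"documentValue\": \"...\"}}.\n\n")
              .append("Saved record:\n");
        // Sort the keys so the prompt is identical across runs for the same record
        for (Map.Entry<String, String> e : new TreeMap<>(savedRecord).entrySet()) {
            prompt.append("- ").append(e.getKey()).append(": ").append(e.getValue()).append('\n');
        }
        // Delimit the document text so it is clearly separated from the instructions
        prompt.append("\nDocument text:\n\"\"\"\n").append(documentText).append("\n\"\"\"\n");
        return prompt.toString();
    }
}
```

Asking for a fixed JSON shape ties in with the schema idea above: once the responses look right, the shape can be refined and reused in later prompts so your application can parse the answer reliably.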