Our goal is to provide a picture of a document (pdf) and run some prompts to compare data on the document against the saved data in our system. The goal of this is to speed up our manual verification process and help point out missing or potentially incorrect data from the document.
This is our first step into introducing AI into our system. Everyone on our team is brand new to AI, so we are researching/testing multiple products.
Using any of the Google Vision products (Vision, Document AI, Vertex AI), is there a way to provide a base64-encoded document along with some prompts and receive back a response? We have been able to use the Vertex AI UI for some initial testing, but now we would like to start testing from a Java application.
Hi Jdonn95,
Welcome to Google Cloud Community!
For your specific goal, which is to provide a document (PDF) and run prompts to compare data on the document against the data saved in your system, both Document AI and Vertex AI can be suitable options, with some key differences:
Document AI: Easier to set up for structured data extraction, especially with pre-built processors. Might be enough for simpler comparisons.
Vertex AI: Offers more flexibility for complex comparisons and custom logic. Requires more development effort initially.
Considering Your Team's Expertise:
Since everyone on your team is new to AI, starting with Document AI and a pre-built processor is likely the gentler option. Here's how Document AI can work for data comparison in Java:
Document AI Example with Java:
Here's a simplified Java code snippet demonstrating a basic Document AI request (replace the placeholders with your details; note that the API expects raw document bytes, so the base64 content is decoded before sending):
import com.google.cloud.documentai.v1.Document;
import com.google.cloud.documentai.v1.DocumentProcessorServiceClient;
import com.google.cloud.documentai.v1.ProcessRequest;
import com.google.cloud.documentai.v1.ProcessResponse;
import com.google.cloud.documentai.v1.RawDocument;
import com.google.protobuf.ByteString;
import java.util.Base64;

public class DocumentVerification {
    public static void main(String[] args) throws Exception {
        // Replace with your project ID
        String projectId = "your-project-id";
        // Replace with the location of your Document AI processor (e.g., "us" or "eu")
        String location = "your-processor-location";
        // Replace with your processor ID (e.g., a pre-built invoice processor)
        String processorId = "your-processor-id";
        // Replace with your Base64 encoded PDF content
        String base64EncodedPdf = "your-base64-encoded-pdf";

        // Full resource name of the processor
        String processorName = String.format(
            "projects/%s/locations/%s/processors/%s", projectId, location, processorId);

        try (DocumentProcessorServiceClient client = DocumentProcessorServiceClient.create()) {
            // The API expects raw bytes, so decode the Base64 content first
            ByteString content = ByteString.copyFrom(Base64.getDecoder().decode(base64EncodedPdf));

            // Prepare the request
            ProcessRequest request = ProcessRequest.newBuilder()
                .setName(processorName)
                .setRawDocument(RawDocument.newBuilder()
                    .setContent(content)
                    .setMimeType("application/pdf")
                    .build())
                .build();

            // Send request and get response
            ProcessResponse response = client.processDocument(request);
            Document document = response.getDocument();

            // Extract data from the response (needs further processing)
            System.out.println("Full text: " + document.getText());
            for (Document.Entity entity : document.getEntitiesList()) {
                // Entities are populated by specialized processors (e.g., invoice parser)
                System.out.println(entity.getType() + ": " + entity.getMentionText());
            }
        }
    }
}
The All Document AI code samples page in the Google Cloud documentation might also be helpful as you build out your code.
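Once Document AI has extracted fields from the document, comparing them against your saved data is plain Java. Here is a minimal sketch of that comparison step; the method names (`compareFields`, `normalize`) and the field maps are hypothetical and would need to be adapted to your actual data model:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class FieldComparison {

    // Normalize values so that e.g. "$1,200.00" and "1200.00" compare equal.
    static String normalize(String value) {
        return value == null ? "" : value.replaceAll("[\\s$,]", "").toLowerCase();
    }

    // Returns field name -> description for each field that is missing
    // from the document or disagrees with the saved value.
    static Map<String, String> compareFields(Map<String, String> saved,
                                             Map<String, String> extracted) {
        Map<String, String> mismatches = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : saved.entrySet()) {
            String docValue = extracted.get(e.getKey());
            if (docValue == null) {
                mismatches.put(e.getKey(), "missing from document");
            } else if (!normalize(e.getValue()).equals(normalize(docValue))) {
                mismatches.put(e.getKey(),
                    "expected " + e.getValue() + " but document says " + docValue);
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        Map<String, String> saved = Map.of(
            "invoice_total", "$1,200.00",
            "vendor", "Acme Corp");
        Map<String, String> extracted = Map.of(
            "invoice_total", "1200.00",
            "vendor", "Acme Inc");
        // Only "vendor" should be flagged; the totals match after normalization.
        System.out.println(compareFields(saved, extracted));
    }
}
```

A report like this can drive your manual verification queue: empty map means the document matched, otherwise a reviewer only needs to look at the flagged fields.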
Moving to Vertex AI in the Future:
If you later decide to explore Vertex AI for more complex comparisons, the All Vertex AI code samples page is a good place to start.
Hope this helps.
Yes, you can provide a document in an API request to Vertex AI by sending the document data as part of the request payload. Depending on the specific API and use case, you may need to format the document data accordingly (e.g., base64-encoded inline data in a JSON body) and include it in the request along with any other required parameters or headers.
For example, if you're using the Gemini API on Vertex AI to analyze a PDF, you would typically send a POST request to the model's generateContent endpoint with the document content included as inline data in the request body. The exact details will vary based on the specific endpoint and the programming language/framework you are using.
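To make that concrete, here is a sketch of building such a JSON request body in plain Java. The field names (`contents`, `parts`, `inlineData`, `mimeType`) follow the public Gemini REST reference as I understand it, and the endpoint in the comment is illustrative; verify both against the current documentation. For simplicity the sketch assumes the prompt contains no characters that need JSON escaping:

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class GeminiRequestBody {

    // Build a generateContent JSON body with the PDF as base64 inline data.
    // The prompt is inserted verbatim (assumed to need no JSON escaping here).
    static String buildBody(byte[] pdfBytes, String prompt) {
        String data = Base64.getEncoder().encodeToString(pdfBytes);
        return "{\n"
            + "  \"contents\": [{\n"
            + "    \"role\": \"user\",\n"
            + "    \"parts\": [\n"
            + "      {\"inlineData\": {\"mimeType\": \"application/pdf\", \"data\": \"" + data + "\"}},\n"
            + "      {\"text\": \"" + prompt + "\"}\n"
            + "    ]\n"
            + "  }]\n"
            + "}";
    }

    public static void main(String[] args) {
        byte[] pdf = "fake pdf bytes".getBytes(StandardCharsets.UTF_8);
        // POST this body (with an OAuth bearer token) to an endpoint of the form:
        // https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT/locations/LOCATION/
        //   publishers/google/models/MODEL:generateContent
        System.out.println(buildBody(pdf, "Does the invoice total on this document match 1200.00?"));
    }
}
```

In a real application you would use an HTTP client or, more simply, the Vertex AI Java SDK, which handles this serialization for you.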
To summarize the above replies:
1. Use the Document AI API (REST or the Java client library) to process documents and extract text. You can try the OCR or Layout parsers; if your documents contain forms and tables, the Form Parser may do the job.
2. Store the extracted text in a database. If you need search functionality, store it in a vector DB (for semantic search) or Solr/Elasticsearch (for term search).
3. In your application, retrieve the text you wish to extract data from, construct a Gen AI prompt, and process it using the Gemini APIs.
You can follow this with other refinements: generate a schema from the response, refine the schema, and then reuse it in the prompt, etc.
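For step 3, the prompt itself is just a string that combines the extracted text with the saved record and asks for a structured answer. A minimal sketch, where `buildPrompt` and the exact wording and output schema are hypothetical starting points to refine:

```java
public class VerificationPrompt {

    // Build a prompt asking the model to verify extracted document text
    // against a saved record and answer with per-field JSON results.
    static String buildPrompt(String documentText, String savedRecordJson) {
        return "You are verifying a document against our system of record.\n\n"
            + "Saved record (JSON):\n" + savedRecordJson + "\n\n"
            + "Document text:\n" + documentText + "\n\n"
            + "For each field in the saved record, reply with a JSON array of objects\n"
            + "of the form {\"field\": ..., \"matches\": true|false, \"document_value\": ...},\n"
            + "using null for document_value when the field is missing from the document.";
    }

    public static void main(String[] args) {
        String prompt = buildPrompt(
            "Invoice #123\nTotal due: 1200.00",
            "{\"invoice_total\": \"1200.00\", \"invoice_number\": \"124\"}");
        System.out.println(prompt);
    }
}
```

Requesting a fixed JSON shape like this is what makes the "generate a schema, refine it, and reuse it" loop above workable: the response becomes machine-parseable, so mismatches can be routed straight into your verification workflow.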