Hello everyone. I'm currently building a web app and would like to ask: how can I upload PDFs from my laptop (local machine) to Document AI?
Hi @budionosan,
Welcome and thank you for reaching out to our community.
I found your post on Stack Overflow with exactly the same question, which was already answered (and marked as solved) by @holtskinner. I'm reposting the answer here for the community's visibility.
The Document AI API for online processing requests requires the file content to be sent as a base64-encoded string. When you read a file in binary mode with Python's built-in file I/O and pass the bytes to the Python client library, the library handles that encoding for you. For Streamlit, you'll need to get the bytes of the uploaded file and pass that value directly in the API request, rather than passing a file path to `with open(file_path, "rb") as image:`.
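To make the base64 point concrete, here is a minimal stdlib-only sketch of the JSON body that the REST `:process` method expects for an online processing request. The helper name `build_process_body` and the file name `invoice.pdf` are hypothetical; the field names `rawDocument`, `content`, and `mimeType` follow the REST representation of `RawDocument`. You only need to do this encoding yourself when calling the REST endpoint directly; the Python client library accepts raw bytes.

```python
import base64
import io
import json


def build_process_body(pdf_bytes: bytes, mime_type: str = "application/pdf") -> str:
    """Return a JSON body for a Document AI REST `:process` call (hypothetical helper).

    In the REST/JSON representation, `rawDocument.content` is a bytes field,
    so the raw file bytes must be sent as a base64-encoded string.
    """
    encoded = base64.b64encode(pdf_bytes).decode("ascii")
    return json.dumps({"rawDocument": {"content": encoded, "mimeType": mime_type}})


# Bytes read from disk and bytes from an in-memory upload are interchangeable:
#   with open("invoice.pdf", "rb") as f: data = f.read()   # hypothetical local file
#   data = uploaded_file.getvalue()                         # e.g. Streamlit's UploadedFile
in_memory_upload = io.BytesIO(b"%PDF-1.4 example")  # stands in for an uploaded file
body = build_process_body(in_memory_upload.getvalue())
print(body)
```

Whichever way the bytes are obtained, the request body looks the same, which is why a Streamlit upload does not need to be written to disk first.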
According to the Streamlit documentation, you can get the bytes of an uploaded file. I'm not familiar with this framework, but you should be able to do something like this, using the code sample from "Send a processing request":
```python
from typing import Optional

from google.api_core.client_options import ClientOptions
from google.cloud import documentai

# TODO(developer): Uncomment these variables before running the sample.
# project_id = "YOUR_PROJECT_ID"
# location = "YOUR_PROCESSOR_LOCATION"  # Format is "us" or "eu"
# processor_id = "YOUR_PROCESSOR_ID"  # Create processor before running sample
# mime_type = "application/pdf"  # Refer to https://cloud.google.com/document-ai/docs/file-types for supported file types
# field_mask = "text,entities,pages.pageNumber"  # Optional. The fields to return in the Document object.
# processor_version_id = "YOUR_PROCESSOR_VERSION_ID"  # Optional. Processor version to use


def process_document_sample(
    project_id: str,
    location: str,
    processor_id: str,
    mime_type: str,
    field_mask: Optional[str] = None,
    processor_version_id: Optional[str] = None,
) -> None:
    # You must set the `api_endpoint` if you use a location other than "us".
    opts = ClientOptions(api_endpoint=f"{location}-documentai.googleapis.com")

    client = documentai.DocumentProcessorServiceClient(client_options=opts)

    if processor_version_id:
        # The full resource name of the processor version, e.g.:
        # `projects/{project_id}/locations/{location}/processors/{processor_id}/processorVersions/{processor_version_id}`
        name = client.processor_version_path(
            project_id, location, processor_id, processor_version_id
        )
    else:
        # The full resource name of the processor, e.g.:
        # `projects/{project_id}/locations/{location}/processors/{processor_id}`
        name = client.processor_path(project_id, location, processor_id)
```
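The sample above builds the processor name; the remaining piece is where the file bytes come from. Below is a minimal sketch, assuming the file arrives either as a local path or as a binary file-like object such as the `UploadedFile` returned by Streamlit's `st.file_uploader` (which exposes `getvalue()`). The helper name `read_pdf_bytes` and the `%PDF-` header check are illustrative additions, not part of the Document AI sample.

```python
import io
import os
from typing import BinaryIO, Union

PDF_MAGIC = b"%PDF-"  # PDF files begin with this header


def read_pdf_bytes(source: Union[str, os.PathLike, BinaryIO]) -> bytes:
    """Return the raw bytes of a PDF from a local path or an in-memory upload.

    Hypothetical helper: a path is read from disk with open(..., "rb");
    a file-like object (e.g. Streamlit's UploadedFile or io.BytesIO) is read
    via getvalue() when available, otherwise via read().
    """
    if isinstance(source, (str, os.PathLike)):
        with open(source, "rb") as f:
            data = f.read()
    elif hasattr(source, "getvalue"):
        data = source.getvalue()
    else:
        data = source.read()
    # Optional sanity check so a non-PDF upload fails before calling the API.
    if not data.startswith(PDF_MAGIC):
        raise ValueError("input does not look like a PDF (missing %PDF- header)")
    return data


if __name__ == "__main__":
    import tempfile

    payload = b"%PDF-1.7\n% minimal example bytes\n"
    upload = io.BytesIO(payload)  # stands in for the st.file_uploader(...) result
    with tempfile.NamedTemporaryFile(suffix=".pdf", delete=False) as tmp:
        tmp.write(payload)
    try:
        # Both sources yield identical bytes for the API request.
        assert read_pdf_bytes(upload) == read_pdf_bytes(tmp.name) == payload
    finally:
        os.remove(tmp.name)
```

In the Streamlit app, the returned bytes would then go into the client-library request from the sample, for example `documentai.RawDocument(content=read_pdf_bytes(uploaded_file), mime_type=mime_type)`, wrapped in `documentai.ProcessRequest(name=name, raw_document=raw_document)` and sent with `client.process_document(request=request)`.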