Upload PDF from Local to Document AI

Hello everyone. I'm building a web app and would like to know how to upload PDFs from my laptop (local machine) to Document AI.

 

with open(file_path, "rb") as image:
    image_content = image.read()

uploaded_file = st.file_uploader('Choose your .pdf file', type="pdf")

process_document_sample(
    project_id="XXXX",
    location="us",
    processor_id="XXX",
    file_path=uploaded_file,
    mime_type="application/pdf"
)
I want to upload a PDF from my laptop using Streamlit (uploaded_file). Can the uploaded PDF be read with the with open(file_path, "rb") as image pattern?

Hi @budionosan

Welcome and thank you for reaching out to our community.

I found your post on Stack Overflow raising exactly the same concern, and it has already been answered (solved) by @holtskinner. I'm reposting the answer here for the community's visibility.

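For online processing requests, the Document AI REST API expects the input file content as a base64-encoded string. With the Python client library you pass the raw bytes read from the file (for example, via open(file_path, "rb")), and the library takes care of serializing them for the request.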
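To make the base64 requirement concrete, here is a minimal sketch of what a raw REST request body might look like. The helper name build_process_request_body is hypothetical, and the rawDocument, content, and mimeType field names are my understanding of the REST process method, so please check them against the API reference. If you use the Python client library, you would pass the raw bytes instead of building this body yourself.

```python
import base64
import json


def build_process_request_body(pdf_bytes: bytes, mime_type: str = "application/pdf") -> str:
    """Return a JSON body for a REST `:process` call (hypothetical helper)."""
    # JSON cannot carry raw bytes, so the file content is base64-encoded
    # and then turned into an ASCII string.
    encoded = base64.b64encode(pdf_bytes).decode("ascii")
    return json.dumps({"rawDocument": {"content": encoded, "mimeType": mime_type}})


# Placeholder bytes standing in for a real PDF read with open(path, "rb").
body = build_process_request_body(b"%PDF-1.7 example")
```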

For Streamlit, st.file_uploader gives you a file-like object rather than a path on disk, so you'll need to get the bytes of the uploaded file and pass that value directly in the API request, rather than passing the object to

with open(file_path, "rb") as image:
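As a rough sketch of that idea, the helper below reads the bytes from a file-like upload. The name read_uploaded_bytes is hypothetical, and it assumes the Streamlit upload object behaves like a standard binary file (supports seek and read); io.BytesIO is used as a stand-in here so the snippet runs without Streamlit installed.

```python
import io
from typing import BinaryIO, Optional


def read_uploaded_bytes(uploaded_file: Optional[BinaryIO]) -> Optional[bytes]:
    """Return the full contents of an uploaded file, or None if nothing was uploaded."""
    if uploaded_file is None:
        # st.file_uploader returns None until the user picks a file.
        return None
    # Rewind first so an earlier read() does not leave us at end-of-file.
    uploaded_file.seek(0)
    return uploaded_file.read()


# Stand-in for Streamlit's uploaded file object (assumed to be file-like).
fake_upload = io.BytesIO(b"%PDF-1.7 example")
pdf_bytes = read_uploaded_bytes(fake_upload)

# In the app, the result would be passed on instead of a file path, e.g.:
#   uploaded_file = st.file_uploader("Choose your .pdf file", type="pdf")
#   pdf_bytes = read_uploaded_bytes(uploaded_file)
#   raw_document = documentai.RawDocument(content=pdf_bytes, mime_type="application/pdf")
```

Checking for None matters because Streamlit reruns the script on every interaction, so the processing call should only happen once a file has actually been chosen.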

In the Streamlit documentation, it looks like you are able to get the bytes data from an uploaded file. I'm not familiar with this framework, but you should be able to do something like this, using the code sample from Send a processing request.

from typing import Optional

from google.api_core.client_options import ClientOptions
from google.cloud import documentai

# TODO(developer): Uncomment these variables before running the sample.
# project_id = "YOUR_PROJECT_ID"
# location = "YOUR_PROCESSOR_LOCATION" # Format is "us" or "eu"
# processor_id = "YOUR_PROCESSOR_ID" # Create processor before running sample
# mime_type = "application/pdf" # Refer to https://cloud.google.com/document-ai/docs/file-types for supported file types
# field_mask = "text,entities,pages.pageNumber"  # Optional. The fields to return in the Document object.
# processor_version_id = "YOUR_PROCESSOR_VERSION_ID" # Optional. Processor version to use


def process_document_sample(
    project_id: str,
    location: str,
    processor_id: str,
    mime_type: str,
    field_mask: Optional[str] = None,
    processor_version_id: Optional[str] = None,
) -> None:
    # You must set the `api_endpoint` if you use a location other than "us".
    opts = ClientOptions(api_endpoint=f"{location}-documentai.googleapis.com")

    client = documentai.DocumentProcessorServiceClient(client_options=opts)

    if processor_version_id:
        # The full resource name of the processor version, e.g.:
        # `projects/{project_id}/locations/{location}/processors/{processor_id}/processorVersions/{processor_version_id}`
        name = client.processor_version_path(
            project_id, location, processor_id, processor_version_id
        )
    else: