Hey all, I've got a Jupyter Notebook where I load an image from a subfolder into memory, along with a single-page PDF. The image is the PDF page rendered as an image. My goal is to use computer vision to process the text in the PDF document, because its layouts are complex and unpredictable: if I only extracted the raw text, the context would be lost and getting the information out would not be possible. So I'm sending the PDF along with the image so the model can use both to extract information from the document more accurately.
The problem I've run into is that when I try to send both the image and the PDF, I get this error:
TypeError: Unexpected item type:
This is followed by the list containing the prompt, image and PDF, printed to the console. Here is my code:
```python
images = convert_pdf_to_images(pdf_path, output_folder, pdf_name)
for image_path in images:
    print_magenta(f"Opening {image_path}")
    with open(pdf_doc, "rb") as pdf, open(image_path, "rb") as img:
        pdf_content = base64.b64encode(pdf.read()).decode("utf-8")
        image_content = base64.b64encode(img.read()).decode("utf-8")
    page = page + 1
    content = [
        {"type": "text", "text": question},
        {"type": "media", "mime_type": "application/pdf", "data": pdf_content},
        {"type": "media", "mime_type": "image/png", "image_url": f"data:image/png;base64,{image_content}"}
    ]
    response = model.generate_content([content], generation_config=generation_config)
```
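For reference, here is a minimal sketch of the part layout that, as far as I understand, the `google-generativeai` SDK's `generate_content` expects: the prompt as a plain string and each file as a `{"mime_type": ..., "data": ...}` dict holding the raw bytes, with no OpenAI-style `"type"` or `"image_url"` keys, all passed as one flat list rather than wrapped in a second list as in `[content]`. It assumes `model` is a `genai.GenerativeModel`; the helper name `build_gemini_parts` is hypothetical.

```python
def build_gemini_parts(question: str, pdf_bytes: bytes, png_bytes: bytes) -> list:
    """Hypothetical helper: build a flat list of parts for model.generate_content().

    Assumes the google-generativeai SDK, where text is a plain str and inline
    files are {"mime_type": ..., "data": ...} dicts with no "type" key.
    """

    def inline(mime_type: str, raw: bytes) -> dict:
        # Raw file bytes go straight into "data"; no data: URL wrapper is used.
        return {"mime_type": mime_type, "data": raw}

    # One flat list: prompt first, then the PDF, then the rendered page image.
    return [
        question,
        inline("application/pdf", pdf_bytes),
        inline("image/png", png_bytes),
    ]
```

In the loop, the call would then become something like `response = model.generate_content(build_gemini_parts(question, pdf.read(), img.read()), generation_config=generation_config)`, made inside the `with` block so both files are still open.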
Any help anyone could give me would be tremendously appreciated, as I'm currently stuck with this issue and not sure how to proceed.