How To - Invoice Parser - PDF

Greetings,

Very new to working with unstructured data.

Situation: Collection of PDF Invoices on some Google Drive

Required: Parse out certain data from the PDF files and land the data into Google BigQuery

Q1: How would I set up Document AI to parse the data below (see the items highlighted in green)?

Q2: Is there any special setup for the line item section (PurchaseItem1, 2, 3, etc.)?

[Screenshot: Test Invoice - Google Sheets]

Thank you for your patience and understanding 👍

ACCEPTED SOLUTION

You can follow these initial steps in codelabs to get set up with Document AI's Invoice Parser. Make sure to enable the Document AI API and create the right processor, which for your use case is the Invoice Parser.
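To make the setup step concrete, here is a minimal sketch of sending one PDF to an Invoice Parser processor with the `google-cloud-documentai` Python client. It assumes the API is already enabled, the processor already exists, and the PDF has already been downloaded from Google Drive to a local path. The function names `processor_endpoint` and `process_invoice` are made up for illustration, and the project, location, and processor ID values are yours to supply.

```python
def processor_endpoint(location: str) -> str:
    """Build the regional Document AI endpoint, e.g. 'us' or 'eu'."""
    return f"{location}-documentai.googleapis.com"


def process_invoice(project_id: str, location: str, processor_id: str, pdf_path: str):
    """Send a local PDF to the processor and return the parsed Document.

    Hypothetical helper: requires the google-cloud-documentai package and
    application default credentials to be configured.
    """
    # Imported here so the module loads even without the client library.
    from google.api_core.client_options import ClientOptions
    from google.cloud import documentai

    client = documentai.DocumentProcessorServiceClient(
        client_options=ClientOptions(api_endpoint=processor_endpoint(location))
    )
    # Full resource name of the processor created in the console.
    name = client.processor_path(project_id, location, processor_id)

    with open(pdf_path, "rb") as pdf_file:
        raw_document = documentai.RawDocument(
            content=pdf_file.read(), mime_type="application/pdf"
        )

    request = documentai.ProcessRequest(name=name, raw_document=raw_document)
    result = client.process_document(request=request)
    return result.document
```

The returned document exposes the extracted fields under `document.entities`, where each entity has a type and the matched text.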

Once you're set up, you can train your processor and build a pipeline that extracts data from your PDF invoices and stores it in BigQuery.
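For the pipeline part, and the line item question in particular, here is a rough sketch of turning processor output into BigQuery rows. It assumes the output is in the camelCase JSON form of a Document (keys such as `type`, `mentionText`, `properties`), and that line items arrive as `line_item` entities whose child properties are named like `line_item/description`. The helper names, the choice of header fields, and the two-table layout are illustrative assumptions, not something Document AI requires.

```python
from typing import Any, Dict, List, Tuple

# Header fields to keep. The selection is an assumption for illustration;
# adjust it to the fields you highlighted on your own invoices.
HEADER_FIELDS = {"invoice_id", "invoice_date", "supplier_name", "total_amount"}


def flatten_invoice(document: Dict[str, Any]) -> Tuple[Dict[str, str], List[Dict[str, str]]]:
    """Split a Document JSON dict into one header row and N line item rows."""
    header: Dict[str, str] = {}
    line_items: List[Dict[str, str]] = []

    for entity in document.get("entities", []):
        entity_type = entity.get("type", "")
        if entity_type == "line_item":
            item: Dict[str, str] = {}
            for prop in entity.get("properties", []):
                # "line_item/quantity" becomes the column name "quantity".
                column = prop.get("type", "").split("/")[-1]
                item[column] = prop.get("mentionText", "")
            line_items.append(item)
        elif entity_type in HEADER_FIELDS:
            # Keep the first occurrence if a field is detected more than once.
            header.setdefault(entity_type, entity.get("mentionText", ""))

    # Tag each line item with its invoice so the two tables can be joined.
    for item in line_items:
        item["invoice_id"] = header.get("invoice_id", "")
    return header, line_items


def load_rows(table_id: str, rows: List[Dict[str, str]]) -> None:
    """Stream rows into an existing BigQuery table (hypothetical helper).

    Requires the google-cloud-bigquery package; table_id looks like
    "project.dataset.table" and the table schema must match the row keys.
    """
    from google.cloud import bigquery

    errors = bigquery.Client().insert_rows_json(table_id, rows)
    if errors:
        raise RuntimeError(f"BigQuery insert failed: {errors}")
```

Keeping the header and the line items in separate tables means the PurchaseItem rows need no special setup on the BigQuery side; an invoice with three items just produces three rows that share one `invoice_id`.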

4 REPLIES

Thank you, Poala. This helps point me in the right direction.

Question 2: If the structure of the invoice changes, is there any documentation on how to set up Document AI to adjust accordingly?

Are we talking about writing a custom parser here or a predefined one?

Hi dheerajpanyam,

I am referring to a predefined one (i.e., the Invoice Parser). However, if you have insights about the custom parser, please share your thoughts. Thank you.