I have PDF documents containing invoices. A single PDF can contain multiple invoices, and each invoice can span one or more pages. I want to split the PDF into one group of pages per invoice, based on the page numbering. Each page carries a page indicator (e.g. "page 1 of 2", "page 2 of 2"), which could be used for annotation or for training a model.
Example:
Input: A PDF of 5 pages. Page 1 contains invoice 1. Page 2 contains invoice 2. Pages 3-5 contain invoice 3.
Output: 3 separate PDFs, one per invoice.
(Note: I cannot separate them by invoice number, because it is missing from some invoices.)
sample pdf
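To make the intended grouping concrete, here is a minimal sketch of the rule described above: a page marked "page 1 of N" starts a new invoice, and every other page is attached to the invoice before it. It assumes the text is extractable (scanned pages would need OCR first) and that the indicator really reads "page X of Y". The names `group_pages` and `split_pdf` are made up for illustration, and `split_pdf` assumes the third-party `pypdf` package is installed. Attaching pages with no readable indicator to the current invoice is also an assumption, not a requirement from the question.

```python
import re

# Matches indicators such as "Page 1 of 2" or "page 3 of 5" (case-insensitive).
PAGE_MARKER = re.compile(r"page\s*(\d+)\s*of\s*(\d+)", re.IGNORECASE)


def group_pages(page_texts):
    """Return a list of invoices, each a list of 0-based page indices."""
    groups = []
    for index, text in enumerate(page_texts):
        match = PAGE_MARKER.search(text or "")
        starts_new = match is not None and int(match.group(1)) == 1
        if starts_new or not groups:
            # "page 1 of N" (or the very first page) opens a new invoice.
            groups.append([index])
        else:
            # Continuation page, or no readable indicator (assumption):
            # keep it with the current invoice.
            groups[-1].append(index)
    return groups


def split_pdf(input_path, output_prefix):
    """Write one PDF per detected invoice and return the output paths."""
    # Imported here so the grouping logic above works without pypdf.
    from pypdf import PdfReader, PdfWriter

    reader = PdfReader(input_path)
    texts = [page.extract_text() or "" for page in reader.pages]

    output_paths = []
    for number, group in enumerate(group_pages(texts), start=1):
        writer = PdfWriter()
        for index in group:
            writer.add_page(reader.pages[index])
        path = f"{output_prefix}_{number}.pdf"
        with open(path, "wb") as handle:
            writer.write(handle)
        output_paths.append(path)
    return output_paths
```

For the 5-page example, `group_pages` would be given the text of each page and would return three groups: page 1, page 2, and pages 3-5 (as 0-based indices). If the indicator text varies between invoice layouts or is only present as an image, this rule-based approach would not be enough, and a trained page classifier may be the better fit.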