PDF Cleaning + PDF Splitting using Document AI

I am trying to help a small courier company automate the physical sorting it does while folding documents for mailing.

I have a large PDF of bills and statements that are printed and mailed, each carrying the recipient's name and address. The data needs to be cleaned to reduce the physical sorting during folding. Some bills have two or more pages for a single recipient, and the additional pages do not carry the address. Some bills are missing addresses altogether. Can Document AI help here? If so, which variant: basic Document AI, Document AI Workbench, or Document AI Warehouse?

The role of AI here is to group multiple pages that belong to a single mailing address / recipient. 

Input: A single large PDF with the names and addresses of people along with their statements (mobile bill, credit card statement, etc.)

Output: N PDFs, split by name and address, each containing only the statements for that person (single page or multiple pages)

ACCEPTED SOLUTION

Document AI can indeed help automate the sorting and processing of documents such as bills and statements. It comes in several variants, including Document AI Workbench and Document AI Warehouse, which suit different needs.

In your case, you need to extract data from bills and statements, clean it, and then split the document by name and address. Document AI's extraction capabilities fit that workflow.

Document AI Warehouse is deprecated and will no longer be available after January 16, 2025.

Document AI Workbench is particularly useful when you have specific document types and need to extract structured data from them. It lets you build custom document processing solutions tailored to your requirements.

Here's a high-level approach you could take:

  • Use Document AI Workbench to create a document processing pipeline tailored to your specific document types (bills and statements). Train the model to recognize key fields like names and addresses.
  • Preprocess the PDF to extract text and other relevant data using Document AI's OCR capabilities.
  • Clean and normalize the extracted data to ensure consistency and accuracy. This may involve handling cases where addresses are missing or where multiple pages belong to a single recipient.
  • Group the pages belonging to the same recipient based on the extracted names and addresses.
  • Generate separate PDFs for each recipient, including all associated statements and bills.
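
The cleaning and grouping steps above could be sketched roughly as follows. This is only a minimal illustration, not part of Document AI: it assumes you already have one extracted address string per page (or None where nothing was found), that pages are in mailing order, and that continuation pages directly follow the page carrying the address. The names normalize_address and group_pages are hypothetical helpers.

```python
import re
from typing import List, Optional, Tuple


def normalize_address(raw: Optional[str]) -> Optional[str]:
    """Strip punctuation, collapse whitespace and upper-case the address
    so that small formatting differences compare as equal."""
    if raw is None:
        return None
    cleaned = re.sub(r"[^\w\s]", " ", raw)                  # drop punctuation
    cleaned = re.sub(r"\s+", " ", cleaned).strip().upper()  # collapse spaces
    return cleaned or None


def group_pages(page_addresses: List[Optional[str]]) -> Tuple[list, list]:
    """Group page indexes by recipient.

    Assumption: a page without an address is a continuation of the most
    recent page that had one. Pages seen before any address are returned
    separately as orphans for manual review.
    """
    groups: list = []
    orphans: list = []
    for page_index, raw in enumerate(page_addresses):
        key = normalize_address(raw)
        if key is None:
            if groups:
                groups[-1]["pages"].append(page_index)
            else:
                orphans.append(page_index)
        elif groups and groups[-1]["address"] == key:
            # Same recipient as the previous page: keep it in one envelope.
            groups[-1]["pages"].append(page_index)
        else:
            groups.append({"address": key, "pages": [page_index]})
    return groups, orphans


if __name__ == "__main__":
    pages = ["12 Main St., Springfield", None, "9 Oak Ave", None]
    print(group_pages(pages))
```

One limitation to keep in mind: a bill whose first page is missing its address looks identical to a continuation page here, so it would be attached to the previous recipient. Catching those cases would need another signal from the page (for example a "Page 1 of N" marker) or a manual review step.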

By leveraging Document AI's capabilities, you can automate and streamline the sorting and processing of your documents, saving time and effort for your courier service company.
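
For the final step, writing one PDF per recipient, a sketch along these lines may help. It is an assumption on my part that you would use the open-source pypdf package for this (Document AI itself is not involved at this stage), and that each group is a dict of the form {"address": str, "pages": [zero-based page indexes]}. The function names output_name and split_pdf are hypothetical.

```python
import re
from pathlib import Path


def output_name(index: int, address: str) -> str:
    """Build a filesystem-safe file name from the group's position
    in the mailing order and its address."""
    slug = re.sub(r"[^A-Za-z0-9]+", "_", address).strip("_").lower()
    return f"{index:04d}_{slug[:40]}.pdf"


def split_pdf(source_path: str, groups: list, out_dir: str) -> list:
    """Write one PDF per group and return the paths that were written."""
    # pypdf is third-party (pip install pypdf). It is imported here so the
    # naming helper above can be used without it.
    from pypdf import PdfReader, PdfWriter

    reader = PdfReader(source_path)
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    written = []
    for index, group in enumerate(groups, start=1):
        writer = PdfWriter()
        for page_index in group["pages"]:
            writer.add_page(reader.pages[page_index])
        target = out / output_name(index, group["address"])
        with open(target, "wb") as handle:
            writer.write(handle)
        written.append(target)
    return written
```

Prefixing each file name with its position keeps the output in the original mailing order when the folder is sorted by name.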

4 REPLIES

Roderick
Community Manager

Interesting use case here @dheerajpanyam - looking forward to seeing what our colleagues come up with!

Thank you so much @Poala_Tenorio 🙏 for the detailed reply. I will definitely work on the solution and try to implement it. 

@Poala_Tenorio Sorry to bother you again. Can Document AI Workbench handle the grouping itself through its AI capabilities, or do I need to write custom logic?