Text classification on-premise

neznajut · 08-21-2023 12:33 PM

Hello,

We are seeking a Document AI on-premise solution to classify and extract information from unstructured text, specifically in the Czech and Slovak languages. We've come across the ocr-service-cpu on the marketplace. Does it include a classification feature? Alternatively, does Google have other on-premise solutions for handling unstructured text in these languages?

Our typical document processing scenario is: OCR → Classification → Extraction.

Thank you!

kvandres

Hi @neznajut,

According to the documentation, Vision OCR On-Prem does not support Document AI classification features such as the specialized classifier, custom classifier, and entity extraction, because it is only compatible with the Vision API and its client libraries [1]. The Vision API does, however, support dense document text detection [2]. So you can use Vision OCR On-Prem to extract the text from a document and detect its language [3], but it does not cover the classification and extraction steps of your pipeline.

[1] https://cloud.google.com/vision/on-prem
[2] https://cloud.google.com/vision/docs/fulltext-annotations
[3] https://cloud.google.com/vision/docs/pdf
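
To make the text-and-language part concrete, here is a minimal sketch of reading the extracted text and the detected languages out of a document text detection response in its JSON form. It assumes the response follows the Vision API `fullTextAnnotation` layout (`text`, `pages[].property.detectedLanguages[]`). The helper name `summarize_full_text` and the sample response are made up for illustration and are not real output from the on-prem service.

```python
from typing import Dict, Tuple


def summarize_full_text(response: dict) -> Tuple[str, Dict[str, float]]:
    """Return the full text and the highest confidence seen per language code.

    Assumes `response` is one image/page response in Vision API JSON form.
    Missing fields are treated as empty rather than raising.
    """
    annotation = response.get("fullTextAnnotation", {})
    text = annotation.get("text", "")

    languages: Dict[str, float] = {}
    for page in annotation.get("pages", []):
        detected = page.get("property", {}).get("detectedLanguages", [])
        for lang in detected:
            code = lang.get("languageCode")
            if code is None:
                continue
            confidence = lang.get("confidence", 0.0)
            # Keep the best confidence reported for each language across pages.
            languages[code] = max(languages.get(code, 0.0), confidence)
    return text, languages


# Hypothetical sample response, hand-written for this example.
sample_response = {
    "fullTextAnnotation": {
        "text": "Faktura za služby\n",
        "pages": [
            {
                "property": {
                    "detectedLanguages": [
                        {"languageCode": "cs", "confidence": 0.92},
                        {"languageCode": "sk", "confidence": 0.05},
                    ]
                }
            }
        ],
    }
}

text, languages = summarize_full_text(sample_response)
print(text)
print(languages)
```

The resulting text and language codes could then be passed to whatever classification and extraction components you choose to run yourself, since those steps are not part of Vision OCR On-Prem.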