How to control the segmentation and extraction of structured data using Document AI API OCR

I am using the Document AI OCR API. I want to extract the text of a document with its line layout preserved, so that I can run regexes over the output to get the values I need.

For example, if a document has fields such as Registration Number: 12345 and Name: XYZ on separate lines, I want the output to be two lines as well.

But the text the API returns has "Registration Number" on the 1st line, "Name:" on the 2nd, then 12345 on the 3rd and XYZ on the 4th. Even getting 12345 on the 2nd line would work for me. How can I fix this segmentation in v1 of documentai?
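
To make the desired output concrete, here is a minimal sketch of the kind of regrouping I have in mind: words are bucketed into visual lines by vertical position and then ordered left to right. Everything in it is an assumption for illustration: the `Word` class, the `group_into_lines` helper, the `y_tolerance` value, and the sample coordinates are all hypothetical and not part of the Document AI client. I am assuming the positions could be taken from the bounding boxes the OCR response reports for each token.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Word:
    """Hypothetical container for one OCR word and its position."""
    text: str
    x: float  # left edge, normalized to 0..1 (assumed)
    y: float  # vertical center, normalized to 0..1 (assumed)


def group_into_lines(words: List[Word], y_tolerance: float = 0.01) -> List[str]:
    """Group words whose vertical centers lie within y_tolerance of the
    first word seen on a line, then order each line left to right."""
    lines: List[List[Word]] = []
    for word in sorted(words, key=lambda w: w.y):
        if lines and abs(word.y - lines[-1][0].y) <= y_tolerance:
            lines[-1].append(word)  # same visual row
        else:
            lines.append([word])  # start a new visual row
    return [
        " ".join(w.text for w in sorted(line, key=lambda w: w.x))
        for line in lines
    ]


# Made-up coordinates that mimic the two-field example above.
words = [
    Word("Registration", 0.10, 0.200),
    Word("Number:", 0.25, 0.201),
    Word("Name:", 0.10, 0.250),
    Word("12345", 0.60, 0.199),
    Word("XYZ", 0.60, 0.251),
]

lines = group_into_lines(words)
for line in lines:
    print(line)
# Registration Number: 12345
# Name: XYZ
```

The tolerance would need tuning per document, and skewed or rotated scans would likely need more than a fixed vertical threshold.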

Please help me fix the segmentation of the output.

@ErnestoC @kvandres
