Boolean field in Document AI custom extractor

I am trying to extract boolean fields with a Document AI custom extractor. I used the `Checkbox` data type, but my documents don't contain any checkboxes. I want the model to read the text and interpret it as true or false. After training on many labeled documents, it correctly locates the text that determines the value, but it returns that text instead of true or false. What should I do?

Hi @markosmuche2018,

Welcome to Google Cloud Community!

It seems like you are trying to extract Boolean fields with Google Document AI's custom extractor, but your documents lack checkboxes. You want the extractor to interpret text as true or false and pinpoint the exact position, but it's returning the text itself instead of Boolean values.

Here are potential ways that might help with your use case:

  • Use the Text Data Type and Post-Processing Logic: You may configure the field in your custom extractor to use the Text data type instead of Checkbox. Identify and select the text area that signifies a true or false value. Implement logic in your application to map the extracted text to boolean values, defining which specific strings correspond to True or False.
  • Pre-processing Text: You may want to clean up any inconsistencies in your text data before you start labeling your documents. For example, ensure that 'yes', 'Yes', and 'YES' are all converted to a uniform string like 'yes'. Doing so might help the model learn faster and generalize better if you have many variations of a True or False string.
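As a rough illustration of the post-processing idea above, here is a minimal Python sketch that normalizes an extracted string and maps it to a boolean. The names `TRUE_STRINGS`, `FALSE_STRINGS`, `normalize`, and `text_to_bool` are hypothetical and not part of Document AI, and the example vocabularies are assumptions you would replace with the phrases that actually appear in your documents. It works on plain strings; in a Document AI response the extracted text is typically read from the entity's `mention_text` field.

```python
import re
from typing import Optional

# Hypothetical vocabularies (assumptions for illustration only);
# replace them with the phrases that occur in your own documents.
TRUE_STRINGS = {"yes", "y", "true", "approved"}
FALSE_STRINGS = {"no", "n", "false", "not approved"}


def normalize(text: str) -> str:
    """Collapse whitespace, trim, lowercase, and drop trailing punctuation."""
    collapsed = re.sub(r"\s+", " ", text).strip().casefold()
    return collapsed.rstrip(".,;:!")


def text_to_bool(text: str) -> Optional[bool]:
    """Map extracted text to True/False, or None if it is not recognized."""
    value = normalize(text)
    if value in TRUE_STRINGS:
        return True
    if value in FALSE_STRINGS:
        return False
    return None  # Unknown phrase: leave it for manual review.


if __name__ == "__main__":
    for sample in [" YES ", "Not  approved.", "maybe"]:
        print(repr(sample), "->", text_to_bool(sample))
```

Returning `None` for unrecognized text, rather than defaulting to `False`, keeps unexpected phrasings visible so they can be reviewed and added to the vocabularies.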

You may refer to the documentation below, which provides essential information for refining your method of extracting boolean fields from text using Document AI custom extractors:

Was this helpful? If so, please accept this answer as “Solution”. If you need additional assistance, reply here within 2 business days and I’ll be happy to help.