
Document AI Training Errors - JSON and Schema not in agreement

Hello,

I'm a novice Document AI user, and while attempting to train a processor I encountered the "Training stopped due to errors" message. When I investigate the error, I see sections of JSON similar to:

 

              "@type": "type.googleapis.com/google.rpc.ErrorInfo",
              "reason": "INVALID_DOCUMENT",
              "domain": "documentai.googleapis.com",
              "metadata": {
                "num_fields": "0",
                "annotation_name": "union",
                "num_fields_needed": "1",
                "field_name": "entities.text_anchor.text_segments",
                "document": "b2c7cb53fbb0bd58.json"
              }

 

The field "union" is set in the schema as "optional once", so the metadata's report that 0 were found and 1 is needed seems off.

 

I understand there is a UI bug currently being investigated regarding these text_segments errors, but it's unclear whether I can work around it. I've been at a standstill for a week now, and short of labeling only a single field per image, it's not obvious to me what I'm doing on these particular records that causes the error to appear.

Some of my labeled fields overlap, which someone suggested could be the cause. However, the handwriting does overflow the typical fields and sometimes overlaps, so guidance would be appreciated.

 

Thanks!

1 REPLY

It appears there is a similar report here:

https://issuetracker.google.com/267366576

In that record, Google support says the issue has been forwarded to Document AI Engineering for investigation. I also note that the "Train and evaluate processors" feature of Document AI is flagged as Pre-GA, which typically means there may be issues. If you are an enterprise customer, I'd suggest reaching out to your Google sales rep or your Google customer engineer and explaining the situation to them. If you have an NDA in place with Google, they will likely be able to share the roadmap for expected General Availability of this service.

You might consider searching Google Issue Tracker for similar issues and adding a "me too" to an existing one, or else raising your own ticket. If you raise your own, the clearer your description and steps to reproduce, the better. A full reproduction that can be posted without including sensitive information is better still.
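
If it helps with the steps to reproduce, here is a rough Python sketch (not an official tool) that lists the labeled entities in an exported document JSON, such as b2c7cb53fbb0bd58.json, that have no text segments. I'm assuming the file follows the usual Document AI Document layout, with a top-level "entities" list whose items carry "textAnchor" and "textSegments" (or the snake_case spellings shown in the error's field_name). The function names are my own, and it only checks top-level entities, not nested properties.

```python
import json
from typing import Any, Dict, List


def _get(obj: Dict[str, Any], camel: str, snake: str, default: Any = None) -> Any:
    """Read a key that may be spelled in camelCase or snake_case."""
    if camel in obj:
        return obj[camel]
    return obj.get(snake, default)


def entities_without_text_segments(document: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Summarize each top-level entity whose text anchor has no segments.

    Assumes the Document JSON has an "entities" list (an assumption about
    the exported file, not something confirmed in this thread).
    """
    problems = []
    for index, entity in enumerate(document.get("entities", [])):
        anchor = _get(entity, "textAnchor", "text_anchor", {}) or {}
        segments = _get(anchor, "textSegments", "text_segments", []) or []
        if not segments:
            problems.append({
                "index": index,
                "type": entity.get("type", ""),
                "mention_text": _get(entity, "mentionText", "mention_text", ""),
            })
    return problems


def load_document(path: str) -> Dict[str, Any]:
    """Load one exported document JSON file from disk."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)
```

Calling entities_without_text_segments(load_document("b2c7cb53fbb0bd58.json")) should return one entry per entity that lacks segments. If "union" shows up there, that would at least confirm which label the trainer is rejecting, and the output is something you can paste into a ticket.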