Doesn't contain any ground-truth entity defined in...

sonicsw · 11-17-2024 11:14 PM

Dear Community,

While training a Google Document Classifier, the training failed with the errors below. On inspecting the errors, I noticed that all failed documents belong to classification labels that were disabled due to insufficient data at the time.

I am wondering why Google is trying to import documents that are matched against a disabled label. Shouldn't they be skipped by default? Any advice on how to solve this and where to go from here would be appreciated. Should I delete the documents from the bucket? (Side issue: re-syncing storage does not work; items deleted from the bucket remain even after a sync, as the total document count does not decrease.) Or delete the label for now?

"code": 3,
      "message": "Invalid document.",
      "details": [
        {
          "@type": "type.googleapis.com/google.rpc.ErrorInfo",
          "reason": "INVALID_DOCUMENT",
          "domain": "documentai.googleapis.com",
          "metadata": {
            "document": "gs://PATH TO DOCUMENT",
            "reason": "The document:  doesn't contain any ground-truth entity defined in the Schema."

MJane

Hi @sonicsw,

Welcome to Google Cloud Community!

It appears you're experiencing issues with Google Document AI attempting to process documents linked to disabled or inadequately populated classification labels.

Here are some potential solutions that might help you resolve the issue:

Check Label Status - Ensure the labels are disabled in your Google Cloud Console. Toggling the labels off and on again can sometimes resolve inconsistencies.
Re-label or Remove - If documents are mislabeled or irrelevant, correct the labels or remove them from the training dataset.
Improve Data Quality - Make sure your training data is consistent in terms of format, language, and content.
Delete the label temporarily - If the disabled label is causing issues with training or data syncing, consider temporarily removing the label entirely. Once you've addressed the problems with your dataset, you can add the label back.
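
To find the affected documents before re-labeling or removing them, a small local check along these lines may help. This is only a sketch under assumptions: it assumes the labeled documents are available locally as Document JSON files with a top-level "entities" list whose items carry a "type" field, and the function names and the set of enabled label names are hypothetical placeholders, not part of any Google library.

```python
import json
from pathlib import Path


def has_enabled_entity(document: dict, enabled_labels: set[str]) -> bool:
    """Return True if at least one entity type in the document is an enabled label."""
    return any(
        entity.get("type") in enabled_labels
        for entity in document.get("entities", [])
    )


def find_unusable_documents(folder: str, enabled_labels: set[str]) -> list[str]:
    """List the JSON files in `folder` that have no entity matching an enabled label.

    Assumption: each file is a labeled document in Document JSON form.
    """
    unusable = []
    for path in sorted(Path(folder).glob("*.json")):
        document = json.loads(path.read_text(encoding="utf-8"))
        if not has_enabled_entity(document, enabled_labels):
            unusable.append(path.name)
    return unusable
```

Documents returned by `find_unusable_documents` carry only disabled (or no) labels, so they are the likely candidates for the "doesn't contain any ground-truth entity defined in the Schema" error and can be re-labeled or removed from the training dataset.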

For more information about Custom Document Classifiers, you can read this documentation.

If the issue persists, I suggest contacting Google Cloud Support as they can provide more insights to see if the behavior you've encountered is a known issue or specific to your project.

I hope the above information is helpful.
