Hi everyone,
I'm facing an issue with my processor. It's a custom extractor, but the JSON output contains fields typically associated with a specialized extractor, such as normalizedValue, as well as fields that are not defined in my schema (e.g., supplier_phone, supplier_name, etc.).
However, when I upload the same file in the Document AI UI, it extracts only the labels defined in my schema, as expected. The problem occurs only when I send the request through the API.
I'm currently using batch processing. Is there any specific configuration I need to adjust to resolve this issue?
Thanks in advance for any help!
Hi @andressasoares,
Welcome to Google Cloud Community!
The difference most likely comes from how the UI and the API's batch mode handle the output. The UI displays only the labels defined in your schema, while a batch request may write additional fields to the output JSON unless you explicitly restrict them.
To fix this, control the output fields in the API request itself. The main setting is the fieldMask parameter, set under documentOutputConfig.gcsOutputConfig, which specifies which fields of the output Document (for example, text or entities) are written to your Cloud Storage output.
Here's how to solve the problem:
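A minimal sketch of such a batch request, built with only the Python standard library against the Document AI v1 REST endpoint. It assumes a single PDF input in Cloud Storage and an OAuth access token obtained separately (for example with `gcloud auth print-access-token`). The function names, bucket paths, and the default `text,entities` mask are illustrative placeholders, not values from the original thread. In the REST JSON, `fieldMask` is a comma-separated string of Document field paths.

```python
import json
import urllib.request
from typing import Optional, Tuple


def build_batch_process_request(
    project_id: str,
    location: str,
    processor_id: str,
    input_uri: str,
    output_uri: str,
    field_mask: str = "text,entities",
    processor_version: Optional[str] = None,
) -> Tuple[str, dict]:
    """Return the batchProcess URL and JSON body (hypothetical helper, REST v1)."""
    resource = f"projects/{project_id}/locations/{location}/processors/{processor_id}"
    if processor_version:
        # Optionally target a specific processor version instead of the default one.
        resource += f"/processorVersions/{processor_version}"
    url = f"https://{location}-documentai.googleapis.com/v1/{resource}:batchProcess"
    body = {
        "inputDocuments": {
            "gcsDocuments": {
                "documents": [{"gcsUri": input_uri, "mimeType": "application/pdf"}]
            }
        },
        "documentOutputConfig": {
            "gcsOutputConfig": {
                "gcsUri": output_uri,
                # Comma-separated Document field paths to write, e.g. "text,entities".
                "fieldMask": field_mask,
            }
        },
    }
    return url, body


def send_batch_request(url: str, body: dict, access_token: str) -> dict:
    """POST the request; the response describes a long-running operation."""
    req = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json; charset=utf-8",
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

For example, `build_batch_process_request("my-project", "us", "my-processor-id", "gs://my-bucket/in/file.pdf", "gs://my-bucket/out/")` returns the URL and body to pass to `send_batch_request`. The processed Documents are then written under the output prefix once the operation completes.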
Setting fieldMask explicitly keeps the batch output limited to the Document fields you request. Note that the mask selects fields of the Document object (such as entities), not individual labels within it. If entities outside your schema, such as supplier_phone or supplier_name, still appear, make sure the processor (and processor version) named in your API call is the one whose schema you configured in your Document AI project.
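To check whether a batch output file matches the schema you expect, a small sketch like the following lists any entity types that are not among your schema's labels. It assumes the output is Document JSON with entities under `"entities"`, each carrying a `"type"` and optionally nested child entities under `"properties"`. The function name and the labels used in the usage note are hypothetical.

```python
import json
from typing import Iterable, List, Set


def _collect_types(entities: list, found: Set[str]) -> None:
    """Recursively gather entity types, including nested child properties."""
    for entity in entities:
        entity_type = entity.get("type")
        if entity_type:
            found.add(entity_type)
        # Child entities (e.g. fields of a line item) are nested under "properties".
        _collect_types(entity.get("properties", []), found)


def unexpected_entity_types(document_json: str, schema_labels: Iterable[str]) -> List[str]:
    """Return sorted entity types in the Document JSON that are not schema labels."""
    allowed = set(schema_labels)
    document = json.loads(document_json)
    found: Set[str] = set()
    _collect_types(document.get("entities", []), found)
    return sorted(t for t in found if t not in allowed)
```

For example, running it on one output file with your schema labels (say `["invoice_id", "total_amount"]`) and getting back names such as `supplier_phone` suggests the request may not be using the processor or version configured in the UI.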
Was this helpful? If so, please accept this answer as “Solution”. If you need additional assistance, reply here within 2 business days and I’ll be happy to help.