Document AI Custom Extractor performs worse when fine-tuned with the new OCR version

Hi all,
I am encountering an unexpected issue: after upgrading to the latest OCR version, my non-fine-tuned custom extractors achieve significantly better F1 scores than our fine-tuned extractors. Additionally, the model fine-tuned on pretrained-foundation-model-v1.1-2024-03-12 outperforms the new models trained on pretrained-foundation-model-v1.2-2024-05-10 and pretrained-foundation-model-v1.3-2024-08-31. I used the same datasets for both the old and the new processors, yet the results are quite different; in particular, v1.2 performs roughly 12 times worse when it is fine-tuned. Has anyone experienced similar results with the new OCR version?

Context: I am extracting data from student transcripts, and I typically use 30 samples for fine-tuning.

Old version processor:
  • Fine-tuned extractor (base model: pretrained-foundation-model-v1.1-2024-03-12): F1 score = 0.916

New version processor:
  • Fine-tuned 1 (base model: pretrained-foundation-model-v1.2-2024-05-10): F1 score = 0.086
  • Fine-tuned 2 (base model: pretrained-foundation-model-v1.3-2024-08-31): F1 score = 0.945
  • pretrained-foundation-model-v1.2-2024-05-10 (no fine-tuning): F1 score = 0.986
  • pretrained-foundation-model-v1.3-2024-08-31 (no fine-tuning): F1 score = 0.964
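
For anyone who wants to reproduce such a comparison, here is a minimal sketch of pinning a specific processor version when processing a document, so every version sees identical input (v1 Python client; the project and processor IDs are placeholders):

```python
# Minimal sketch: pin one specific processor version when processing a document,
# so each foundation model / fine-tuned version is compared on identical input.
# "my-project" and "my-processor-id" are placeholders.
from google.cloud import documentai

client = documentai.DocumentProcessorServiceClient()

# Fully qualified name of one specific version of the extractor.
name = client.processor_version_path(
    project="my-project",
    location="us",
    processor="my-processor-id",
    processor_version="pretrained-foundation-model-v1.3-2024-08-31",
)

with open("transcript.pdf", "rb") as f:  # placeholder sample document
    raw_document = documentai.RawDocument(
        content=f.read(), mime_type="application/pdf"
    )

result = client.process_document(
    request=documentai.ProcessRequest(name=name, raw_document=raw_document)
)

# Extracted entities for this version; repeat with another version name to compare.
for entity in result.document.entities:
    print(entity.type_, entity.mention_text, entity.confidence)
```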

Hi @hzham,

Welcome to Google Cloud Community!

You're facing a common issue with the new Document AI OCR versions (v1.2 and v1.3): your fine-tuned extractors are performing worse than the non-fine-tuned ones, and the older v1.1-based model is outperforming the new ones. Here's a breakdown of potential reasons for this discrepancy:

  • Labeling Mismatch: The new OCR versions can tokenize and segment text differently, so labels created under the old OCR version may no longer line up with the new text layout.
  • Model Architecture: The new models might have different architectures or training data, leading to different extraction behaviors.
  • Data Quality: Subtle differences in your training data can affect fine-tuning.
  • Data Size: 30 samples might be insufficient for fine-tuning, especially with the new models.

Here are the troubleshooting steps you can take:

  1. Relabel: Create a new processor with the latest OCR version and relabel your data.
  2. Data Review: Check your datasets for inconsistencies or errors.
  3. Increase Data: Expand your training dataset if possible.
  4. Fine-Tuning Parameters: Experiment with training steps, learning rate, etc. (see the sketch after this list).
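
If you'd like to experiment with those parameters programmatically rather than through the console, here's a minimal sketch using the v1beta3 Python client. Treat it as an illustration, not a definitive recipe: the project and processor IDs are placeholders, and I'm assuming the training-steps and learning-rate knobs are passed via FoundationModelTuningOptions:

```python
# Minimal sketch: start a fine-tuning run with explicit hyperparameters.
# Assumes the processor already has a labeled dataset attached.
# "my-project" and "my-processor-id" are placeholders.
from google.cloud import documentai_v1beta3 as documentai

client = documentai.DocumentProcessorServiceClient()
parent = client.processor_path("my-project", "us", "my-processor-id")

request = documentai.TrainProcessorVersionRequest(
    parent=parent,
    processor_version=documentai.ProcessorVersion(display_name="fine-tuned-v1-3"),
    # Full resource name of the base version to fine-tune from
    # (format is an assumption; check the API reference for your client version).
    base_processor_version=client.processor_version_path(
        "my-project", "us", "my-processor-id",
        "pretrained-foundation-model-v1.3-2024-08-31",
    ),
    # The two tuning knobs exposed for foundation-model fine-tuning.
    foundation_model_tuning_options=(
        documentai.TrainProcessorVersionRequest.FoundationModelTuningOptions(
            train_steps=400,               # e.g. sweep a range and compare F1
            learning_rate_multiplier=1.0,  # scales the default learning rate
        )
    ),
)

# Training runs as a long-running operation; block until it finishes.
operation = client.train_processor_version(request=request)
print("New version:", operation.result().processor_version)
```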

Additionally, you can refer to the Document AI documentation on fine-tuning foundation models to learn how to get more accurate data extraction from your documents.

I hope the above information is helpful.

Hi,
First of all, thank you for the reply. For your suggestions:
Relabel: I have already done that; the results above are from the relabeled dataset with the latest OCR.
Data Review: I used the same dataset with almost the same labeling (there might be slight differences in the borders of the labels, but I do not reckon that would produce such a major impact).
Increase Data: If the new models do not require more training data than the old model to reach the same performance level, this cannot be the cause of the problem. If they do require more, then yes, this might work.
Fine-Tuning Parameters: I ran some experiments, and increasing the training steps worked like a charm; the F1 score is now much higher than the old one. Is there any documentation or reference on how to find the best parameter combinations in Document AI?
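
For anyone else tuning these, my experiments were essentially brute force: train one version per parameter value and compare the evaluations. A rough sketch of the comparison step with the v1beta3 Python client (version and project IDs are placeholders, and I am assuming evaluate_processor_version falls back to the dataset's test split when no documents are passed):

```python
# Rough sketch: compare already fine-tuned versions on the processor's test set
# and print each version's best F1 across confidence thresholds.
# Project, processor, and version IDs are placeholders.
from google.cloud import documentai_v1beta3 as documentai

client = documentai.DocumentProcessorServiceClient()

for version_id in ("fine-tuned-steps-200", "fine-tuned-steps-400"):
    name = client.processor_version_path(
        "my-project", "us", "my-processor-id", version_id
    )
    # With no evaluation_documents given, the dataset's test split is used
    # (assumption based on my reading of the API behavior).
    operation = client.evaluate_processor_version(
        request=documentai.EvaluateProcessorVersionRequest(processor_version=name)
    )
    evaluation = client.get_evaluation(name=operation.result().evaluation)
    best_f1 = max(
        (
            m.metrics.f1_score
            for m in evaluation.all_entities_metrics.confidence_level_metrics
        ),
        default=0.0,
    )
    print(version_id, "best F1:", best_f1)
```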

Best,
Hamit

Just for your information, in my case the fine-tuned models are better than the pretrained models for both v1.2 and v1.3: the F1 scores improved from 0.75 and 0.78 to 0.83 after fine-tuning. I have more than 200 labeled documents.