Hi all,
I am running into an unexpected issue: after upgrading to the latest OCR version, my non-fine-tuned custom extractors are achieving significantly better F1 scores than our fine-tuned extractors. In addition, the model fine-tuned on pretrained-foundation-model-v1.1-2024-03-12 is outperforming the newer models trained on pretrained-foundation-model-v1.2-2024-05-10 and pretrained-foundation-model-v1.3-2024-08-31. I used the same datasets for both the old and the new processors, yet the results differ drastically; in particular, v1.2 performs more than ten times worse once it is fine-tuned (F1 of 0.086 fine-tuned vs 0.986 out of the box). Has anyone experienced similar results with the new OCR version?
Context: I am extracting data from student transcripts, and I typically use about 30 samples for fine-tuning.
Old Version Processor:
- Fine-tuned extractor (base model: pretrained-foundation-model-v1.1-2024-03-12): F1 Score = 0.916

New Version Processor:
- Fine-tuned 1 (base model: pretrained-foundation-model-v1.2-2024-05-10): F1 Score = 0.086
- Fine-tuned 2 (base model: pretrained-foundation-model-v1.3-2024-08-31): F1 Score = 0.945
- pretrained-foundation-model-v1.2-2024-05-10 (no fine-tuning): F1 Score = 0.986
- pretrained-foundation-model-v1.3-2024-08-31 (no fine-tuning): F1 Score = 0.964
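For anyone comparing their own numbers against these: the F1 scores above are the harmonic mean of precision and recall over the extracted fields. Here is a minimal sketch of that computation, assuming predictions and ground truth are sets of (field, value) pairs with exact-match scoring (the product's actual evaluator may use fuzzy matching or per-field weighting, so treat this only as an illustration):

```python
def extraction_f1(predicted, expected):
    """Micro F1 over (field, value) pairs for one document (exact match)."""
    tp = len(predicted & expected)   # correctly extracted pairs
    fp = len(predicted - expected)   # spurious or wrong-value extractions
    fn = len(expected - predicted)   # missed fields
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical transcript example: one correct field, one wrong value, one miss.
expected = {("student_name", "Jane Doe"), ("gpa", "3.8"), ("grad_year", "2024")}
predicted = {("student_name", "Jane Doe"), ("gpa", "3.9")}
print(round(extraction_f1(predicted, expected), 3))  # → 0.4
```

With this kind of metric, an F1 of 0.086 means the fine-tuned v1.2 model is getting almost no fields right, which points to something systematic (e.g. a schema or label mismatch) rather than ordinary model variance.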