
Inconsistent Training Results in Document AI Splitter

Hi Google Support,

I've been training a model in Document AI Splitter, but I'm noticing inconsistent results even when using the same training data. Specifically, when I train a model for the first time, I get good results with a high confidence threshold. However, when I retrain a new model with the exact same data (without making any updates to the model or dataset), the results vary significantly, sometimes with lower confidence scores.

I expected that retraining with the same data would yield similar or identical results, but that doesn’t seem to be the case. Could you provide insights into why this is happening? Are there factors influencing variability in model training that I should be aware of?

Any guidance on ensuring more consistent results would be greatly appreciated.

Thanks.


Hi @tootsieroll,

Welcome to Google Cloud Community!

You're observing something that is relatively common in machine learning and model training, even when the data and parameters are kept constant. Here are some potential reasons for the variability you're seeing in your Document AI Splitter model results:

  • Random Initialization of Weights: Many machine learning models, including those used in Document AI, initialize their weights (and sometimes other parameters) randomly at the start of training. Because each training run starts from a different initialization, the final model can differ between runs even when the dataset and hyperparameters are identical (a generic illustration follows this list).

  • Training Hyperparameters: The learning rate, batch size, and optimization algorithm can also affect the final results. If these hyperparameters aren't fully controlled or reset to exactly the same values between training runs, they can lead to slightly different outcomes, even with the same data.

  • Small Dataset Size (Relevant to confidence scores):

    Sensitive to Fluctuations: If your training dataset is relatively small, even minor variations in the training process can have a larger impact on the model's final performance and the confidence scores it produces. A small dataset might not fully represent the diversity of documents your model will encounter in production.

  • Confidence Scores and Thresholds:

    Confidence Score Calibration: The confidence scores the model outputs are estimates of its certainty, not absolute guarantees. How well those scores are calibrated can vary depending on the training process and the specific data.

    Document Similarity: Even if your training data is 'exactly the same' in terms of file contents, subtle differences in font rendering, skew, resolution, or even minor OCR errors can cause the model to treat the documents slightly differently, leading to small variations in the predicted bounding boxes and confidence scores.

    Tune Confidence Threshold Carefully: Conduct a thorough analysis on the validation dataset using a metric that reflects your cost of errors, such as F1-score, to decide on the optimal confidence threshold.
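As a rough illustration of that kind of threshold analysis, here is a minimal Python sketch. It assumes you have already exported splitter predictions for a labeled validation set into two arrays, y_true (ground-truth labels) and scores (the model's confidence scores); the names and sample values are placeholders, not part of the Document AI API.

```python
# Hypothetical sketch: sweep confidence thresholds on a labeled validation set
# and keep the one that maximizes F1. Assumes binary labels; adapt the labels
# and metric to your own splitting scheme and cost of errors.
import numpy as np
from sklearn.metrics import f1_score

# Placeholder validation data -- replace with your exported predictions.
y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0, 1, 1])
scores = np.array([0.97, 0.40, 0.88, 0.72, 0.55, 0.91,
                   0.30, 0.48, 0.66, 0.83])

best_threshold, best_f1 = None, -1.0
for threshold in np.arange(0.05, 1.00, 0.05):
    y_pred = (scores >= threshold).astype(int)
    f1 = f1_score(y_true, y_pred, zero_division=0)
    if f1 > best_f1:
        best_threshold, best_f1 = threshold, f1

print(f"Best threshold: {best_threshold:.2f} (F1 = {best_f1:.3f})")
```

Re-running this sweep after every retrain is a practical way to keep the deployed threshold aligned with whichever model version you end up promoting.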
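To make the random-initialization point from the first bullet concrete, the generic example below trains the same small scikit-learn network on the same data three times, changing only the random seed. Document AI doesn't expose its training seeds, so this is purely an illustration of the general effect, not of Document AI's internals.

```python
# Generic illustration: identical data and hyperparameters, but a different
# random initialization/shuffling seed per run, can produce different scores.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=300, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for seed in (1, 2, 3):
    clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=300,
                        random_state=seed)  # only the seed changes between runs
    clf.fit(X_train, y_train)
    print(f"seed={seed}  test accuracy={clf.score(X_test, y_test):.3f}")
```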

You can also refer to the Document AI documentation for instructions on how to create, train, evaluate, deploy, and run predictions with models.
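For example, once a splitter processor version is trained and deployed, running a prediction and inspecting the confidence scores typically looks like the sketch below, based on the Document AI Python client library; the project, location, processor ID, and file path are placeholders to replace with your own values.

```python
# Minimal sketch: send a PDF to a deployed Document AI splitter processor and
# print the predicted sub-documents with their confidence scores.
from google.api_core.client_options import ClientOptions
from google.cloud import documentai

PROJECT_ID = "your-project-id"      # placeholder
LOCATION = "us"                     # or "eu", wherever the processor lives
PROCESSOR_ID = "your-processor-id"  # placeholder

client = documentai.DocumentProcessorServiceClient(
    client_options=ClientOptions(
        api_endpoint=f"{LOCATION}-documentai.googleapis.com")
)
name = client.processor_path(PROJECT_ID, LOCATION, PROCESSOR_ID)

with open("sample.pdf", "rb") as f:
    raw_document = documentai.RawDocument(content=f.read(),
                                          mime_type="application/pdf")

result = client.process_document(
    request=documentai.ProcessRequest(name=name, raw_document=raw_document)
)

# Each entity is one predicted sub-document: a class label, the pages it spans,
# and the model's confidence in that prediction.
for entity in result.document.entities:
    pages = [ref.page for ref in entity.page_anchor.page_refs]
    print(f"{entity.type_}: pages {pages}, confidence {entity.confidence:.2f}")
```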

Was this helpful? If so, please accept this answer as “Solution”. If you need additional assistance, reply here within 2 business days and I’ll be happy to help.