Hi @mhsarkar,
Welcome to Google Cloud Community!
It's a common challenge to extract structured data from OCR outputs, especially when dealing with the variability of devices like blood pressure and glucose monitors. Let's break down the problems and explore solutions.
1. Improving OCR Accuracy for 7-Segment Digits:
- Custom Model Training: Train a custom model specifically for recognizing 7-segment digits. A small convolutional neural network (CNN) such as LeNet-5 is usually enough for digit recognition and can improve accuracy. You can also use transfer learning: start from a model pre-trained on a digit dataset (e.g., MNIST) and fine-tune it on your 7-segment dataset.
- Template Matching: Compare each segmented digit against pre-defined templates of the digits 0-9 and pick the closest match. OpenCV provides this through `cv2.matchTemplate`, which may help identify the digits accurately when the display style is consistent.
- Data Augmentation: Enhance the dataset by generating synthetic 7-segment digit variations (different fonts, noise levels, sizes) to improve the model’s generalization.
- Preprocessing: Enhance the clarity of the 7-segment digits using image preprocessing techniques like contrast enhancement, noise reduction, and binarization.
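To make the template matching idea concrete, here is a minimal sketch in plain Python. It assumes each digit has already been cropped, binarized, and resized to a tiny 5x7 grid of 0/1 values; that grid size, the segment layout in `CELLS`, and the function names are illustrative assumptions on my part, not something your device or OpenCV dictates. It scores a crop against each rendered template by counting matching pixels, which is a simplified stand-in for what `cv2.matchTemplate` does on real images.

```python
# Segments lit for each digit, using the conventional a-g segment names.
SEGMENTS = {
    "0": "abcdef", "1": "bc", "2": "abdeg", "3": "abcdg", "4": "bcfg",
    "5": "acdfg", "6": "acdefg", "7": "abc", "8": "abcdefg", "9": "abcdfg",
}

# Hypothetical layout: (row, col) cells each segment covers on a 5x7 grid.
CELLS = {
    "a": [(0, 1), (0, 2), (0, 3)],  # top bar
    "b": [(1, 4), (2, 4)],          # upper right
    "c": [(4, 4), (5, 4)],          # lower right
    "d": [(6, 1), (6, 2), (6, 3)],  # bottom bar
    "e": [(4, 0), (5, 0)],          # lower left
    "f": [(1, 0), (2, 0)],          # upper left
    "g": [(3, 1), (3, 2), (3, 3)],  # middle bar
}


def render(digit):
    """Build the 7-row x 5-column binary template for one digit."""
    grid = [[0] * 5 for _ in range(7)]
    for seg in SEGMENTS[digit]:
        for row, col in CELLS[seg]:
            grid[row][col] = 1
    return grid


TEMPLATES = {digit: render(digit) for digit in SEGMENTS}


def match_digit(grid):
    """Return the digit whose template agrees with `grid` on the most pixels."""
    def score(template):
        return sum(
            t == p
            for t_row, p_row in zip(template, grid)
            for t, p in zip(t_row, p_row)
        )
    return max(TEMPLATES, key=lambda digit: score(TEMPLATES[digit]))
```

Because the score counts agreeing pixels rather than requiring an exact match, a crop with a small amount of noise can still land on the right digit. On real photos you would likely need the preprocessing steps above first, and a larger grid.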
2. Efficient Data Extraction from Google Cloud Vision API:
- Pattern Recognition & Regular Expressions: After obtaining the text from the Cloud Vision API, use regular expressions to extract specific values (e.g., systolic/diastolic blood pressure, glucose level, BPM). Identifying consistent patterns in the text output will allow you to isolate the relevant information.
- Keyword-Based Filtering: Look for keywords like "Systolic:", "Diastolic:", or "Glucose:" to guide your extraction logic. These keywords can anchor your extraction process to ensure you're retrieving the correct data.
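As a rough sketch of combining keywords with regular expressions, the snippet below assumes you already have the detected text as a single string (for example, the `full_text_annotation.text` field of the Vision API response). The sample text format, the `FIELD_PATTERNS` table, and the `extract_readings` helper are assumptions for illustration; your device's actual output will probably need different patterns.

```python
import re

# Each pattern anchors on a keyword (or unit) and captures the nearby number.
# \D{0,10} tolerates a colon, spaces, or a stray OCR character in between.
FIELD_PATTERNS = {
    "systolic": re.compile(r"systolic\D{0,10}(\d{2,3})", re.IGNORECASE),
    "diastolic": re.compile(r"diastolic\D{0,10}(\d{2,3})", re.IGNORECASE),
    "glucose": re.compile(r"glucose\D{0,10}(\d{1,3}(?:\.\d+)?)", re.IGNORECASE),
    "bpm": re.compile(r"(\d{2,3})\s*bpm", re.IGNORECASE),
}


def extract_readings(text):
    """Return a dict of the readings found in OCR text; missing fields are omitted."""
    readings = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(text)
        if match:
            readings[name] = float(match.group(1))
    return readings


if __name__ == "__main__":
    sample = "Systolic: 120\nDiastolic: 80\n72 BPM\nGlucose: 98"
    print(extract_readings(sample))
```

Fields that are not found are simply left out of the result, so you can check which readings are missing and flag the image for review instead of storing a wrong value.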
3. Handling Memory Limitations for Training:
- Use Smaller Models: To work within the memory limits of Google Colab’s T4 GPU, consider a smaller architecture or a smaller batch size. Pruning and quantization also shrink a model’s footprint, though they mainly help at inference time rather than during training.
- Training on Local Machine: If possible, run the model training on a local machine with a dedicated GPU, even if it is lower-powered, as it may provide more control over memory usage.
- Optimize Training Pipeline: Use data generators to load data in batches rather than loading the entire dataset at once to avoid memory overflow.
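The batching idea can be sketched with a plain Python generator. This is only an illustration: `iter_batches` and its `load_fn` parameter are hypothetical names, and `load_fn` stands in for whatever you use to read and preprocess one image from disk.

```python
import random


def iter_batches(paths, labels, batch_size, load_fn, shuffle=True, seed=None):
    """Yield (inputs, targets) one batch at a time for a single epoch.

    Only the current batch is loaded into memory; `load_fn` is called
    lazily on each path as the batch is built.
    """
    order = list(range(len(paths)))
    if shuffle:
        random.Random(seed).shuffle(order)
    for start in range(0, len(order), batch_size):
        chunk = order[start:start + batch_size]
        inputs = [load_fn(paths[i]) for i in chunk]
        targets = [labels[i] for i in chunk]
        yield inputs, targets
```

In practice you would more likely use your framework's own loader (for example `tf.data.Dataset` in TensorFlow or `torch.utils.data.DataLoader` in PyTorch), which follows the same principle and adds prefetching and parallel loading.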
4. Alternative OCR Models:
- Explore More Specialized OCR Models: There are several open-source OCR models or techniques designed specifically for recognizing 7-segment digits. Look into model architectures that specialize in digit recognition tasks.
Additionally, you can refer to these documents for more information on preprocessing techniques to enhance OCR accuracy:
Was this helpful? If so, please accept this answer as “Solution”. If you need additional assistance, reply here within 2 business days and I’ll be happy to help.