Hi everyone,
I’m working on a project where I need to extract readings from blood pressure monitors and glucose meters using machine learning. These devices typically display values as 7-segment digits, which makes OCR challenging.
What I’ve Tried So Far:
- Open-source OCR tools (e.g., models from Hugging Face, Tesseract, EasyOCR) – but they struggle with 7-segment digits.
- Google Cloud Vision API – this gives much better accuracy, but there are two problems:
  - Different devices show varying amounts of information (e.g., time, date, previous readings, current readings).
  - The API returns a long string, making it difficult to extract the specific readings I need.
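To make the extraction problem concrete, here is a minimal sketch of the kind of naive post-processing involved. The sample string, the `extract_bp` helper, and the value range are all made up for illustration; this is not actual Cloud Vision output or a validated method.

```python
import re

# Made-up example of the flat text an OCR service might return for a blood
# pressure monitor. This is an illustrative assumption, not real API output.
SAMPLE_TEXT = "10:45 12/03 SYS 128 DIA 82 PULSE 71 MEM 03"

# Assumed plausibility window covering systolic, diastolic, and pulse values.
MIN_VALUE, MAX_VALUE = 30, 250


def extract_bp(text):
    """Naive heuristic: strip time/date tokens, then take the first three
    plausible numbers in reading order as systolic, diastolic, pulse."""
    # Drop tokens that look like times (10:45) or dates (12/03, 12-03-2024).
    cleaned = re.sub(r"\b\d{1,2}[:/-]\d{1,2}(?:[:/-]\d{2,4})?\b", " ", text)
    numbers = [int(n) for n in re.findall(r"\b\d{2,3}\b", cleaned)]
    candidates = [n for n in numbers if MIN_VALUE <= n <= MAX_VALUE]
    if len(candidates) < 3:
        return None
    systolic, diastolic, pulse = candidates[:3]
    # A systolic value at or below the diastolic one signals a bad match.
    if systolic <= diastolic:
        return None
    return {"systolic": systolic, "diastolic": diastolic, "pulse": pulse}


print(extract_bp(SAMPLE_TEXT))
```

A heuristic like this breaks as soon as a previous reading appears before the current one, or when a device omits labels or orders the values differently, which is exactly where I'm stuck.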
Additional Challenge:
I also attempted to fine-tune an open-source AI model that accepts image data, but I couldn’t train it on Google Colab’s T4 GPU due to memory limitations.
Need Help With:
- How can I accurately extract the correct values (e.g., systolic, diastolic, BPM, glucose level) from the text output of the Cloud Vision API?
- Are there any efficient open-source models or techniques that handle 7-segment OCR better?
- Any recommendations for training an AI model in a low-memory environment?
I’d really appreciate any guidance or suggestions to overcome these issues. Thanks in advance!