My primary problem is that when I attempt to do "manual" labelling of my Document AI documents, they are displayed at such a blurry, low resolution that it is impossible to label them. The same documents are actually high resolution, and Document AI OCR has already been run on all of them successfully in a separate step.
It makes no difference if I attempt to zoom. I have a paranoid suspicion that Google is doing this on purpose to try to force us to pay them for their labelling service. 😬
I have tried refreshing, Chrome, Firefox, etc. Again, the documents themselves are good quality and in plain image formats. There are numerous other bugs in Document AI, several of which seem geared towards pushing the user into higher, unwanted billing:
- Only the full bucket could be imported (850 images), not just the desired subset (50 images); see the sketch after this list for one possible client-side workaround
- Once an import started, it could not be stopped
- Manual and automatic labelling cannot be conducted, but paid labelling is readily available
- My task is to remove personal information from documents (de-identification), which is an incredibly common use case, yet no pre-existing processor for it can be found, so it seems I have to reinvent the wheel
- I cannot search the files by filename within the UI, so deleting large subsets is impractical
- There is no deduplication of identical images, even though duplicates account for 50% of my images (the sketch below also covers this)
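For the subset-import and duplicate issues above, one workaround is to do the filtering yourself in Cloud Storage before importing, so Document AI only ever sees the files you actually want. Below is a minimal sketch using the `google-cloud-storage` Python client; the bucket names, prefixes, number of files, and the optional filename filter are all placeholders for my setup, not anything Document AI requires. It copies up to 50 unique images into a staging prefix, skipping byte-identical duplicates via the MD5 hash that GCS already stores as object metadata, and the Document AI import can then be pointed at that prefix.

```python
# Client-side workaround: stage a deduplicated subset of files into a separate
# prefix, then import that prefix into Document AI instead of the full bucket.
# All names below are placeholders.
from google.cloud import storage

SOURCE_BUCKET = "my-docs-bucket"         # hypothetical: bucket holding all 850 images
STAGING_BUCKET = "my-docs-bucket"        # staging prefix can live in the same bucket
SOURCE_PREFIX = "scans/"                 # hypothetical prefix with the originals
STAGING_PREFIX = "docai-import-subset/"  # point the Document AI import here
MAX_FILES = 50                           # size of the desired subset

client = storage.Client()
source_bucket = client.bucket(SOURCE_BUCKET)
staging_bucket = client.bucket(STAGING_BUCKET)

seen_hashes = set()
copied = 0
for blob in client.list_blobs(SOURCE_BUCKET, prefix=SOURCE_PREFIX):
    if copied >= MAX_FILES:
        break
    # Skip exact duplicates: GCS keeps an MD5 of each object's content in its
    # metadata, so no download is needed to detect byte-identical images.
    if blob.md5_hash in seen_hashes:
        continue
    seen_hashes.add(blob.md5_hash)
    # Optionally filter by filename here too,
    # e.g. `if "invoice" not in blob.name: continue`
    source_bucket.copy_blob(
        blob, staging_bucket, new_name=STAGING_PREFIX + blob.name.split("/")[-1]
    )
    copied += 1

print(f"Staged {copied} unique files under gs://{STAGING_BUCKET}/{STAGING_PREFIX}")
```

This obviously should not be necessary, but it at least avoids importing 850 files, half of them duplicates, when only 50 are wanted.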
I suppose it may just be a mountain of glitches and technical debt from the Google team rather than an all-out conspiracy to drive up bills, but the broken features that would save users money certainly seem to get fixed last.
======
UPDATE
It seems the unwarranted blurriness is specific to the Train tab of the primary Document AI interface. It is strange that they even offer the labelling tool there, given that it is thoroughly unusable. However, when I created a labelling task and assigned it to a pool consisting of only myself, then worked through some confusing configuration steps in the Manager interface, it eventually emailed me a link to a separate, LabelBox-like labelling view. So finally, I can label.
I think the aforementioned problems, bugs/glitches, and missing features are still valid though, and I would add to them:
- The Document AI interface is sluggish to reflect the results of labelling tasks
- Inability to balance testing and training samples within an assigned task