Hi,
I am trying to find a way to identify vendors based on unique words/phrases on receipts found via OCRTEXT. Essentially, I'm looking for a store number, address, phone number, etc.: anything that is unique to that vendor at that location. As OCRTEXT isn't necessarily accurate, I'm having trouble. On some receipts, for example, a period is read instead of a comma in an address, so INTERSECT doesn't return the address as a common value.
I tried adding sets of 5 intersections of random receipts from that vendor, and then using INTERSECT between those groups. I thought this would remove that problem, as it makes it likelier that two receipts with the same OCRTEXT error are found, and that both versions are then added as key phrases.
The most straightforward approach is taking all phrases found on receipts from that vendor, and then subtracting all phrases found on receipts from other vendors. This actually lets AppSheet recognize the vendor fairly well, but it includes a lot of things I don't want that INTERSECT would remove (prices, dates, barcodes, etc.). If those are left in, there would likely be overlap between vendors at some point, so it's not overly robust.
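The subtraction approach described above can be sketched outside AppSheet as a plain set difference. This is only an illustration in Python, not an AppSheet expression; the function name `vendor_key_phrases` and the sample receipts are hypothetical.

```python
def vendor_key_phrases(receipts_by_vendor, vendor):
    """Return phrases seen on `vendor`'s receipts but on no other vendor's."""
    own = set()
    for phrases in receipts_by_vendor[vendor]:
        own |= set(phrases)

    others = set()
    for name, receipts in receipts_by_vendor.items():
        if name != vendor:
            for phrases in receipts:
                others |= set(phrases)

    return own - others


# Hypothetical sample data: each receipt is a list of OCR'd phrases.
receipts_by_vendor = {
    "Acme Hardware": [
        ["STORE 0142", "12 MAIN ST, SPRINGFIELD", "TOTAL 19.99", "04/12/2023"],
        ["STORE 0142", "12 MAIN ST. SPRINGFIELD", "THANK YOU"],
    ],
    "Best Grocer": [
        ["STORE 0077", "THANK YOU", "TOTAL 19.99"],
    ],
}

keys = vendor_key_phrases(receipts_by_vendor, "Acme Hardware")
```

In this sample both OCR variants of the address survive, but so does the date, which is the drawback mentioned above: anything that merely hasn't appeared on another vendor's receipt yet is kept.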
I originally used single words rather than phrases, but addresses, store numbers, etc. work better with phrases, though I could do a combination.
If you have any thoughts, I'd be glad to hear them!
@Luc87 wrote:
where a period is used instead of a comma in an address
Consider using SUBSTITUTE() to remove or change these characters.
I'm not sure you're ever going to get this working perfectly if you can't trust OCR to get the spelling right. Maybe create a system to filter out the obvious ones, then have a separate process for a human to go through the rest of the receipts?
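To illustrate the idea, here is a rough Python sketch of normalizing punctuation before comparing phrases, so that a period read in place of a comma no longer blocks a match. In AppSheet the analogous step would be applying SUBSTITUTE() to the text before INTERSECT(). The helper names `normalize` and `common_phrases` are hypothetical, and which characters to strip is an assumption.

```python
import re


def normalize(phrase):
    """Uppercase, turn common punctuation into spaces, collapse whitespace."""
    text = phrase.upper()
    text = re.sub(r"[.,;:]", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def common_phrases(receipt_a, receipt_b):
    """Intersect two receipts after normalizing every phrase."""
    return {normalize(p) for p in receipt_a} & {normalize(p) for p in receipt_b}


# Hypothetical receipts that differ only by an OCR punctuation error.
receipt_a = ["Store 0142", "12 Main St, Springfield"]
receipt_b = ["STORE 0142", "12 Main St. Springfield", "Thank you"]

shared = common_phrases(receipt_a, receipt_b)
```

This only covers punctuation and case. Misread letters or digits would still need a fuzzier comparison or a manual review step.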
Simon, 1minManager.com
Thanks for the input! Substitute is a function I'm less familiar with, so I'll have to experiment with that.
I'm now trying a combination of COUNT and INTERSECT, the idea being to count how many times the new receipt intersects with all the other receipts and return the vendor with the highest count. It will probably slow down the app too much regardless, but I also don't know how to reference the row in the form. I use INDEX with COUNT, and get the last saved expense. Is there a way to reference the new, not-yet-saved row?
I think at this stage it would help for you to give us some dummy data and let us know the result you're after.
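One reading of the count-and-intersect idea, sketched in Python rather than as an AppSheet expression: score each saved receipt by how many phrases it shares with the new one, then take the vendor of the best-scoring receipt. The function name `best_vendor` and the sample data are hypothetical, and scoring per receipt (rather than summing per vendor) is an assumption.

```python
def best_vendor(new_phrases, saved_receipts):
    """saved_receipts is a list of (vendor, phrases) pairs.

    Returns the vendor whose receipt shares the most phrases with the
    new receipt, or None if nothing overlaps at all.
    """
    new_set = set(new_phrases)
    best_name, best_count = None, 0
    for vendor, phrases in saved_receipts:
        count = len(new_set & set(phrases))
        if count > best_count:
            best_name, best_count = vendor, count
    return best_name


# Hypothetical saved receipts and a new, unsaved one.
saved_receipts = [
    ("Acme Hardware", ["STORE 0142", "12 MAIN ST, SPRINGFIELD", "THANK YOU"]),
    ("Best Grocer", ["STORE 0077", "THANK YOU", "TOTAL 19.99"]),
]
new_receipt = ["STORE 0142", "THANK YOU", "TOTAL 7.50"]

guess = best_vendor(new_receipt, saved_receipts)
```

Every saved receipt is compared against the new one, so the cost grows with the number of receipts, which matches the concern about slowing the app down.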