
AI to calculate a content value score: 0 for low-value test content, 1 for high-value content

We have a very large number of documents: the count is in the tens of millions. The majority of these documents are ASPX pages, but there are other formats too, such as Microsoft Office docx, xlsx, etc. We need to delete documents that are really old and have low value. Given the volume of data, I was wondering whether AI could help here, e.g. by separating junk documents from real data. A junk document is one that contains text such as 'testing testing' or 'Lorem ipsum'.

Maybe an AI service could place a score on each document: 0 for very low value (contains Lorem ipsum) and 1 for high value (contains certain keywords such as customer names, etc.). Could anyone please give any pointers to such a service?
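
To make the scoring idea concrete, a minimal rule-based sketch in Python might look like the following. The two junk markers come from the description above; the function name `content_value_score`, the placeholder keyword list, and the 0.5 fallback score are assumptions for illustration only, not part of any Google Cloud service. It assumes the text has already been extracted from the ASPX or Office file.

```python
import re

# Junk markers taken from the description above; extend as needed.
JUNK_PATTERNS = [
    re.compile(r"lorem\s+ipsum", re.IGNORECASE),
    re.compile(r"testing\s+testing", re.IGNORECASE),
]

# Hypothetical high-value keywords (e.g. customer names); placeholders only.
HIGH_VALUE_KEYWORDS = ["Acme Corp", "Contoso"]


def content_value_score(text: str, keywords=HIGH_VALUE_KEYWORDS) -> float:
    """Return 0.0 for junk, 1.0 for high-value text, 0.5 when undecided."""
    if not text.strip():
        return 0.0  # assumption: empty documents count as junk
    # Junk markers are checked first, so they win over keyword matches.
    if any(pattern.search(text) for pattern in JUNK_PATTERNS):
        return 0.0
    lowered = text.lower()
    if any(keyword.lower() in lowered for keyword in keywords):
        return 1.0
    return 0.5  # assumption: no signal either way


if __name__ == "__main__":
    samples = ["testing testing 123", "Renewal contract for Contoso", "Meeting notes"]
    for sample in samples:
        print(sample, "->", content_value_score(sample))
```

A rule pass like this could cheaply label the obvious cases across tens of millions of documents, leaving only the undecided ones (score 0.5 here) for a trained classifier or manual review.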
