Hi,
I'm experimenting with Vertex AI Search for keyword search across structured documents and will eventually get to semantic search and LLM-tuned search. However, just for keyword search, I am getting unusual results.
I have a corpus of 10,000 structured documents that have the following fields: transcription (of a digitised manuscript page, usually in Latin) and translation (of the transcription, in English). I've ingested the corpus into a datastore via a JSON file in a Storage bucket. Only the translation field is 'searchable', and I am doing test searches against this corpus (initially using the widget, and now with the API). I have not set the 'translation' field as any (required or optional) key property.
If I search for a keyword, e.g. "mendacious", the result *does not* include a specific document that includes this keyword in the 'translation' field, and the search results include documents that do not have the 'mendacious' keyword at all. I've turned on the relevance score and the scores are a mixture of 0.0 and 0.1 (in random order). Setting the relevance threshold to HIGH returns no results. MEDIUM is what I am using.
Why doesn't Vertex return the document that has the specific keyword in the searchable field?
Thanks
/Rory
Hi @roryward1,
Welcome to Google Cloud Community!
It's possible the indexing hasn't fully updated since you ingested the documents. Indexing can take some time, especially for larger datasets. You've also mentioned that you've already adjusted the relevance threshold. However, note that the relevance score is based on a complex algorithm that considers many factors, including term frequency, document length, and the overall dataset. You can also check this documentation on how relevance score is computed in your specific application within Vertex AI.
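To tell an indexing delay apart from a ranking issue, it may help to first establish locally which documents should match the keyword, and then compare those IDs with what the search returns. The following is only a minimal sketch, not part of Vertex AI Search: it assumes the ingested file is newline-delimited JSON, that each record has an `id`, and that `translation` sits either at the top level or under `structData`. The function name `find_keyword_docs` and the sample records are hypothetical.

```python
import json
import re


def find_keyword_docs(lines, keyword, field="translation"):
    """Return the ids of records whose `field` contains `keyword` as a whole word."""
    # Whole-word, case-insensitive match; the keyword is escaped so it is treated literally.
    pattern = re.compile(rf"\b{re.escape(keyword)}\b", re.IGNORECASE)
    matches = []
    for line_number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # skip blank lines in the file
        doc = json.loads(line)
        # Assumption: fields live under "structData" if present, else at the top level.
        data = doc.get("structData", doc)
        text = str(data.get(field) or "")
        if pattern.search(text):
            # Fall back to the line number when a record has no "id".
            matches.append(doc.get("id", f"line-{line_number}"))
    return matches


# Hypothetical sample records standing in for the real corpus file.
sample = [
    '{"id": "doc-1", "structData": {"transcription": "testis mendax", "translation": "A mendacious witness."}}',
    '{"id": "doc-2", "structData": {"translation": "An honest account."}}',
    '{"id": "doc-3", "translation": "Mendacious, he said."}',
]

expected_ids = find_keyword_docs(sample, "mendacious")
print(expected_ids)
```

For the real corpus, the same function can be given an open file handle in place of `sample`, since it only iterates over lines. If the IDs it reports are missing from the search results well after ingestion has finished, the problem is more likely in how the field is indexed or ranked than in the data itself.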
You can also use the following features that can help you implement your desired ranking and prioritization results:
In addition, implementing filters in your search queries can help narrow down the results you are looking for.
Was this helpful? If so, please accept this answer as “Solution”. If you need additional assistance, reply here within 2 business days and I’ll be happy to help.