Hi,
I'm trying to set up Vertex AI Search (not the chatbot) for our documentation site, as Sitesearch/PSE is being shut down. A few questions:
- How do I prevent certain words from being stemmed? For example, yugabyted is automatically converted to YugabyteDB. I don't want this to happen. (Putting the term in quotes, as "yugabyted", doesn't work.)
- For Boost/Bury, how do I create a filter where the URL path matches a pattern?
- How do I deep-link to specific headers within a page? How do I identify whether a hit is on a header or not?
Thanks in advance.
@ruthseki - any suggestions here? How do we get the right support for this? We are an enterprise customer.
Hi @premyb,
Welcome to Google Cloud Community!
Here are some possible approaches that may help with your Vertex AI Search configuration:
Preventing Stemming for Specific Words - While Vertex AI Search does not directly support custom stemming rules, you might consider the following approaches to handle stemming for specific words:
Boost/Bury Filters Based on URL Patterns
Deeplinking to Specific Headers:
For more information about Vertex AI Search you can read through this documentation.
I hope the above information is helpful.
Thanks @MJane .
Preventing Stemming/Spell suggestion for Specific Words :
Preprocess your text data - How do I do this when using the Crawler?
I'm thinking of adding
"spellCorrectionSpec":{"mode":"AUTO"}
for most queries and have a list of query exceptions and just for those cases set
"spellCorrectionSpec":{"mode":"SUGGESTION_ONLY"}
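To make the idea concrete, here is a minimal sketch of how the request body could be built per query. It only constructs the JSON and does not call the API. The exception list `NO_AUTOCORRECT` and the helper `build_search_body` are hypothetical names for illustration, and the mode strings are assumed to be `AUTO` and `SUGGESTION_ONLY`; please verify them against the SearchRequest reference.

```python
import json

# Hypothetical exception list: terms that must never be auto-corrected.
NO_AUTOCORRECT = {"yugabyted"}


def build_search_body(query: str, page_size: int = 10) -> dict:
    """Build a search request body, disabling auto-correction for listed terms."""
    # Strip surrounding double quotes so that "yugabyted" (quoted) also matches.
    terms = [t.strip('"') for t in query.lower().split()]
    is_exception = any(t in NO_AUTOCORRECT for t in terms)
    # Assumed enum values; check the API reference before relying on them.
    mode = "SUGGESTION_ONLY" if is_exception else "AUTO"
    return {
        "query": query,
        "pageSize": page_size,
        "spellCorrectionSpec": {"mode": mode},
    }


if __name__ == "__main__":
    print(json.dumps(build_search_body("yugabyted start"), indent=2))
```

This only addresses spell correction. If the rewrite to YugabyteDB comes from stemming or synonyms rather than spell correction, switching the mode may not change anything.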
Boost/Bury Filters Based on URL Patterns :
I have the filter set as
siteSearch : "https://docs.yugabyte.com/preview/yedis/*"
and set the boost/bury score to -1, but that does not seem to work correctly. For some queries, the first result is still from that same path. I don't understand why.
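Until the server-side bury works, one possible fallback is to demote matching results on the client after the response comes back. The sketch below is not a Vertex AI Search feature. It assumes the results have already been flattened into a list of dicts with a `link` key (a hypothetical shape), and `demote_by_path` is an illustrative helper name.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlparse


def demote_by_path(results: list, pattern: str, url_key: str = "link") -> list:
    """Move results whose URL path matches the glob pattern to the end.

    The relative order inside each group is preserved, so this is a
    stable partition rather than a full re-ranking.
    """
    keep, buried = [], []
    for result in results:
        path = urlparse(result[url_key]).path
        # fnmatchcase treats '*' as "any characters", including '/'.
        if fnmatchcase(path, pattern):
            buried.append(result)
        else:
            keep.append(result)
    return keep + buried


if __name__ == "__main__":
    sample = [
        {"link": "https://docs.yugabyte.com/preview/yedis/quick-start/"},
        {"link": "https://docs.yugabyte.com/preview/api/ysql/"},
    ]
    for item in demote_by_path(sample, "/preview/yedis/*"):
        print(item["link"])
```

This only reorders the current page of results, so a buried page can still show up at the bottom of page one instead of on a later page.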
Deeplinking to Specific Headers:
- Again, how can I preprocess data when using the Crawler? The data is just HTML (crawled by the Vertex crawler) and the headers are correctly defined with proper IDs. Still, I'm unable to identify from the JSON search results response whether a hit is on a header. What additional parameters do I have to pass in the request to get this info?
To prevent stemming in Vertex AI Search, try custom synonym rules, as there's no direct way to stop it. For Boost/Bury, use filter expressions like "url_path LIKE '/docs/%'" to match URL patterns. To deeplink specific headers, ensure headers have unique IDs in your HTML (e.g., <h2 id="section1">Header</h2>), and Vertex AI Search will index them as searchable entities, enabling links to specific sections in search results.
@shaikhsharmeen4, the headers have unique IDs and are correctly marked up and indexed:
<h2 id="section1">SomeText</h2>
But for a search on "sometext", I'm unable to identify whether the hit was on the header, so that in the result listing I can modify the URL to url_path#section1 and the page scrolls to the header/anchor when the result is clicked. How do I do this?
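For reference, this is roughly the client-side workaround I would fall back to if the response cannot tell me which header matched. It is a sketch under assumptions: the page HTML is available to the caller (for example from a heading index built at site build time), and `HeadingCollector` and `deep_link` are hypothetical names, not part of any Vertex AI Search API.

```python
from html.parser import HTMLParser


class HeadingCollector(HTMLParser):
    """Collect (id, text) pairs for every h1-h6 element that has an id."""

    HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__()
        self.headings = []
        self._in_heading = False
        self._id = None
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag in self.HEADINGS:
            self._in_heading = True
            self._id = dict(attrs).get("id")
            self._buf = []

    def handle_data(self, data):
        if self._in_heading:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag in self.HEADINGS and self._in_heading:
            if self._id:
                self.headings.append((self._id, "".join(self._buf).strip()))
            self._in_heading = False


def deep_link(url: str, page_html: str, query: str) -> str:
    """Append #id of the first heading whose text contains the query."""
    parser = HeadingCollector()
    parser.feed(page_html)
    needle = query.strip().lower()
    for anchor, text in parser.headings:
        if needle and needle in text.lower():
            return f"{url}#{anchor}"
    return url  # no heading matched: keep the plain page URL


if __name__ == "__main__":
    html = '<h2 id="section1">SomeText</h2><p>Body text.</p>'
    print(deep_link("https://example.com/page", html, "sometext"))
```

The match here is a plain case-insensitive substring test, so it will not line up exactly with whatever the search engine matched on. I would still prefer a way to get this from the response itself.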