Agent Builder filtering mechanisms

motomodal · 10-13-2024 03:13 PM

I'm working with Vertex AI's Agent Builder to create a search engine over a product catalog for industry users, and I'm looking for advice on best practices for using filters and facets.

We're migrating from a traditional database to Agent Builder, and I'm running into an issue with filtering. Users might search using semantic queries, but sometimes they need precise filtering. The problem is that for string matching the filtering mechanism only offers `ANY()`, which requires exact matches. To achieve "contains" or "starts with" behavior, I currently request facets, pick out the matching facet values, and then perform a second search that filters on them with `ANY()`. This works, but it is limited by the facet's maximum of 300 returned values.
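A minimal sketch of the two-step workaround described above, assuming the facet values have already been read from the first search response: match them client-side, then turn the matches into an `ANY()` filter for the second request. The field name `part_number` and the sample values are hypothetical, and quoting values with JSON-style escapes is an assumption about what the filter parser accepts. Because only returned facet values can be matched, the result inherits the 300-value cap.

```python
import json


def select_facet_values(values, needle, mode="contains"):
    """Case-insensitive 'contains' / 'starts with' match over facet values.

    `values` would come from the facet section of a first search response.
    """
    needle = needle.lower()
    if mode == "contains":
        return [v for v in values if needle in v.lower()]
    if mode == "starts_with":
        return [v for v in values if v.lower().startswith(needle)]
    raise ValueError(f"unknown mode: {mode}")


def build_any_filter(field, values):
    """Build a `field: ANY("a", "b")` filter expression.

    Assumption: the filter parser accepts JSON-style double-quoted strings,
    so json.dumps is used to quote and escape each value.
    """
    if not values:
        raise ValueError("ANY() needs at least one value")
    quoted = ", ".join(json.dumps(v, ensure_ascii=False) for v in values)
    return f"{field}: ANY({quoted})"


# Hypothetical facet values returned for the field "part_number".
facet_values = ["AB-100", "AB-200", "XAB-7", "CD-300"]
matches = select_facet_values(facet_values, "ab", mode="starts_with")
print(build_any_filter("part_number", matches))
```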

Is there a better way to implement "contains" or "starts with" filtering? How does Google recommend developers handle these scenarios when exact matches aren't sufficient, especially given the need for both semantic and precise filtering?

Any advice or insights would be much appreciated. Thanks!

ruthseki

Hi @motomodal,

Welcome to Google Cloud Community!

The shift from traditional databases to semantic search engines raises challenges around filtering and facet limits. While Agent Builder's ANY() function supports exact matches, it does not offer "contains" or "starts with" filtering. In addition, the 300-value facet limit caps how many candidate values a facet-then-filter workaround can consider.

Here are some approaches that combine the strengths of semantic search and traditional filtering:

1. Enhanced Semantic Search:

Document Embeddings: Use a pre-trained language model such as BERT to embed search queries and product descriptions into a vector space, so that matching is not limited to exact words.
Semantic Similarity: Instead of requiring exact matches, rank products by the cosine similarity between the query vector and each product vector; this captures the meaning of the query rather than its literal wording.
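To make the cosine-similarity idea concrete, here is a minimal sketch in plain Python. The three-dimensional vectors are toy placeholders for real embedding-model output, and the product IDs are hypothetical. With the default Agent Builder model, semantic matching is handled by the service itself, so this only illustrates the technique.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction to compare
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vec, product_vecs):
    """Return (product_id, score) pairs sorted from most to least similar."""
    scored = [(pid, cosine_similarity(query_vec, vec)) for pid, vec in product_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy vectors standing in for real embeddings (hypothetical values).
query = [0.9, 0.1, 0.0]
products = {
    "hex-bolt": [0.8, 0.2, 0.1],
    "wing-nut": [0.1, 0.9, 0.2],
}
for pid, score in rank_by_similarity(query, products):
    print(pid, round(score, 3))
```

Cosine similarity ignores vector length, so it compares the direction of the embeddings (their meaning) rather than their magnitude.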

2. Augmented Filtering:

Pre-Indexing for "Contains" and "Starts With": During indexing, store pre-computed tokens for key fields like product names and descriptions, for example every prefix of a value (for "starts with") or its character n-grams (for "contains"). Because these tokens live in their own fields, ANY() can match them exactly, which makes filtering on those criteria efficient.
Broadly Scoped Facets: Use broad categories or attribute ranges for initial filtering, reserving finer-grained options for further refinement.
"Search within Results" Approach: Use facets to narrow results to a manageable set, then refine further with ANY() on exact values or on the pre-indexed "contains" and "starts with" tokens.

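A minimal sketch of the pre-indexing idea, assuming you can add extra array fields to each record before importing it into the structured datastore and mark them as filterable in the schema. The `<field>_prefixes` and `<field>_trigrams` field names are hypothetical, and character trigrams are just one choice of n-gram size. "Starts with" becomes an exact `ANY()` match on a stored prefix, and "contains" becomes an `AND` of exact matches on the query's trigrams.

```python
import json


def prefix_tokens(value, max_len=20):
    """All lowercase prefixes of `value`, up to `max_len` characters."""
    v = value.lower()
    return [v[:i] for i in range(1, min(len(v), max_len) + 1)]


def trigram_tokens(value):
    """Sorted, de-duplicated lowercase 3-character substrings of `value`."""
    v = value.lower()
    return sorted({v[i:i + 3] for i in range(len(v) - 2)})


def enrich_record(record, field):
    """Add hypothetical `<field>_prefixes` / `<field>_trigrams` arrays before import."""
    enriched = dict(record)
    enriched[f"{field}_prefixes"] = prefix_tokens(record[field])
    enriched[f"{field}_trigrams"] = trigram_tokens(record[field])
    return enriched


def starts_with_filter(field, needle):
    """Exact ANY() match against the stored prefixes."""
    return f"{field}_prefixes: ANY({json.dumps(needle.lower(), ensure_ascii=False)})"


def contains_filter(field, needle):
    """AND together one ANY() per trigram of the needle (needs 3+ characters).

    Trigram matching ignores order, so a document can match every trigram
    without containing the exact substring; re-check candidates client-side
    when exactness matters.
    """
    grams = trigram_tokens(needle)
    if not grams:
        raise ValueError("contains search needs at least 3 characters")
    return " AND ".join(
        f"{field}_trigrams: ANY({json.dumps(g, ensure_ascii=False)})" for g in grams
    )


doc = enrich_record({"id": "sku-1", "title": "Hex Bolt"}, "title")
print(doc["title_prefixes"][:3])
print(starts_with_filter("title", "Hex"))
print(contains_filter("title", "bolt"))
```

The design trades index size for exact, query-time filtering: a 20-character value stores up to 20 prefixes and up to 18 trigrams.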
3. User Interface Design:

Combined Search & Facets: Provide a semantic search box alongside interactive facets to allow for both flexible queries and refinement.
Transparency and Explanations: Show users how their query was interpreted, for example which semantic matches and facet filters shaped the results, so they can understand and adjust their search.

Additionally, applying SEO-style content practices to your product descriptions can improve the accuracy of semantic matching: clear, well-structured descriptions help the search engine understand what each product is.

Finally, to keep improving, track user behavior and gather feedback on the search experience. Use this data to fine-tune ranking and to optimize the user interface for usability and relevance.

I hope the above information is helpful.

motomodal

Hello @ruthseki

Thank you very much for your detailed response—I truly appreciate your assistance.

I am hoping to use the default Google-provided model for Agent Builder search, and I am currently exploring which of the solutions you suggested I can integrate into my approach. For context, I am using Agent Builder with a structured datastore.

I have a few additional questions:

Request Size Limits
- Is there a limit to the size of the request that can be sent to the Discovery Engine API?
ANY() Statement Limits
- Is there a maximum number of terms that can be included in an ANY() statement when filtering? I did see a maximum specified in the filter recommendations documentation, but I did not see one in the filter search documentation.
- As a potential workaround, I could pre-index the datastore and pass a list of relevant document IDs as a filter using ANY("id1", "id2", ...). However, I am curious about any limits or quotas associated with this approach. Could you provide guidance on this?
Schema Field Interpretation
- Within my schema, I have descriptive field names such as title and other_part_numbers_that_are_the_same. Does the default model take these field names into account when associating and interpreting the data?
User Feedback Integration and Model Training
- When using the Agent Builder, is there a feature that allows for integrating user feedback to continuously train the model? It seems that the widget supports this, but I do not see a corresponding API.
- Can I utilize the Ranking API with the Google-provided model for this purpose, or would I need to train my own language model (LLM)?
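For the document-ID workaround in the ANY() question, the filter strings themselves are easy to generate. A minimal sketch that splits the IDs into batches so no single `ANY()` exceeds a chosen size: `max_terms` is a placeholder, since the thread does not establish the limit for search filters, and `product_id` is a hypothetical filterable field. Each filter would go in its own request, with results merged client-side.

```python
import json


def chunked(items, size):
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def id_filters(field, doc_ids, max_terms):
    """Return one `field: ANY(...)` filter string per batch of IDs.

    `max_terms` is a placeholder: substitute the per-ANY() limit the
    current documentation states.
    """
    if max_terms < 1:
        raise ValueError("max_terms must be positive")
    filters = []
    for batch in chunked(list(doc_ids), max_terms):
        quoted = ", ".join(json.dumps(i, ensure_ascii=False) for i in batch)
        filters.append(f"{field}: ANY({quoted})")
    return filters


for f in id_filters("product_id", ["p1", "p2", "p3"], max_terms=2):
    print(f)
```

Merging results from several requests loses a single global ranking, because each batch is scored independently; that trade-off is worth weighing against the actual documented limit.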

Any additional insights or guidelines you could provide on these topics would be greatly appreciated.

Thank you once again for your support!