I have a question that's crucial for inference time and efficiency at my company.
We're using Vector Search with filtering to retrieve the right samples. But, for efficiency and inference speed, I'd like to know how the retriever works with filtering under the hood.
(1) Does it first run the similarity search and then filter the results by the metadata we chose? Or (2) does it first apply the filter and then run the similarity search only over the data matching that metadata?
What I mean is: in case (1), the search would not be run directly over just the chunks related to the metadata I want (let's call it a user).
Sorry if this question shouldn't be here! I've been looking all over the internet for an answer and can't find one.
Hi @valenradovich,
Welcome to Google Cloud Community!
Vertex AI Vector Search likely uses a hybrid approach that combines filtering and similarity search to achieve optimal efficiency. It doesn't strictly follow either scenario (1) or (2) exclusively. The specific strategy is determined dynamically by the query optimizer, based on factors like dataset size, filter complexity, and indexing configuration.
The system's goal is to minimize the estimated cost of query execution. It often attempts to perform filtering before the computationally expensive similarity search to reduce the number of vectors requiring similarity comparison, significantly speeding up the process. However, it might not always be the lowest-cost option to fully filter before the similarity search, especially with complex queries or less selective filters. In such cases, it might perform filtering as a post-processing step on the initial similarity search results.
Based on common practices in high-performance vector database design and the features advertised for Vertex AI Vector Search, it is reasonable to assume that the system employs a hybrid strategy for combining filtering and similarity search. To optimize for both query speed and accuracy, the approach likely adapts dynamically based on factors such as dataset characteristics and query complexity. As a managed service, the specific implementation details of the query optimizer are not exposed to users, making it impossible to predict the precise execution plan for any given query.
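In practice, the way to express the metadata constraint is to tag each datapoint with token restricts and pass a matching filter along with the query; the results are guaranteed to satisfy the filter, even though the internal ordering of filtering versus search is up to the engine. Here is a minimal sketch, assuming the Python SDK's `find_neighbors` call with `Namespace` filters (the project, endpoint, and deployed-index IDs are placeholders, so double-check the exact signature against your SDK version):

```python
from google.cloud import aiplatform
from google.cloud.aiplatform.matching_engine.matching_engine_index_endpoint import (
    Namespace,
)

# Placeholders -- replace with your own project, region, endpoint and index IDs.
aiplatform.init(project="my-project", location="us-central1")

endpoint = aiplatform.MatchingEngineIndexEndpoint(
    index_endpoint_name="projects/my-project/locations/us-central1/indexEndpoints/1234567890"
)

query_embedding = [0.0] * 768  # placeholder for your query embedding

# Only datapoints whose "user" restrict contains the token "user_2"
# can appear in the returned neighbors.
neighbors = endpoint.find_neighbors(
    deployed_index_id="my_deployed_index",
    queries=[query_embedding],
    num_neighbors=5,
    filter=[Namespace(name="user", allow_tokens=["user_2"])],
)
```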
Was this helpful? If so, please accept this answer as “Solution”. If you need additional assistance, reply here within 2 business days and I’ll be happy to help.
Hello @ibaui
What we want to make sure of is that the similarity search is run only over the data that belongs to a given user, for example. That user is part of the metadata.
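For context, this is roughly what I mean by the user being part of the metadata: each chunk's datapoint carries the user as a token restrict, something like the following (purely illustrative; the index name, IDs, and vectors are placeholders, and it assumes a streaming-update index with the `aiplatform_v1` `IndexDatapoint` type):

```python
from google.cloud import aiplatform
from google.cloud.aiplatform_v1.types import IndexDatapoint

index = aiplatform.MatchingEngineIndex(
    index_name="projects/my-project/locations/us-central1/indexes/1234567890"  # placeholder
)

# Every chunk is upserted with a "user" restrict naming its owner,
# so queries can later be filtered down to a single user's chunks.
index.upsert_datapoints(
    datapoints=[
        IndexDatapoint(
            datapoint_id="user_2_chunk_001",
            feature_vector=[0.1, 0.2, 0.3],  # the chunk's embedding
            restricts=[
                IndexDatapoint.Restriction(namespace="user", allow_list=["user_2"])
            ],
        ),
    ]
)
```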
I understand the point about the query optimizer, but beyond that, is there any way to guarantee that the filtering is done first, so that the similarity search only considers the data that belongs to that user?
Why does this matter? So that candidate data isn't left out of the possible answer. Here is what I mean.
Example:
- 10 chunks from “user_1”, all ten related to soccer.
- 3 chunks from “user_2”: one related to soccer and two related to tennis.
- The retrieval limit is 5 embeddings.
- The query relates to soccer.

Problem: if the similarity search is first run over all the embeddings and the filter is applied afterwards, the 3 chunks from “user_2” may not be returned, because the 5 returned samples will most likely all be chunks from “user_1”.
Am I right with this example? Or does this problem, where a user's information “gets lost” in the way I described (with the user stored in the metadata, as in this example), not actually occur?
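To make the concern concrete, here is a tiny toy simulation of the two orderings on the numbers above (plain Python with made-up similarity scores; this has nothing to do with the actual Vertex AI internals, it only shows why the ordering matters):

```python
# (user, topic, similarity-to-"soccer"-query) for all 13 chunks.
chunks = (
    [("user_1", "soccer", 0.90 - i * 0.01) for i in range(10)]  # 10 soccer chunks
    + [("user_2", "soccer", 0.80), ("user_2", "tennis", 0.30), ("user_2", "tennis", 0.25)]
)

K = 5  # retrieval limit

# (1) Similarity search first, then filter by user_2:
top_k = sorted(chunks, key=lambda c: c[2], reverse=True)[:K]
post_filtered = [c for c in top_k if c[0] == "user_2"]
print("post-filtering:", post_filtered)  # empty: the top-5 is all user_1

# (2) Filter by user_2 first, then similarity search:
user_2_chunks = [c for c in chunks if c[0] == "user_2"]
pre_filtered = sorted(user_2_chunks, key=lambda c: c[2], reverse=True)[:K]
print("pre-filtering:", pre_filtered)  # all 3 user_2 chunks are candidates
```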