
BigQuery VECTOR_SEARCH() and ML.GENERATE_EMBEDDING() - Negation handling

Hi,

I'm using the BigQuery ML.GENERATE_EMBEDDING() and VECTOR_SEARCH() functions. I created embeddings for a sample product catalog and then ran vector search queries to fetch the relevant results. This worked great until my query included a negation.

Say I write the query "looking for yellow t-shirts for boys."
It works great and fetches the relevant results.

However, if I change my query to "looking for boys t-shirts and not yellow",
it should not return any yellow products. Unfortunately, yellow products are at the top of the results, which means the negation ("not yellow") isn't being handled properly in this scenario.

What is the solution for it?

A sample query snapshot is shown below for reference

ammar_hanif_0-1719916592778.png

 



2 REPLIES

Hi @ammar_hanif,

Welcome to Google Cloud Community!

The issue you're facing is likely due to the limitations of how negation is handled within the embedding space generated by ML.GENERATE_EMBEDDING.

Here's why the negation might not work as expected:

Embedding Space: Embedding models learn to represent words and phrases in a multi-dimensional space where similar concepts are clustered together. Negation isn't explicitly represented in this space.

Distance Metrics: VECTOR_SEARCH ranks rows by a distance metric (such as cosine distance) between the query embedding and each stored embedding. When you say "not yellow," the system doesn't treat it as a concept to exclude; "yellow" is simply part of the query text being embedded, so it pulls yellow products closer.
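
To make the two points above concrete, here is a deliberately crude Python sketch. It is not how ML.GENERATE_EMBEDDING works internally: the 4-dimensional word vectors are made up, and the "embedding" is just a sum of word vectors in which unknown words such as "not" contribute nothing. It only illustrates how a query that mentions "yellow" can end up closest to yellow products under cosine similarity.

```python
import math

# Made-up 4-d word vectors for illustration only; axes are
# [yellow, blue, t-shirt, boys]. Words missing from the table (such as
# "not") contribute nothing, which is the crude assumption being shown.
WORD_VECTORS = {
    "yellow": [1.0, 0.0, 0.0, 0.0],
    "blue": [0.0, 1.0, 0.0, 0.0],
    "t-shirt": [0.0, 0.0, 1.0, 0.0],
    "boys": [0.0, 0.0, 0.0, 1.0],
}


def embed(text):
    """Sum the vectors of known words (a toy bag-of-words embedding)."""
    total = [0.0] * 4
    for word in text.lower().split():
        for i, value in enumerate(WORD_VECTORS.get(word, [0.0] * 4)):
            total[i] += value
    return total


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


query = embed("boys t-shirt not yellow")
products = ["yellow t-shirt boys", "blue t-shirt boys"]
# Sort products from most to least similar to the query.
ranked = sorted(products, key=lambda p: cosine_similarity(query, embed(p)), reverse=True)
print(ranked)
```

In this toy setup the yellow product ranks first even though the query says "not yellow", because the word "yellow" is the only color signal that reaches the query vector.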

Currently, BigQuery embedding generation and vector search do not support negation. You may refer to our embedding generation with BigQuery documentation and vector search documentation for their current features.

In addition, as mentioned in the release notes dated January 31, 2024, searching embeddings with vector search is a preview feature. Note that preview features are often publicly announced but are not necessarily feature-complete.

You can also file a Feature Request for customizing the vector search query to your preference so that our Engineering Team can look into it. Note that there's no definite date for when this would be implemented. For future updates, I suggest keeping an eye on the issue tracker and release notes.

I hope the above information is helpful.

Hi @ruthseki ,

Thanks for the detailed response on the subject.

I have a similar understanding of how vector search works in BigQuery and of its lack of support for negation (unfortunately). A follow-up question on this:

In my scenario, I am creating an intelligent chatbot using:
1. Gemini: to handle the natural language conversation, maintain the context, and do function calling
2. BigQuery: to upload my product catalogue, create embeddings, and run vector search queries.

Since vector search cannot handle negation at the moment, can you suggest a way to handle it?

Alternate Approach:
One possible solution I came up with is to use the Gemini API as follows:

sample prompt = "looking for boys t-shirts and not yellow"
- first, analyze the prompt and remove the negated term (yellow in this case)
- fetch all the results (say, all the boys t-shirts)
- iterate over each result's description and filter it through Gemini with the original prompt as context; Gemini will understand the original user prompt and compare it with each product's description to tag whether the product qualifies or not.
- provide the filtered results from Gemini to the user.
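
The steps above could be sketched roughly as follows in Python. This is only an illustration under stated assumptions: the regex is a hypothetical stand-in for the Gemini step that extracts the negated term, the `Product` type and its `color` field are invented (the thread does not show the catalog schema), and the final filter swaps the per-product Gemini check for a plain attribute comparison, which only works if the catalog has such a structured column.

```python
import re
from dataclasses import dataclass


@dataclass
class Product:
    # Hypothetical catalog row; `color` is an assumed structured attribute.
    name: str
    color: str


# Hypothetical stand-in for the Gemini analysis step: matches "not <word>",
# optionally preceded by "and".
NEGATION = re.compile(r"\b(?:and\s+)?not\s+(\w+)", re.IGNORECASE)


def split_query(query):
    """Return (query without the negated part, list of excluded terms)."""
    excluded = [word.lower() for word in NEGATION.findall(query)]
    positive = " ".join(NEGATION.sub("", query).split())
    return positive, excluded


def drop_excluded(products, excluded):
    """Keep only products whose color is not one of the excluded terms."""
    banned = set(excluded)
    return [p for p in products if p.color.lower() not in banned]


positive, excluded = split_query("looking for boys t-shirts and not yellow")
# `positive` is what would be embedded and sent to the vector search;
# `search_hits` stands in for the rows that search would return.
search_hits = [Product("Boys tee A", "Yellow"), Product("Boys tee B", "Blue")]
kept = drop_excluded(search_hits, excluded)
print(positive, excluded, [p.name for p in kept])
```

If the catalog really does have a color column, the same exclusion could presumably be applied as an ordinary WHERE filter on the vector search results inside BigQuery, which would avoid a model call per product. A free-text description without such a column would still need something like the Gemini check described above.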

Concerns:

- This approach can add latency to the overall solution
- It could incur significant additional cost due to multiple invocations of the Gemini API, one per product (worse if the original fetched result set is in the 1000s)
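
One way the call count might be reduced is to judge several products per request instead of one. The Python sketch below only builds the prompts and does not call any API; the prompt wording, the batch size of 50, and the helper names are assumptions for illustration, and whether a batch of that size fits the model's limits would need checking.

```python
def batches(items, size):
    """Yield consecutive slices of `items` with at most `size` elements each."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def build_prompt(user_query, descriptions):
    """Build one prompt asking the model to judge several products at once."""
    numbered = "\n".join(f"{i}. {d}" for i, d in enumerate(descriptions, start=1))
    return (
        f"User request: {user_query}\n"
        "For each numbered product, answer yes or no: does it satisfy the request?\n"
        f"{numbered}"
    )


# Placeholder descriptions standing in for 1000 fetched products.
descriptions = [f"product {n}" for n in range(1000)]
prompts = [
    build_prompt("looking for boys t-shirts and not yellow", batch)
    for batch in batches(descriptions, 50)
]
print(len(prompts))  # 20 prompts instead of 1000
```

With 1000 fetched products and 50 per batch, this comes to 20 requests rather than 1000, at the price of longer prompts and the need to parse a multi-item answer.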